Research Summary
Our lab studies the three-dimensional structures of macromolecular complexes by integrating experimental and bioinformatic methods from X-ray crystallography, structural bioinformatics, and evolutionary theory. Our previous research has concentrated on the biophysical basis of sequence-specific recognition of unusually structured nucleic acids (such as single-stranded DNA and RNA) and on the evolution of the proteins that carry out this important biological function.
Telomeric OB-folds
The OB-fold (oligonucleotide/oligosaccharide-binding fold) is one of the most important protein folds that specifically interact with single-stranded DNAs and RNAs, and it is also one of the few protein superfolds. Proteins that recognize nucleic acids often contain multiple OB-fold domains. OB-fold domains are notoriously difficult to detect from sequence similarity alone, however, because most proteins containing this structural motif share little sequence similarity. The OB-fold is found in several telomere-binding proteins that specifically recognize and bind the single-stranded telomeric DNA overhang at the ends of chromosomes, including the ciliate telomere end-binding protein TEBP, the metazoan Pot1 proteins, and the budding yeast Cdc13 proteins. Because telomere maintenance and end-protection are essential for the survival and proliferation of eukaryotic cells, one might expect these proteins to be highly conserved. In practice, however, evidence for bona fide homology among telomeric factors has been elusive, and evolutionary relationships among the known end-protection proteins have been postulated largely on the basis of structural and functional similarity alone.
We have recently developed new bioinformatic methods for comparing macromolecular structures and for exploring the distant evolutionary relationships among OB-fold domains, especially their telomeric representatives. What we found is somewhat surprising: even though a billion years of evolution have nearly erased any discernible sequence similarity between individual OB-fold domains, weak similarities can be extracted from the noise when whole families of domains, rather than individual sequences, are compared. Gratifyingly, these weak similarities reliably classify the OB-fold domains according to their known cellular functions, as one would expect if these modern domains had evolved from common ancestral domains.
Bayesian and likelihood methods for structural comparison and analysis
Superpositioning macromolecular structures is an essential tool in structural bioinformatics and is used routinely in NMR, X-ray crystallography, protein folding, molecular dynamics, rational drug design, and structural evolution. A superposition compares structures by rotating and translating them so that their corresponding atoms lie as close to each other as possible. Interpreting a superposition relies on the accuracy of the estimated orientations of the molecules, so reliable and robust superpositioning tools are a critical component of structural analysis and comparison.
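As a minimal sketch of the classical least-squares superposition described above, the function below uses the SVD-based Kabsch solution: center both structures, build their cross-covariance matrix, and take the rotation that minimizes the RMSD between corresponding atoms. The function name `kabsch_superpose`, the N x 3 coordinate-array layout, and the use of NumPy are assumptions for illustration; this is not THESEUS code. Quaternion-based methods, such as the characteristic-polynomial approach listed among the publications, solve the same least-squares problem by a different route.

```python
import numpy as np


def kabsch_superpose(X, Y):
    """Least-squares superposition of Y onto X (both N x 3, same atom order).

    Hypothetical helper: returns the fitted copy of Y, the rotation matrix R,
    and the RMSD between X and the fitted Y.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Remove translation by moving both structures to their centroids.
    xc, yc = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - xc, Y - yc
    # 3 x 3 cross-covariance matrix: sum over atoms of y_i x_i^T.
    H = Yc.T @ Xc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Apply the rotation to each centered atom, then move onto X's centroid.
    Y_fit = Yc @ R.T + xc
    rmsd = np.sqrt(np.mean(np.sum((X - Y_fit) ** 2, axis=1)))
    return Y_fit, R, rmsd
```

Every atom contributes equally to `H`, which is exactly the equal-weighting assumption that the likelihood approach relaxes.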
The structural superposition problem has classically been solved with the standard statistical optimization method of least squares (LS). However, LS can give misleading and inaccurate results, both in theory and in practice, because it implicitly assumes that every atom is equally variable and that atomic positions vary independently of one another. Neither assumption generally holds for real macromolecules, where flexible regions can dominate an LS fit. To correct for these shortcomings, we have applied likelihood and Bayesian techniques to the superposition problem, resulting in much more accurate superpositions and in analyses of the complex correlations among the atoms within macromolecules. For more information see: http://www.theseus3d.org/.
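The sketch below illustrates only why unequal atomic variances matter; it is not the likelihood method used in THESEUS, which, per the publications listed, also models atomic correlations and regularizes estimates with hierarchical priors. It assumes independent, isotropic Gaussian errors with one variance per atom. Under that assumption, with the variances held fixed, the best rotation is a weighted least-squares fit with weights 1/variance, so the loop alternates between that fit and re-estimating per-atom variances across an ensemble. The function names, the `var_floor` parameter (a crude stand-in for the regularization a prior would provide), and the iteration scheme are hypothetical.

```python
import numpy as np


def weighted_superpose(X, Y, w):
    """Rotate and translate Y (N x 3) onto X, minimizing sum_i w_i * ||R y_i + t - x_i||^2."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    w = np.asarray(w, dtype=float) / np.sum(w)
    # Weighted centroids remove the translation.
    xc, yc = w @ X, w @ Y
    Xc, Yc = X - xc, Y - yc
    # Weighted cross-covariance; its SVD gives the optimal rotation (weighted Kabsch).
    H = (Yc * w[:, None]).T @ Xc
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return Yc @ R.T + xc


def variance_weighted_fit(structures, n_iter=100, var_floor=1e-3, tol=1e-8):
    """Hypothetical sketch: superpose an ensemble while down-weighting variable atoms."""
    coords = [np.asarray(s, dtype=float) for s in structures]
    n_atoms = coords[0].shape[0]
    var = np.ones(n_atoms)   # equal variances: the first pass is ordinary LS
    mean = coords[0].copy()  # initial reference structure
    for _ in range(n_iter):
        w = 1.0 / var
        coords = [weighted_superpose(mean, c, w) for c in coords]
        stack = np.array(coords)  # shape (n_structures, n_atoms, 3)
        new_mean = stack.mean(axis=0)
        # Per-atom variance about the mean, pooled over x, y, and z.
        new_var = ((stack - new_mean) ** 2).sum(axis=2).mean(axis=0) / 3.0
        # Floor keeps any single atom from receiving unbounded weight.
        new_var = np.maximum(new_var, var_floor)
        done = np.max(np.abs(new_var - var)) < tol
        mean, var = new_mean, new_var
        if done:
            break
    return coords, mean, var
```

In an ensemble where a few atoms are much more mobile than the rest, the estimated variances for those atoms come out large, their weights shrink, and the superposition is driven by the well-ordered core rather than by the flexible regions.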
Future Goals and Research
The lab's long-term scientific goal is to develop a precise molecular understanding of how macromolecular assemblies function, an endeavor that ultimately must be informed by evolutionary knowledge. Currently, the dominant paradigm in structural biology is neutral evolutionary theory, which assumes that the differences among homologous proteins are unimportant for their functions. According to the theory of natural selection, however, differences among proteins can be important for function. Thus, to fully understand the relationship between macromolecular structure and function, we consider it essential to explicitly incorporate modern developments in population genetics regarding natural selection. Conversely, structural knowledge can also inform evolutionary inferences. Implementing these ideas requires rigorous bioinformatic techniques and modern phylogenetic methods.
One ongoing research project involves "protein resurrection" methods, in which multiple ancient, extinct proteins are recreated in the lab, assayed experimentally for enzymatic activity, and structurally characterized at atomic resolution by X-ray crystallography. One goal of this research is to create a movie in which we can watch how the three-dimensional structure of a macromolecule has evolved via point mutations in different lineages, with each structural change correlated with changes in the molecule's biochemical function. These "structo-evo" studies will shed light on important structure-function questions, including how changes in proteins affect their structure and function, and the possibilities for the rational design of proteins with novel functions.
Recent publications
Douglas L. Theobald and Deborah S. Wuttke (2008) "Accurate structural correlations from maximum likelihood superpositions." PLOS Computational Biology 4(2): e43.
Douglas L. Theobald (2007) "Punctuated equilibrium." Forthcoming in International Encyclopedia of the Social Sciences, 2nd Edition.
Douglas L. Theobald and Deborah S. Wuttke (2006) "Empirical Bayes hierarchical models for regularizing maximum likelihood estimation in the matrix Gaussian Procrustes problem." Proceedings of the National Academy of Sciences USA 103(49): 18521-18527.
Douglas L. Theobald and Deborah S. Wuttke (2006) "THESEUS: Maximum likelihood superpositioning and analysis of macromolecular structures." Bioinformatics 22(17): 2171-2172.
Douglas L. Theobald and Deborah S. Wuttke (2005) "Divergent evolution within protein superfolds inferred from profile-based phylogenetics." Journal of Molecular Biology 354(3): 722-737.
Douglas L. Theobald (2005) "Rapid calculation of RMSD using a quaternion-based characteristic polynomial." Acta Crystallographica A 61(Pt 4): 478-480.
Douglas L. Theobald and Deborah S. Wuttke (2004) "Prediction of multiple tandem OB-fold domains in telomere end-binding proteins Pot1 and Cdc13." Structure (Cambridge) 12(10): 1877-1879.
Rachel M. Mitton-Fry, Emily M. Anderson, Douglas L. Theobald, Leslie W. Glustrom, and Deborah S. Wuttke (2004) "Structural basis for telomeric single-stranded DNA recognition by yeast Cdc13." Journal of Molecular Biology 338(2): 241-255.
Douglas L. Theobald, Rachel B. Cervantes, Victoria Lundblad, and Deborah S. Wuttke (2003) "Homology among telomeric end-protection proteins." Structure (Cambridge) 11(9): 1049-1050.
Douglas L. Theobald, Rachel M. Mitton-Fry, and Deborah S. Wuttke (2003) "Nucleic acid recognition by OB-fold proteins." Annual Review of Biophysics and Biomolecular Structure 32: 115-133.