Douglas Theobald, Ph.D.

Assistant Professor of Biochemistry

Evolution, structure, and function of macromolecular complexes

Ph.D., University of Colorado at Boulder

contact information           lab website

Fields of Specialization

          • Structure and function of single-stranded nucleic acid protein complexes
          • Likelihood and Bayesian techniques in structural bioinformatics
          • Adaptive evolution of molecular structures

Research Summary

Our lab studies the three-dimensional structures of macromolecular complexes by integrating both experimental and bioinformatic methods from the fields of X-ray crystallography, structural bioinformatics, and evolutionary theory. Our previous research has concentrated on the biophysical basis of sequence-specific recognition of unusually structured nucleic acids (such as ssDNAs and ssRNAs) and on the evolution of proteins involved in this important biological function.

Telomeric OB-folds

The OB-fold is one of the most important protein folds that specifically interacts with single-stranded DNAs and RNAs, and it is also one of the few protein superfolds. Multiple OB-fold domains are common in nucleic acid recognition. However, in general OB-fold domains are notoriously difficult to detect based upon sequence similarity alone, and most proteins containing this structural motif share little sequence similarity. The OB-fold is found in several telomere-binding proteins that specifically recognize and bind the single-stranded DNA telomeric overhang of chromosomes, including the ciliate telomere end-binding protein TEBP, the metazoan Pot1 proteins, and budding yeast Cdc13 proteins. Telomere maintenance and end-protection are essential for the survival and proliferation of eukaryotic cells, suggesting that these proteins would be highly conserved. In practice, however, evidence for bona fide homology among telomeric factors has been elusive, and, in the case of the known end-protection proteins, evolutionary relationships have been postulated largely on the basis of protein structural and functional similarity alone.

We have recently developed new bioinformatic methods for macromolecular structural comparison and for exploration of the distant evolutionary relationships among OB-fold domains, especially their telomeric representatives. What we've found is somewhat surprising: even though a billion years of evolution have nearly erased any discernible sequence similarities between individual OB-fold domains, distant similarities can be gleaned from the noise when families of domains are compared. And gratifyingly, these weak similarities reliably classify the OB-fold domains according to known cellular functions, as one would expect if these modern domains had evolved from common ancestral domains.

Bayesian and likelihood methods for structural comparison and analysis

Superpositioning macromolecular structures is an essential tool in structural bioinformatics and is used routinely in the fields of NMR, X-ray crystallography, protein folding, molecular dynamics, rational drug design, and structural evolution. Superpositioning allows comparison of structures by fitting their atomic coordinates to each other as closely as possible. Interpretation of a superposition relies upon the accuracy of the estimated orientations of the molecules, and thus reliable and robust superpositioning tools are a critical component of structural analysis and comparison.
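As an illustration of what "fitting atomic coordinates to each other as closely as possible" can mean in practice, the sketch below performs a pairwise rigid-body fit that minimizes the root-mean-square deviation (RMSD) between two coordinate sets, using the widely known SVD-based (Kabsch) solution. It is a minimal sketch, not the lab's software: the function name kabsch_superpose and the assumption that both inputs are N x 3 arrays with atoms already in one-to-one correspondence are ours, and NumPy is assumed to be available.

```python
import numpy as np

def kabsch_superpose(mobile, target):
    """Rigid-body fit of `mobile` onto `target` (hypothetical helper).

    Both inputs are N x 3 coordinate arrays with atoms in the same order.
    Returns the fitted copy of `mobile` and the resulting RMSD.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)

    # Remove the translation by centring both sets on their centroids.
    target_centroid = target.mean(axis=0)
    mob_c = mobile - mobile.mean(axis=0)
    tgt_c = target - target_centroid

    # 3 x 3 cross-covariance matrix between the two centred sets.
    H = mob_c.T @ tgt_c
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a proper rotation
    # (determinant +1) rather than a reflection.
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt

    # Coordinates are row vectors, so the rotation is applied on the right.
    fitted = mob_c @ R + target_centroid
    rmsd = np.sqrt(((fitted - target) ** 2).sum() / len(target))
    return fitted, rmsd
```

Calling kabsch_superpose(a, b) on two conformations of the same molecule returns the copy of a moved into the frame of b together with the RMSD after fitting. Every atom contributes equally to this fit.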

The structural superposition problem has classically been solved with the standard statistical optimization method of least-squares (LS). However, LS implicitly assumes that every atom has the same variance and that the atoms are uncorrelated, so it can provide misleading and inaccurate results both in theory and in practice. To correct for the shortcomings of LS, we have applied likelihood and Bayesian techniques to the superposition problem, resulting in much more accurate superpositions and analyses of the complex correlations among the atoms within macromolecules. For more information see: http://www.theseus3d.org/.
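To make the contrast with least-squares concrete, the sketch below superposes an ensemble of structures either with equal weights (ordinary LS) or with per-atom weights set to the inverse of each atom's estimated variance, re-estimated at every iteration. This is a deliberately simplified, iteratively reweighted least-squares illustration that assumes independent atoms with a diagonal variance model. It is not the algorithm implemented in THESEUS, and the function names, the variance floor, and the iteration count are all our own assumptions.

```python
import numpy as np

def weighted_fit(mobile, target, w):
    """Weighted rigid-body fit of `mobile` onto `target` (hypothetical helper).

    `w` holds one non-negative weight per atom; equal weights give ordinary LS.
    """
    w = w / w.sum()
    target_centroid = w @ target          # weighted centroids
    mob_c = mobile - w @ mobile
    tgt_c = target - target_centroid
    H = (mob_c * w[:, None]).T @ tgt_c    # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(U @ Vt))    # avoid reflections
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    return mob_c @ R + target_centroid

def superpose_ensemble(structures, weighted=True, n_iter=10, floor=1e-3):
    """Superpose K structures (each N x 3) onto their evolving mean.

    Returns the fitted coordinates (K x N x 3) and the per-atom variance.
    `floor` is an ad hoc guard against vanishing variances (an assumption).
    """
    X = np.array(structures, dtype=float)
    w = np.ones(X.shape[1])
    mean = X[0].copy()
    for _ in range(n_iter):
        # Fit every structure to the current mean, then update the mean.
        X = np.array([weighted_fit(x, mean, w) for x in X])
        mean = X.mean(axis=0)
        # Per-atom variance, averaged over structures and the three axes.
        var = ((X - mean) ** 2).sum(axis=2).mean(axis=0) / 3.0
        if weighted:
            # Down-weight variable atoms so they do not dominate the fit.
            w = 1.0 / (var + floor)
    return X, var
```

With weighted=False, a few highly mobile atoms (for example, a flexible loop) pull on the fit as strongly as the rigid core, which inflates the apparent variance of the core. With weighted=True, those atoms are progressively down-weighted. A more careful treatment would also need to model correlations among atoms and regularize the variance estimates more rigorously than the crude floor used here.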

Future Goals and Research

The lab's long-term scientific goals lie in developing precise molecular understandings of the function of macromolecular assemblies, an endeavor which ultimately must be informed by evolutionary knowledge. Currently, the dominant paradigm in structural biology is neutral evolutionary theory, which assumes that the differences among homologous proteins are unimportant for their functions. However, according to the theory of natural selection, differences among proteins can be important for function. Thus, for a full understanding of the relationship between macromolecular function and structure, we consider it essential to explicitly incorporate the modern developments in population genetics regarding natural selection. Conversely, structural knowledge can also inform evolutionary inferences. Implementation of these ideas requires rigorous bioinformatic techniques and modern phylogenetic methods.

One ongoing research project involves "protein resurrection" methods, in which multiple ancient and extinct proteins are recreated in the lab and assayed experimentally for enzymatic activity, and their atomic-resolution structures are determined by crystallography. One of the goals of this research is to create a movie in which we can watch how the three-dimensional structure of a macromolecule has evolved in different lineages via point mutations, with each change correlated with changes in the molecule's biochemical function. These "structo-evo" studies will shed light on important structure-function questions, including possibilities for the rational design of proteins with novel functions and for understanding how changes in proteins can affect their function and structures.

Recent publications

Kristine A. Mackin, Richard A. Roy, and Douglas L. Theobald (2013) "An empirical test of convergent evolution in rhodopsins." Molecular Biology and Evolution, published online September 27, 2013. doi:10.1093/molbev/mst171, [PMID: 24077848]

Lina Ni, Peter Bronk, Elaine C. Chang, April M. Lowell, Juliette O. Flam, Vincent C. Panzano, Douglas L. Theobald, Leslie C. Griffith, and Paul A. Garrity (2013) "A gustatory receptor paralog controls rapid warmth avoidance in Drosophila." Nature 500(7464):580-584. doi:10.1038/nature12390, [PMID: 23925112]

Erin L. Devine, Daniel D. Oprian, and Douglas L. Theobald (2013) "Relocating the active-site lysine in rhodopsin and implications for evolution of the retinylidene proteins." Proceedings of the National Academy of Sciences USA 110(33):13351-13355. doi:10.1073/pnas.1306826110, [PMID: 23904486]

Dmitry Lyumkis, Axel F. Brilot, Douglas L. Theobald, and Nikolaus Grigorieff (2013) "Likelihood-based classification of cryo-EM images using FREALIGN." Journal of Structural Biology 183(3): 377-388. doi:10.1016/j.jsb.2013.07.005, [PMID: 23872434]

Kanti V. Mardia, Christopher J. Fallaize, Stuart Barber, Richard M. Jackson, and Douglas L. Theobald (2013) "Bayesian alignment of similarity shapes." The Annals of Applied Statistics 7(2): 989-1009. doi:10.1214/12-AOAS615, [PMID: 24052809]

Douglas L. Theobald and Phillip A. Steindel (2012) "Optimal simultaneous superpositioning of multiple structures with missing data." Bioinformatics 28 (15): 1972-1979 doi:10.1093/bioinformatics/bts243

Douglas L. Theobald (2012) "Likelihood and empirical Bayes superpositions of multiple macromolecular structures." In Bayesian Methods in Structural Bioinformatics, Thomas Hamelryck, Kanti V. Mardia, and Jesper Ferkinghoff-Borg, Editors, Statistics for Biology and Health Series, Springer Verlag, New York.

Douglas L. Theobald (2011) "On universal common ancestry, sequence similarity, and phylogenetic structure: The sins of P-values and the virtues of Bayesian evidence." Biology Direct 6(1): 60. [Highly accessed article] doi:10.1186/1745-6150-6-60

Kene Piasta, Douglas L. Theobald, and Christopher Miller (2011) "Potassium-selective block of barium permeation through single KcsA channels." Journal of General Physiology 138(4): 421-436. doi:10.1085/jgp.201110684

Pu Liu, Dimitris K. Agrafiotis, and Douglas L. Theobald (2010) "Fast determination of the optimal rotational matrix for macromolecular superpositions." Journal of Computational Chemistry 31(7): 1561–1563. doi:10.1002/jcc.21439

Theobald DL. "A formal test of the theory of universal common ancestry." Nature 2010 May 13; 465(7295):219-22.

Kang K, Pulver SR, Panzano VC, Chang EC, Griffith LC, Theobald DL and Garrity PA. "Analysis of Drosophila TRPA1 reveals an ancient origin for human chemical nociception." Nature 2010 Mar 25; 464(7288):597-600.

Theobald DL. "Likelihood and empirical Bayes superpositions of multiple macromolecular structures." Bayesian methods in structural bioinformatics. Ed. Hamelryck T, Mardia KV, and Ferkinghoff-Borg J. New York: Springer Verlag, 2010.

Theobald DL and Miller C. "Membrane transport proteins: Surprises in structural sameness." Nature Structural & Molecular Biology 17. 1 (2010): 2-3.

Theobald DL. "A nonisotropic Bayesian approach to superpositioning multiple macromolecules." Proc. of the 28th Leeds Annual Statistical Research (LASR) Workshop, "Statistical Tools for Challenges in Bioinformatics". University of Leeds, UK: 2009.

Theobald DL and Wuttke DS. "Accurate structural correlations from maximum likelihood superpositions." PLoS Comput Biol 4. 2 (2008): e43.

Theobald DL, Darity WA. "Punctuated equilibrium." International Encyclopedia of the Social Sciences. Second Edition ed. 1 vols. 2007.

Theobald DL and Wuttke DS. "Empirical Bayes hierarchical models for regularizing maximum likelihood estimation in the matrix Gaussian Procrustes problem." Proc Natl Acad Sci U S A 103. 49 (2006): 18521-7.

Theobald DL and Wuttke DS. "THESEUS: maximum likelihood superpositioning and analysis of macromolecular structures." Bioinformatics 22. 17 (2006): 2171-2.

Theobald DL and Wuttke DS. "Divergent evolution within protein superfolds inferred from profile-based phylogenetics." J Mol Biol 354. 3 (2005): 722-37.

Theobald DL. "Rapid calculation of RMSDs using a quaternion-based characteristic polynomial." Acta Crystallogr A 61. Pt 4 (2005): 478-80.

Mitton-Fry RM, Anderson EM, Theobald DL, Glustrom LW, and Wuttke DS. "Structural basis for telomeric single-stranded DNA recognition by yeast Cdc13." J Mol Biol 338. 2 (2004): 241-55.

Theobald DL and Wuttke DS. "Prediction of multiple tandem OB-fold domains in telomere end-binding proteins Pot1 and Cdc13." Structure 12. 10 (2004): 1877-9.

Theobald DL and Schultz SC. "Nucleotide shuffling and ssDNA recognition in Oxytricha nova telomere end-binding protein complexes." Embo J 22. 16 (2003): 4314-24.

Theobald DL, Cervantes RB, Lundblad V, and Wuttke DS. "Homology among telomeric end-protection proteins." Structure 11. 9 (2003): 1049-50.

Theobald DL, Mitton-Fry RM, and Wuttke DS. "Nucleic acid recognition by OB-fold proteins." Annual Review of Biophysics and Biomolecular Structure 32. (2003): 115-33.

 


Last review: October 31, 2013.

 
415 South Street, Waltham, MA 02453 (781) 736-2000