The post-genome era is characterized by a major expansion in the available biological data. Sources of information include genome sequences; protein structures; functional data at both the molecular and the pathway level; and transcriptome, proteome and metabolome expression levels, together with details of which biomolecules interact. Understanding and exploiting these data is now central to progress in biology, and this requirement has stimulated the development and expansion of bioinformatics. In particular, bioinformatics is essential to the integration of these sources of information, thereby empowering scientists to formulate in silico hypotheses for experimental study. These proceedings explore current challenges in bioinformatics, ranging from molecules to systems.
Bioinformatics played a critical role in the landmark event of the last few years: the sequencing of the human genome. A major goal of this project was to provide the framework upon which further studies can explore the relationships between genetic variation and human disease. The first paper in this volume, by Mott (2006), describes recent methodological advances in quantitative trait locus mapping to study and compare genetic variation in mouse and man.
Experimental and computational studies on model organisms, such as yeast, continue to provide major biological understanding. Wolfe (2006) describes how the complete genome sequences from 30 fungal groups provide a broad perspective on genome evolution. From these genomes, one can identify whole genome duplications, gene loss, gene displacement and gene relocation.
Structural studies are revealing the molecular basis of protein function in numerous biological systems. Blundell et al. (2006) describe the combined use of experimental structure determination and bioinformatics to accelerate drug discovery. A particularly powerful new approach is the combination of modelling and crystallographic analysis of fragments binding to protein receptors as the first step in drug discovery. Structure-based approaches today can have a major impact in three crucial stages of drug design: target identification, lead discovery and lead optimization.
Orengo and co-workers (Marsden et al. 2006) describe the use of automated methodologies to assign sequences in completed genomes to structural families of protein domains. Comparative analysis across bacterial genomes reveals different rates of expansion for different families, which can be explained by their functional role. The paper from the Sternberg group (Fleming et al. 2006) continues the theme of interpreting the assignment of protein structural families to genome sequences to provide evolutionary insights. In addition, the paper reports a novel approach to assigning function to newly determined protein structures, addressing a major challenge of the recent structural genomics projects.
For more than 30 years, theoreticians have been aiming to develop algorithms to predict the three-dimensional structure of a protein from its sequence. The paper by Moult (2006) reports the current status of this field as evaluated in the blind trials of protein structure prediction (termed CASP) held every 2 years. Over the six such meetings, there has been considerable progress in prediction across a range of problems, from accurate comparative modelling using the structure of a homologue through to the de novo folding of small proteins. Baker (2006) reports the results of his group in predicting protein structure from sequence and emphasizes the importance of high-resolution refinement to achieve accurate models. These methods have also been applied with considerable success both to predict the structure of protein complexes starting from the unbound components and to design a novel protein with a chosen three-dimensional structure.
The recent progress in predicting the structure of transmembrane proteins from sequences is reported in the paper from Jones and co-workers (Hurwitz et al. 2006). Such algorithms are particularly important since, despite the biological importance of transmembrane proteins, it remains very difficult to determine their structure experimentally. The methods reported range from the assignment of membrane topology through to low resolution folding simulations in a knowledge-based force field.
The remaining papers in these proceedings focus on the challenges of systems biology. Over the last few years there have been major developments in high throughput ‘omics methodologies, which are providing new sources of information that characterize the expression, interaction and regulation of cellular components under a range of conditions. These new sources of data pose challenges for bioinformatics at both the data processing and the interpretation stages.
The paper by Oliver (2006) describes various approaches to model the flux within the transcriptome, the proteome and the metabolome of Saccharomyces cerevisiae. Two types of experiments are employed: varying the flux and altering the level of gene products. Integrating these results requires bioinformatics tools for representing, storing and analysing the data so one can begin to construct mathematical models of unicellular organisms. This theme of network modelling is also the focus of the paper by Schlitt & Brazma (2006). They report approaches to modelling gene regulation networks, which can be categorized by increasing detail as parts lists, topological models of the network, logic (Boolean) models of network control and finally dynamic models using differential or difference equations. The authors then introduce a new, simple method for dynamic modelling, called the finite state linear model, that combines the simplicity of Boolean networks with the advantages of continuous representations. The complexity of modelling transcriptional networks is highlighted by the paper from Bolouri and co-workers (Ramsey et al. 2006), who consider noise in gene expression in mammalian macrophages. They have developed a detailed stochastic model of gene expression that considers effects due to cell size and genome complexity. They find that the nature of the predicted transcriptional noise in macrophages differs from that in yeast and bacteria.
Two papers highlight the close link between the bioinformatics study of individual molecular components and an understanding of their integration into the system. The paper from Teichmann and co-workers (Pereira-Leal et al. 2006) considers the evolution of functional modules in biological systems. They focus on analysing protein complexes and investigate the emergence of these complexes by duplication. The authors suggest that certain protein complexes were established very early in evolution. They observe that proteins shared across different complexes occur frequently and tend to be encoded by essential genes. Bork and his group (Foerstner et al. 2006) describe a high throughput analysis of the results of environmental sequencing projects (also known as metagenomics). The paper highlights the need to establish baselines for interpreting comparative sequence analysis. The authors find simple discriminative properties for the DNA sequences from four distinct habitats.
We close these proceedings with the paper from Eisenberg et al. (2006), which describes bioinformatics challenges for the next decades. This paper highlights the breadth of topics now being studied within bioinformatics. There remain major challenges in simulating the structure and function of individual molecules and in modelling the complexity of networks. It is clear that bioinformatics will increasingly influence advances in human health in areas such as drug design and patient profiling. At a more fundamental level, bioinformatics addresses the question of what makes us human.
Our aim in organizing this Discussion Meeting was to highlight the progress in bioinformatics over the last 30 years. Advances are often incremental, and sometimes only by charting progress over a window of a decade can one appreciate the extent of developments. This is particularly well demonstrated in the CASP blind trials of protein structure prediction: the combination of increasing data resources and improved algorithms has led to substantial progress in the field since the evaluations started in 1994. The development of high throughput methodologies has stimulated the modelling of networks. Here, new methodologies are being developed alongside the application of genome-wide analysis of the component molecules. As bioinformatics now actively embraces modelling from molecules to systems, we anticipate that over the next decade it will have an ever-increasing impact across a broad range of biological research.
We are most grateful to the Royal Society for providing us with the opportunity to organize this Discussion Meeting and for its generous financial support. We would also like to thank all the staff at the Royal Society for their help in arranging this meeting.
Baker, D. 2006 Prediction and design of macromolecular structures and interactions. Phil. Trans. R. Soc. B 361, 459-463.
Blundell, T.L., Sibanda, B.L., Montalvão, R.W., Brewerton, S., Chelliah, V., Worth, C.L., Harmer, N.J., Davies, O. & Burke, D. 2006 Structural biology and bioinformatics in drug design: opportunities and challenges for target identification and lead discovery. Phil. Trans. R. Soc. B 361, 413-423.
Eisenberg, D., Marcotte, E., McLachlan, A.D. & Pellegrini, M. 2006 Bioinformatic challenges for the next decade(s). Phil. Trans. R. Soc. B 361, 525-527.
Fleming, K., Kelley, L.A., Islam, S.A., MacCallum, R.M., Muller, A., Pazos, F. & Sternberg, M.J.E. 2006 The proteome: structure, function and evolution. Phil. Trans. R. Soc. B 361, 441-451.
Foerstner, K.U., von Mering, C. & Bork, P. 2006 Comparative analysis of environmental sequences: potential and challenges. Phil. Trans. R. Soc. B 361, 519-523.
Hurwitz, N., Pellegrini-Calace, M. & Jones, D.T. 2006 Towards genome-scale structure prediction for transmembrane proteins. Phil. Trans. R. Soc. B 361, 465-475.
Marsden, R.L. et al. 2006 Exploiting protein structure data to explore the evolution of protein function and biological complexity. Phil. Trans. R. Soc. B 361, 425-440.
Mott, R. 2006 Finding the molecular basis of complex genetic variation in humans and mice. Phil. Trans. R. Soc. B 361, 393-401.
Moult, J. 2006 Rigorous performance evaluation in protein structure modeling and implications for computational biology. Phil. Trans. R. Soc. B 361, 453-458.
Oliver, S.G. 2006 From genomes to systems: the path with yeast. Phil. Trans. R. Soc. B 361, 477-482.
Pereira-Leal, J.B., Levy, E.D. & Teichmann, S.A. 2006 The origins and evolution of functional modules: lessons from protein complexes. Phil. Trans. R. Soc. B 361, 507-517.
Ramsey, S., Ozinsky, A., Clark, A., Smith, K.D., de Atauri, P., Thorsson, V., Orrell, D. & Bolouri, H. 2006 Transcriptional noise and cellular heterogeneity in mammalian macrophages. Phil. Trans. R. Soc. B 361, 495-506.
Schlitt, T. & Brazma, A. 2006 Modelling in molecular biology: describing transcription regulatory networks at different scales. Phil. Trans. R. Soc. B 361, 483-494.
Wolfe, K.H. 2006 Comparative genomics and genome evolution in yeasts. Phil. Trans. R. Soc. B 361, 403-412.
Source: David T. Jones, Michael J. E. Sternberg & Janet M. Thornton. Philosophical Transactions of The Royal Society B Volume 361, Number 1467, Pages: 389–391, 29 March 2006