Mathematical Modeling: Epidemiology Meets Systems Biology
Cornelia M. Ulrich, H. Frederik Nijhout and Michael C. Reed
Cancer Prevention Program, Fred Hutchinson Cancer Research Center; Department of Epidemiology, University of Washington, Seattle, Washington; and Departments of Biology and Mathematics, Duke University, Durham, North Carolina
"For every complex problem there is a simple, easy to understand, incorrect answer." –Albert Szent-Györgyi
Cancer Epidemiology Biomarkers & Prevention Vol. 15, 827-829, May 2006. © 2006 American Association for Cancer Research
This issue of Cancer Epidemiology, Biomarkers & Prevention includes a study on mathematical modeling of biological processes. Sokhansanj and Wilson describe a mathematical model that mimics the kinetics of base excision repair and thus permits them to investigate in silico the effects of genetic variation in this important DNA repair pathway (1).
Why should epidemiologists be interested in this type of mathematical modeling? Although perhaps not obvious at first glance, there are several benefits of this approach to cancer epidemiology. First, as we all know, the quality of cancer epidemiology is much improved by a thorough understanding of the underlying biological processes. Molecular epidemiology has increasingly moved away from considering isolated genes towards a pathway-based approach (2). Most grant applications in molecular cancer epidemiology these days delineate specific biological pathways that are targeted for investigation and provide schemata that describe the main reactions and connections. However, these pathway depictions are usually simplified: they often ignore (for parsimony) intricate intermediate steps or interactions with other pathways, and often, we have incomplete knowledge about the specific interplay of the many elements in the system.
Biological systems are immensely complex. In our simplified descriptions of pathways, we tend to disregard the fact that changes in inputs typically affect many different cellular biochemical components (genes, substrates, and enzymes); thus, the net effect on the entire system is difficult to anticipate. There are many mutual regulatory interactions among genes, proteins, and the myriad organic molecules in a cell. Even more daunting is that many genes that code for enzymes are up-regulated or down-regulated by the cellular inputs or by changes in the levels of various biochemical substrates. The cell is a dynamic system that is always changing in response to the fluctuating cellular environment. In addition, of course, it is not just the metabolism and molecular biology of healthy cells and diseased cells that are of interest. We also need to understand how cellular changes lead to large-scale disease processes and to the biomarkers and disease outcomes that epidemiologists measure.
One way to study such complicated systems is by mathematical modeling. Typically, one develops differential equations for the gene activities or substrate concentrations in small or medium-sized networks that contain the molecular phenomena that one wants to understand. The first difficult issue is to decide how to limit the network to be modeled. Each local network is connected to the whole of cell physiology by shared substrates, genes, and enzymes. By studying a small piece of the network and treating the rest as "constant," one may introduce simplifications that give nonphysiologic results. Second, one must decide on the level of detail to be used for each individual interaction. Is it important to include the sequence of steps by which an enzyme catalyzes the reaction or will a simple Michaelis-Menten formula do? Is it necessary to model the time delays that occur in both transcription and translation? Can one treat the cell as a well-mixed biochemical reactor, or should one take into account compartmentalization and the localization of many processes? There are no simple or universal answers to such questions. They must be answered on a case-by-case basis depending on the biological or biochemical questions being investigated.
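Whether a Michaelis-Menten formula suffices for a given step can be checked numerically. The following is a minimal sketch, not taken from Sokhansanj and Wilson: it integrates one enzymatic reaction twice, once with the full mass-action mechanism (binding, unbinding, catalysis) and once with the reduced Michaelis-Menten rate law, and compares the predicted product. All rate constants, concentrations, and function names are illustrative assumptions.

```python
"""Compare a full enzyme mechanism with its Michaelis-Menten reduction.

All parameter values are illustrative assumptions, not measured constants.
"""


def rk4(f, state, t_end, dt):
    """Integrate d(state)/dt = f(state) with the classical Runge-Kutta scheme."""
    for _ in range(int(round(t_end / dt))):
        k1 = f(state)
        k2 = f([s + 0.5 * dt * k for s, k in zip(state, k1)])
        k3 = f([s + 0.5 * dt * k for s, k in zip(state, k2)])
        k4 = f([s + dt * k for s, k in zip(state, k3)])
        state = [s + dt * (a + 2 * b + 2 * c + d) / 6
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state


def product_full(e0, s0, kf, kr, kcat, t_end, dt=0.005):
    """Product formed under E + S <-> C -> E + P with mass-action kinetics."""
    def f(state):
        s, c, _ = state
        bind = kf * (e0 - c) * s  # free enzyme is e0 - c
        return [-bind + kr * c, bind - (kr + kcat) * c, kcat * c]
    return rk4(f, [s0, 0.0, 0.0], t_end, dt)[2]


def product_mm(e0, s0, kf, kr, kcat, t_end, dt=0.005):
    """Product formed under the reduced rate law v = vmax * s / (km + s)."""
    vmax, km = kcat * e0, (kr + kcat) / kf
    def f(state):
        s, _ = state
        v = vmax * s / (km + s)
        return [-v, v]
    return rk4(f, [s0, 0.0], t_end, dt)[1]


def gap(e0, t_end, s0=1.0, kf=10.0, kr=1.0, kcat=1.0):
    """Absolute difference in predicted product between the two descriptions."""
    args = (e0, s0, kf, kr, kcat, t_end)
    return abs(product_full(*args) - product_mm(*args))


scarce_enzyme_gap = gap(e0=0.01, t_end=50.0)   # enzyme << substrate
abundant_enzyme_gap = gap(e0=1.0, t_end=1.0)   # enzyme comparable to substrate
print(f"gap with scarce enzyme:   {scarce_enzyme_gap:.4f}")
print(f"gap with abundant enzyme: {abundant_enzyme_gap:.4f}")
```

In this toy setting the reduced formula should track the full mechanism closely when enzyme is scarce relative to substrate and noticeably less well when the two are comparable, which is the kind of case-by-case check the questions above call for.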
If one succeeds in constructing a mathematical model that reasonably represents the biochemical reality, the payoff is large. One can experiment with the model by increasing or decreasing inputs (corresponding, say, to changes in diet), by raising or lowering activities of enzymes (corresponding to genetic polymorphisms), or by eliminating reactions entirely (corresponding to gene-knockout experiments). One can take apart and put back together the biochemical network piece by piece to determine how it works. In contrast to biological experiments, these in silico experiments are quick and inexpensive and, if done well, can give real insight into the genetic and molecular network.
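The three kinds of in silico experiment described above can be sketched on a deliberately small example. The pathway here (a constant input feeding substrate X, which enzyme 1 converts to Y, which enzyme 2 removes) and all parameter values are hypothetical, chosen only to show the mechanics of perturbing a model; they do not represent any real pathway.

```python
"""Toy in silico experiments on a two-step pathway: input -> X -> Y -> removed.

Parameter values are illustrative assumptions.
"""

BASELINE = {"k_in": 1.0, "v1": 4.0, "k1": 1.0, "v2": 4.0, "k2": 2.0}


def simulate(params, t_end=100.0, dt=0.01):
    """Return (x, y) at t_end using forward Euler steps from an empty pathway."""
    x = y = 0.0
    for _ in range(int(round(t_end / dt))):
        flux1 = params["v1"] * x / (params["k1"] + x)  # Michaelis-Menten step 1
        flux2 = params["v2"] * y / (params["k2"] + y)  # Michaelis-Menten step 2
        x += dt * (params["k_in"] - flux1)
        y += dt * (flux1 - flux2)
    return x, y


def perturbed(**changes):
    """Copy of the baseline parameter set with selected values replaced."""
    return {**BASELINE, **changes}


experiments = {
    "baseline": BASELINE,
    "doubled input (diet)": perturbed(k_in=2.0),
    "half-active enzyme 2 (polymorphism)": perturbed(v2=2.0),
    "enzyme 2 knockout": perturbed(v2=0.0),
}

results = {name: simulate(p) for name, p in experiments.items()}
for name, (x, y) in results.items():
    print(f"{name:38s} X = {x:8.3f}   Y = {y:8.3f}")
```

In this sketch the first three scenarios settle to a steady state, whereas the knockout lets Y accumulate for as long as the simulation runs, so the reported value depends on `t_end`.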
How can epidemiologists benefit from these in silico experiments? The article by Sokhansanj and Wilson (1) provides an excellent illustration of the use of a mathematical model to predict the functional effect of genetic polymorphisms on biologically relevant outputs. The authors explored base excision repair and used as main end points the number of unrepaired lesions in the cell and clearance time of lesions (focusing on 8-hydroxyguanine, a prominent component of oxidative DNA damage). Results from this model illustrate that this pathway is remarkably robust, as one would expect after millions of years of evolution, given the need to preserve the integrity of DNA in the face of the most common form of DNA damage (3, 4).
Of course, model predictions, although qualitatively correct, will not necessarily be quantitatively exact. Nevertheless, they do provide an initial understanding of where this system is robust to changes in enzyme function and where it is sensitive (see Table 3 in Sokhansanj and Wilson). Results from this type of sensitivity analysis allow epidemiologists to target their investigations of genetic variability towards components of the system in which an alteration in function would have consequences (i.e., result in a notable disturbance of the system). Based on their Table 3, one would suggest that Lig1 and perhaps Ogg1 could be such critical elements of the base-excision repair network and thus should undergo more scrutiny in molecular epidemiologic investigations (1).
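A sensitivity analysis of this kind can be illustrated with a small sketch. The example below does not reproduce the model or the Table 3 values of Sokhansanj and Wilson; it assumes a hypothetical linear chain of three Michaelis-Menten enzymes with made-up parameters, takes the total pool of unprocessed intermediates as the output, and estimates how strongly that output responds to a small change in each enzyme's maximal activity.

```python
"""Finite-difference sensitivity analysis on a hypothetical enzyme chain."""
import math

K_IN = 1.0  # constant rate at which substrate enters the chain (assumed)

# name: (vmax, km); enzyme_B is deliberately placed close to saturation.
ENZYMES = {
    "enzyme_A": (10.0, 1.0),
    "enzyme_B": (1.25, 1.0),
    "enzyme_C": (5.0, 2.0),
}


def total_intermediates(enzymes, k_in=K_IN):
    """Sum of steady-state substrate pools in a linear Michaelis-Menten chain.

    At steady state every step carries flux k_in, so each pool solves
    vmax * s / (km + s) = k_in, which gives s = k_in * km / (vmax - k_in).
    """
    return sum(k_in * km / (vmax - k_in) for vmax, km in enzymes.values())


def log_sensitivity(name, rel_step=0.01):
    """Central-difference estimate of d ln(output) / d ln(vmax) for one enzyme."""
    vmax, km = ENZYMES[name]
    up = {**ENZYMES, name: (vmax * (1 + rel_step), km)}
    down = {**ENZYMES, name: (vmax * (1 - rel_step), km)}
    d_ln_output = math.log(total_intermediates(up) / total_intermediates(down))
    d_ln_vmax = math.log((1 + rel_step) / (1 - rel_step))
    return d_ln_output / d_ln_vmax


sensitivities = {name: log_sensitivity(name) for name in ENZYMES}
most_sensitive = max(sensitivities, key=lambda n: abs(sensitivities[n]))

for name, value in sensitivities.items():
    print(f"{name}: {value:+.3f}")
print("largest effect:", most_sensitive)
```

A value near zero marks a step where the system is robust to altered enzyme function; a large magnitude marks a step where a functional variant would be expected to matter. In this toy chain the enzyme operating closest to saturation dominates.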
A second key advantage of mathematical modeling is that it provides a tool for quick and inexpensive (perhaps also quick and dirty) examinations of gene-gene interactions. Again, a mathematical model per se will never provide a complete answer and will certainly not replace the need for experimental or epidemiologic studies. However, results from modeling can guide an epidemiologist’s choice of gene-gene interactions that are most promising for further investigation. Model predictions may provide information that can be incorporated in the statistical analysis in the form of a "probability that an interaction will exist," comparable with the false-positive report probability (5). Sokhansanj and Wilson’s findings illustrate that the investigation of interactions in a complex system can yield evidence for nonintuitive consequences of multiple simultaneous variants (1). Their findings suggest the next steps of targeted experimental investigations, including gene knockout studies.
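How a model can expose a gene-gene interaction is easiest to see in a minimal case. The sketch below assumes a hypothetical situation in which two enzymes clear the same lesion in parallel, so that each can partly compensate for the other; the parameter values and function names are invented for illustration. It compares the steady-state lesion level for each single variant and for the double variant against the wild type.

```python
"""Toy gene-gene interaction: two enzymes clearing the same lesion in parallel."""


def steady_state(v_a, v_b, km_a=1.0, km_b=1.0, k_in=1.0):
    """Lesion level at which formation (k_in) balances removal by both enzymes.

    Solves k_in = v_a*x/(km_a + x) + v_b*x/(km_b + x) for x by bisection.
    """
    if v_a + v_b <= k_in:
        raise ValueError("combined capacity must exceed the input rate")

    def net(x):
        return k_in - v_a * x / (km_a + x) - v_b * x / (km_b + x)

    lo, hi = 0.0, 1.0
    while net(hi) > 0:      # widen the bracket until removal outpaces input
        hi *= 2.0
    for _ in range(200):    # net() decreases with x, so bisection applies
        mid = 0.5 * (lo + hi)
        if net(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


WILD_TYPE, VARIANT = 2.0, 1.0  # variant allele assumed to halve enzyme activity

base = steady_state(WILD_TYPE, WILD_TYPE)
only_a = steady_state(VARIANT, WILD_TYPE)
only_b = steady_state(WILD_TYPE, VARIANT)
both = steady_state(VARIANT, VARIANT)

# Departure of the joint effect from the sum of the two single-variant effects.
interaction = (both - base) - ((only_a - base) + (only_b - base))

print(f"wild type {base:.3f}, variant A {only_a:.3f}, "
      f"variant B {only_b:.3f}, both {both:.3f}")
print(f"departure from additivity: {interaction:.3f}")
```

With these assumed values each single variant is largely buffered by the other enzyme, while the double variant raises the lesion level by more than the two single effects combined. A pair of this kind is what a model could flag as a candidate for further investigation.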
Other mathematical models of biological networks may supply novel information on gene-environment interactions: for example, we are in the process of developing such a model of folate-mediated one-carbon metabolism, including its links to methylation capacity and nucleotide synthesis (6–8). A number of nutritional factors play a role in this biological pathway, both as substrates and cofactors, including folate; vitamins B2, B6, and B12; methionine; and choline. This model will facilitate investigation of the independent and combined effects of dietary factors and genetic variability on the pathway’s components and on specific biomarkers (e.g., thymidine synthesis). Although the model’s results on gene-nutrient interactions will not substitute for in vivo studies, we expect that they will guide human feeding studies towards the combination of nutrients and genetic factors most relevant for disease processes.
A third benefit of mathematical modeling relates to our need for incorporating information on biological pathways in the statistical data analysis. D. Thomas has previously discussed some of the approaches that can be used in pursuing this goal (2). Results from mathematical modeling can yield information of direct use to epidemiologists by giving values for prior covariates (factors that help explain variability between variables of interest) that can be used in a hierarchical modeling structure. For example, the modeling of folate metabolism is motivated by the strong evidence that folate intakes and polymorphisms in folate-metabolizing enzymes are related not only to neural tube defects but also to several types of cancer (9–13), in the absence of well-defined biological mechanisms. Folate metabolism is integral for the provision of S-adenosylmethionine, a substrate for methylation reactions, including DNA methylation (14). At the same time, thymidine and purine syntheses require folate cofactors; thus, the integrity of DNA can be severely impaired under folate deficiency (15). Finally, homocysteine increases with a low folate status, and this compound may directly participate in disease processes as part of its role in redox signaling (16). There are now a number of hypotheses regarding the folate-related mechanisms that are relevant to carcinogenesis for specific polymorphisms or under specific conditions (17–21). An example of a prior covariate that could be derived from this mathematical modeling would be a quantitative value for the relative change in a specific biomarker that results from a polymorphic change in enzyme function (22, 23). Attaching such prior knowledge to specific polymorphisms in a standard case-control analysis will help identify which specific biological mechanisms (e.g., effects on purine synthesis) are the primary links of folate to the disease outcome investigated.
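One common way to write such a hierarchical structure is sketched below. It is a generic two-stage formulation, given here as an assumption for illustration rather than as the specific model of ref. 2, and the symbols are introduced only for this sketch. In the first stage, the disease status $D_i$ of subject $i$ is regressed on genotypes $G_{ij}$. In the second stage, the genotype effects $\beta_j$ are themselves modeled as a function of a prior covariate $Z_j$, for instance the model-predicted relative change in a biomarker caused by variant $j$.

```latex
\begin{align}
\operatorname{logit} \Pr(D_i = 1 \mid G_i) &= \alpha + \sum_{j} \beta_j G_{ij}, \\
\beta_j &\sim \mathcal{N}\!\left(\pi_0 + \pi_1 Z_j,\; \tau^2\right).
\end{align}
```

In this formulation $\pi_1$ measures how strongly the predicted biomarker change explains the variant effects, and $\tau^2$ captures the residual variation among them.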
In addition to the biological pathways of relevance to cancer biology discussed here, mathematical modeling has proven extremely useful in understanding complex molecular and biochemical mechanisms, such as the control of the cell cycle (24) and mitochondrial metabolism (25). Furthermore, the difficulties inherent in the analysis of complex, nonlinear gene-metabolic networks have stimulated the development of novel mathematical methodologies (26, 27).
Mathematical modeling of cancer-relevant biological pathways provides epidemiologists with a powerful new tool. However, the strengths and limitations of this instrument need to be well understood to gain the most benefit from its use and avoid misinterpretation. A model will never be perfect, in part because it is usually based on incomplete and simplified biological information. Nevertheless, models can be used as platforms to test hypotheses that may be experimentally difficult or expensive. They can investigate the role that specific components (such as nutritional inputs or genetic factors) play in the overall behavior of the system. Such mathematical modeling will allow us to link epidemiology to systems biology and will help drive our understanding of cancer etiology and cancer biology.