Anuradha Pujar, Pankaj Jaiswal, Elizabeth A. Kellogg, Katica Ilic, Leszek Vincent, Shulamit Avraham, Peter Stevens, Felipe Zapata, Leonore Reiser, Seung Y. Rhee, Martin M. Sachs, Mary Schaeffer, Lincoln Stein, Doreen Ware and Susan McCouch
Department of Plant Breeding, Cornell University, Ithaca, New York 14853 (A.P., P.J., S.M.); Department of Biology, University of Missouri, St. Louis, Missouri 63121 (E.A.K., P.S., F.Z.); Department of Plant Biology, Carnegie Institution, Stanford, California 94305 (K.I., L.R., S.Y.R.); Division of Plant Sciences, University of Missouri, Columbia, Missouri 65211 (L.V., M.S.); Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724 (S.A., L.S., D.W.); Missouri Botanical Garden, St. Louis, Missouri 63110 (P.S., F.Z.); Maize Genetics Cooperation-Stock Center, Department of Crop Sciences, University of Illinois, Urbana, Illinois 61801 (M.M.S.); and Agricultural Research Service, United States Department of Agriculture, Washington, DC 20250 (M.M.S., M.S., D.W.)
Plant growth stages are identified as distinct morphological landmarks in a continuous developmental process. The terms describing these developmental stages record the morphological appearance of the plant at a specific point in its life cycle. The widely differing morphology of plant species consequently gave rise to heterogeneous vocabularies describing growth and development, as each species- or family-specific community developed its own terminology for describing whole-plant growth stages. This semantic heterogeneity made it impossible to use the growth stage descriptions contained in plant biology databases for meaningful computational comparisons. The Plant Ontology Consortium (http://www.plantontology.org) was founded to develop standard ontologies describing plant anatomy as well as growth and developmental stages that can be used to annotate gene expression patterns and phenotypes of all flowering plants. In this article, we describe the development of a generic whole-plant growth stage ontology that describes the spatiotemporal stages of plant growth as a set of landmark events that progress from germination to senescence. This ontology represents a synthesis and integration of terms and concepts from a variety of species-specific vocabularies previously used for describing phenotypes and genomic information. It provides a common platform for annotating gene function and gene expression in relation to the developmental trajectory of a plant described at the organismal level. As a proof of concept, the Plant Ontology Consortium used the growth stage ontology to annotate genes and phenotypes in plants, with initial emphasis on those represented in The Arabidopsis Information Resource, the Gramene database, and MaizeGDB.
Plant Physiology 142:414-428 (2006).
Plant systems are complex, both structurally and operationally, and information about plant development requires extensive synthesis to provide a coherent view of growth and development. The difficulty of developing such a synthesis is exacerbated by new high-throughput technologies such as genotyping, microarrays, proteomics, and transcriptomics, which rapidly generate large amounts of data. The speed and magnitude of data deposition challenge our ability to represent and interpret these data within the context of any particular biological system (Gopalacharyulu et al., 2005). Extracting knowledge from historical sources and integrating it with new information derived from global datasets requires a sophisticated approach to data mining and integration.
Historically, the growth and development of cultivated plants have been monitored at the whole-plant level with the help of scales of easily recognizable growth stages. Consequently, there is a large body of literature detailing growth stages for individual plant species or closely related groups of species. For example, the Zadoks scale (Zadoks et al., 1974) was developed for the Triticeae crops and is widely used to stage the growth and development of cereal crops in the United States. The flexibility of this scale allowed it to be extended to other cultivated plants, and a uniform code called the Biologische Bundesanstalt, Bundessortenamt, and Chemical Industry (BBCH) code was developed from it (Meier, 1997). The BBCH scale is quite generic and encompasses multiple crops, including both monocot and eudicot species. It offers standardized descriptions of plant development in the order of phenological appearance and assigns a numeric code to each stage for easy computer retrieval. Arabidopsis (Arabidopsis thaliana), a member of the Brassicaceae, is not a cultivated species and therefore had no specific growth stage vocabulary or scale until Boyes et al. (2001) developed an experimental platform describing Arabidopsis growth stages using the BBCH scale. This work created a crucial semantic link between Arabidopsis and cultivated plants. In addition to facilitating the description and synthesis of large amounts of data within a crop species, vocabularies such as the BBCH and Zadoks scales also make it possible to transfer information among researchers and provide a common language for comparative purposes (Counce et al., 2000).
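The numeric coding that makes the BBCH scale easy to retrieve by computer can be illustrated with a minimal Python sketch. It assumes the two-digit decimal form of the code, in which the first digit names the principal growth stage and the second a secondary step within it; the stage labels are abbreviated from the general BBCH scale, and the helpers `parse_bbch` and `describe`, along with the observation list, are hypothetical. Longer code forms used for some crops are not handled.

```python
from typing import Dict, List, Tuple

# Principal growth stages of the general BBCH scale (first digit of the code),
# abbreviated here; crop-specific BBCH scales word these stages differently.
BBCH_PRINCIPAL_STAGES: Dict[int, str] = {
    0: "germination",
    1: "leaf development",
    2: "formation of side shoots/tillering",
    3: "stem elongation or rosette growth",
    4: "development of harvestable vegetative parts",
    5: "inflorescence emergence",
    6: "flowering",
    7: "development of fruit",
    8: "ripening of fruit and seed",
    9: "senescence",
}


def parse_bbch(code: str) -> Tuple[int, int]:
    """Split a two-digit BBCH code such as '65' into (principal, secondary)."""
    if len(code) != 2 or not code.isdigit():
        raise ValueError(f"expected a two-digit BBCH code, got {code!r}")
    return int(code[0]), int(code[1])


def describe(code: str) -> str:
    """Human-readable label for a code, e.g. 'BBCH 65: flowering, step 5'."""
    principal, secondary = parse_bbch(code)
    return f"BBCH {code}: {BBCH_PRINCIPAL_STAGES[principal]}, step {secondary}"


# Hypothetical field observations recorded out of order.
observations: List[str] = ["65", "09", "21", "13"]
for code in sorted(observations, key=parse_bbch):
    print(describe(code))
```

Because the digits follow phenological order, sorting the (principal, secondary) pairs also sorts observations by developmental progression, which is part of what makes such scales convenient for databases.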
In the post-genomic era, these scales have proved inadequate for handling the deluge of information that requires large-scale computation for comparative analysis. This called for converting existing scales into ontologies, which have an advantage over simple scales because their hierarchical organization facilitates computation across them. Terms in an ontology are organized in the form of a tree, in which the nodes represent entities at greater or lesser levels of detail (Smith, 2004). The branches connecting the nodes represent the relations between entities; for example, the term radicle emergence stage is a child of the parent term germination stage (Fig. 1). Individual stages of a scale are then parts that can be related to the whole by their order of appearance during plant growth. Each term carries a unique identifier, and strictly specified relationships between terms allow systematic ordering of data within a database, which in turn improves the input and retrieval of information (Bard and Rhee, 2004; Harris et al., 2004).
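A practical consequence of the parent-child organization is that a query on a general term can also retrieve data annotated to its more specific children. The minimal Python sketch below assumes a toy ontology stored as a mapping from term identifiers to a name and a list of parent identifiers; the identifiers (EX:0001 and so on), the small hierarchy, and the gene annotations are hypothetical and are not taken from the Plant Ontology.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Toy ontology: term ID -> (term name, parent term IDs).
# IDs and hierarchy are hypothetical, not real Plant Ontology identifiers.
TERMS: Dict[str, Tuple[str, List[str]]] = {
    "EX:0001": ("whole plant growth stage", []),
    "EX:0002": ("germination stage", ["EX:0001"]),
    "EX:0003": ("radicle emergence stage", ["EX:0002"]),
    "EX:0004": ("imbibition stage", ["EX:0002"]),
    "EX:0005": ("flowering stage", ["EX:0001"]),
}

# Hypothetical annotations: gene -> term ID for the stage at which it was observed.
ANNOTATIONS: Dict[str, str] = {
    "geneA": "EX:0003",
    "geneB": "EX:0004",
    "geneC": "EX:0005",
}


def children_index(terms: Dict[str, Tuple[str, List[str]]]) -> Dict[str, List[str]]:
    """Invert the parent links so each term lists its direct children."""
    index: Dict[str, List[str]] = {term_id: [] for term_id in terms}
    for term_id, (_name, parents) in terms.items():
        for parent in parents:
            index[parent].append(term_id)
    return index


def descendants(term_id: str, index: Dict[str, List[str]]) -> Set[str]:
    """Return term_id plus every term below it (breadth-first; the seen set
    also copes with terms that have more than one parent)."""
    seen = {term_id}
    queue = deque([term_id])
    while queue:
        current = queue.popleft()
        for child in index[current]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def genes_at(term_id: str, terms, annotations) -> List[str]:
    """Genes annotated to term_id or to any of its more specific descendants."""
    wanted = descendants(term_id, children_index(terms))
    return sorted(gene for gene, term in annotations.items() if term in wanted)


print(genes_at("EX:0002", TERMS, ANNOTATIONS))  # ['geneA', 'geneB']
```

Here a query for germination stage returns genes annotated to radicle emergence and imbibition without either being requested explicitly. A real ontology would more likely precompute these closures than traverse the hierarchy on every query.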
Consequently, several species-specific databases converted BBCH and other scales into formal ontologies (controlled vocabularies) to facilitate the annotation of genetic information. For example, the Gramene database (Jaiswal et al., 2006) designed its cereal growth stage ontology based on the stages described in the standard evaluation system for rice (Oryza sativa; INGER, 1996) and those described by Counce et al. (2000) for rice, by Zadoks et al. (1974) for small-grain cereals (wheat [Triticum aestivum], oat [Avena sativa], and barley [Hordeum vulgare]), and by Doggett (1988) for sorghum. Except for sorghum, which is less studied, these species had fairly well-described growth staging vocabularies. MaizeGDB (Lawrence et al., 2005) developed a very extensive controlled vocabulary from a modified version of the scale described by Ritchie et al. (1993). The Arabidopsis Information Resource (TAIR; Rhee et al., 2003) developed the Arabidopsis growth stage ontology from the scale described by Boyes et al. (2001). However, the ontologies created in these projects remained restricted to particular species or families, whereas comparative genomics requires that a common standard vocabulary be applied to a broad range of species. The uniform BBCH scale (Meier, 1997) appeared to be a suitable model for developing a unified ontology because it had already synthesized monocot and eudicot crop stages into a single vocabulary.
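Why species-restricted vocabularies hamper comparison, and how a common vocabulary helps, can be sketched in Python. The sketch assumes a simple crosswalk table that maps each community's own stage label to a shared stage term; the table, stage labels, species keys, and gene names are hypothetical illustrations, not the mappings the POC actually made.

```python
from typing import Dict, List, Tuple

# Hypothetical crosswalk: (species, species-specific stage label) -> shared stage term.
# The labels and mappings are illustrative, not the POC's actual term mappings.
CROSSWALK: Dict[Tuple[str, str], str] = {
    ("rice", "heading"): "inflorescence emergence",
    ("wheat", "heading"): "inflorescence emergence",
    ("arabidopsis", "first flower buds visible"): "inflorescence emergence",
    ("rice", "anthesis"): "flowering",
    ("arabidopsis", "first flower open"): "flowering",
}

# Hypothetical observations: (species, gene, stage label used by that species' community).
OBSERVATIONS: List[Tuple[str, str, str]] = [
    ("rice", "riceGene1", "heading"),
    ("wheat", "wheatGene1", "heading"),
    ("arabidopsis", "atGene1", "first flower buds visible"),
    ("arabidopsis", "atGene2", "first flower open"),
]


def genes_by_shared_stage(observations, crosswalk) -> Dict[str, List[str]]:
    """Group genes from different species under the shared stage term they map to."""
    grouped: Dict[str, List[str]] = {}
    for species, gene, label in observations:
        shared = crosswalk.get((species, label))
        if shared is None:
            # A label with no mapping stays species-specific and cannot be compared.
            continue
        grouped.setdefault(shared, []).append(f"{species}:{gene}")
    return grouped


for stage, genes in genes_by_shared_stage(OBSERVATIONS, CROSSWALK).items():
    print(stage, genes)
```

Once labels map to shared terms, genes observed at comparable stages in rice, wheat, and Arabidopsis fall into the same group; labels without a mapping remain isolated, which mirrors the limitation of purely species-specific ontologies.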
The Plant Ontology Consortium (POC) was inaugurated in 2003 to develop common ontologies describing the anatomy, morphology, and growth stages of flowering plants (Jaiswal et al., 2005). Its primary task was to integrate and normalize existing species-specific ontologies or vocabularies that several major databases had developed for annotating gene expression and mutant phenotypes. The plant ontology (PO) is divided into two aspects. The first is the plant structure ontology (PSO), a vocabulary of anatomical terms (K. Ilic, E.A. Kellogg, P. Jaiswal, F. Zapata, P. Stevens, L. Vincent, S. Avraham, L. Reiser, A. Pujar, M.M. Sachs, S. McCouch, M. Schaeffer, D. Ware, L. Stein, and S.Y. Rhee, unpublished data), which, since its release to the public domain in 2004, has been widely adopted by plant genome databases (Jaiswal et al., 2005). The second aspect is the plant growth and developmental stages ontology. This component of PO is further divided into the whole-plant growth stage ontology (GSO), developed by this project, and the plant part developmental stages. This article focuses on the whole-plant GSO: we discuss its history, design, and applications and show how it simplifies the description of a continuous and complex series of events in plant development. The plant part developmental stages will be reviewed elsewhere.