Surviving in a sea of data: a survey of plant genome data resources and issues in building data management systems

Abstract

Exponential growth of data, largely from whole-genome analyses, has changed the way biologists think about and handle data. Optimal use of these data requires effective methods to analyze and manage these data sets. Computers, software and the World Wide Web are now integral components of biological discovery. Understanding how information is obtained, processed and annotated in public databases allows researchers to effectively organize, analyze and export their own data into these databases. In this review we focus largely on two areas related to management of genomic data. We cite examples of resources available in the public domain and describe some of the software for data management systems currently available for plant research. In addition, we discuss a few concepts of data management from the perspective of an individual or group that wishes to provide data to the public databases, to use the information in the public databases more efficiently, or to develop a database to manage large data sets internally or for public access. These concepts include data descriptions, exchange format, curation, attribution, and database implementation.

Author information

Corresponding author

Correspondence to Leonore Reiser.

Copyright information

© 2002 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Reiser, L., Mueller, L.A., Rhee, S.Y. (2002). Surviving in a sea of data: a survey of plant genome data resources and issues in building data management systems. In: Town, C. (ed.) Functional Genomics. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0448-0_5

  • DOI: https://doi.org/10.1007/978-94-010-0448-0_5

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-010-3903-1

  • Online ISBN: 978-94-010-0448-0

  • eBook Packages: Springer Book Archive
