Abstract
Exponential growth of data, largely from whole-genome analyses, has changed the way biologists think about and handle data. Making the most of these data requires effective methods for analyzing and managing large data sets. Computers, software and the World Wide Web are now integral components of biological discovery. Understanding how information is obtained, processed and annotated in public databases allows researchers to organize and analyze their own data effectively and to export them into these databases. In this review we focus on two areas related to the management of genomic data. First, we cite examples of resources available in the public domain and describe some of the data management software currently available for plant research. Second, we discuss a few concepts of data management from the perspective of an individual or group that wishes to provide data to the public databases, to use the information in the public databases more efficiently, or to develop a database to manage large data sets internally or for public access. These concepts include data descriptions, exchange formats, curation, attribution, and database implementation.
Copyright information
© 2002 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Reiser, L., Mueller, L.A., Rhee, S.Y. (2002). Surviving in a sea of data: a survey of plant genome data resources and issues in building data management systems. In: Town, C. (eds) Functional Genomics. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0448-0_5
DOI: https://doi.org/10.1007/978-94-010-0448-0_5
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-3903-1
Online ISBN: 978-94-010-0448-0
eBook Packages: Springer Book Archive