Capri/MR: exploring protein databases from a structural and physicochemical point of view

Published: 01 August 2008

Abstract

With the advent of high-throughput systems for experimentally determining the three-dimensional (3-D) structure of proteins, molecular biologists urgently need systems that can automatically store, maintain and explore the very large structural databases being created. We have designed and implemented the Capri/MR system, which identifies families of protein structures within such large 3-D protein structure databases. The system automatically indexes and searches a protein database by three-dimensional shape, structural and/or physicochemical properties. For each of these protein structure representations, we create a compact rotation- and translation-invariant index (or signature), which is stored in a database for future querying. A similarity search algorithm then performs an exhaustive search against the entire database, exploiting the compactness of the signatures to rapidly find protein structures that are similar in 3-D shape and/or two-dimensional (2-D) properties. As a result, queries in Capri/MR run in a fraction of a second, and the system groups protein structures into their correct families with very high precision and recall. In addition, the system processes new protein structures dynamically as they become available. We demonstrate Capri/MR on the Protein Data Bank, which contains all known, experimentally determined 3-D protein structures (48,000 as of January 2008). The main applications of Capri/MR lie in structural proteomics, protein evolution and mutation, and drug design, in particular the docking problem and the computer-aided design of non-toxic drugs.
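The abstract does not say how the signatures are computed or compared, so the following is only an illustrative sketch of the general idea (an invariant signature per structure, a dynamically growing index, and an exhaustive similarity search). It assumes a hypothetical stand-in signature, a normalized histogram of atom distances from the structure's centroid, which does not change under rotation or translation, and compares signatures with an L1 distance. The names `radial_signature`, `SignatureIndex`, the bin count of 16 and the toy coordinates are all assumptions for illustration, not Capri/MR's actual descriptors or API.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def radial_signature(coords: Sequence[Point], bins: int = 16) -> List[float]:
    """Hypothetical shape signature: normalized histogram of atom distances
    from the centroid. Rotations and translations leave these distances
    unchanged; dividing by the largest distance also removes scale."""
    if not coords:
        raise ValueError("structure has no atoms")
    n = len(coords)
    centroid = tuple(sum(p[i] for p in coords) / n for i in range(3))
    dists = [math.dist(p, centroid) for p in coords]
    r_max = max(dists) or 1.0  # avoid division by zero for a single atom
    hist = [0.0] * bins
    for d in dists:
        # Clamp so the farthest atom (d == r_max) lands in the last bin.
        hist[min(int(d / r_max * bins), bins - 1)] += 1.0 / n
    return hist


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute bin differences between two signatures."""
    return sum(abs(x - y) for x, y in zip(a, b))


class SignatureIndex:
    """Hypothetical in-memory index: one compact signature per structure,
    searched exhaustively (every stored signature is compared)."""

    def __init__(self, bins: int = 16) -> None:
        self.bins = bins
        self.signatures: Dict[str, List[float]] = {}

    def add(self, structure_id: str, coords: Sequence[Point]) -> None:
        # New structures are indexed as they arrive; nothing is rebuilt.
        self.signatures[structure_id] = radial_signature(coords, self.bins)

    def query(self, coords: Sequence[Point], k: int = 5) -> List[Tuple[str, float]]:
        """Return the k stored structures closest to the query, nearest first."""
        q = radial_signature(coords, self.bins)
        scored = [(sid, l1_distance(q, sig)) for sid, sig in self.signatures.items()]
        scored.sort(key=lambda item: item[1])
        return scored[:k]


if __name__ == "__main__":
    # Toy "structures" (hypothetical coordinates, not real proteins).
    shape = [(0, 0, 0), (4, 0, 0), (0, 2, 0), (0, 0, 6), (1, 3, 4)]
    line = [(float(x), 0.0, 0.0) for x in range(5)]
    # The same shape rotated 90 degrees about z and then translated.
    moved = [(-y + 10, x - 5, z + 3) for x, y, z in shape]

    index = SignatureIndex()
    index.add("shape", shape)
    index.add("line", line)
    print(index.query(moved, k=2))
```

In this toy run the rotated and translated copy should be matched to `"shape"` at essentially zero distance, ahead of `"line"`. A centroid-distance histogram discards a great deal of information (a structure and its mirror image share the same signature, for instance), so it is used here only because its invariance is easy to check; physicochemical properties could be summarized into histograms in a similar way, but how Capri/MR actually does this is not given in the abstract.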
