
2018 | OriginalPaper | Book chapter

11. Exploration of Protein Secondary Structures in Relational Databases with Multi-threaded PSS-SQL


Abstract

Protein secondary structure reveals important information about protein construction: the regular spatial shapes, including alpha-helices, beta-strands, and loops, that a protein's amino acid chain can adopt in some of its regions. The relevance of this information and the breadth of its practical applications call for its efficient storage and processing. In this chapter, we will see how protein secondary structures can be stored in a relational database and processed with PSS-SQL. PSS-SQL is an extension of the SQL language that allows users to formulate queries against a relational database in order to find proteins whose secondary structures are similar to a structural pattern specified by the user. We will also see how this process can be accelerated by a parallel implementation of the alignment that uses multiple threads running on multi-core CPUs.
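The idea of keeping secondary structures in ordinary table rows and ranking them against a user pattern from inside SQL can be sketched with Python's standard `sqlite3` module. This is not PSS-SQL and does not reproduce its syntax or pattern language; it only illustrates the general approach under stated assumptions: each chain's secondary structure is stored as a string over H (helix), E (strand) and C (coil/loop), and the table name `proteins`, the function `sse_score`, the helper `find_similar` and the scoring values are all hypothetical.

```python
import sqlite3


def sse_score(target: str, pattern: str) -> int:
    """Sequential Smith-Waterman score between two H/E/C strings.

    Illustrative scoring (an assumption): +2 match, -1 mismatch, -1 per gap position.
    """
    prev = [0] * (len(pattern) + 1)  # previous DP row
    best = 0
    for a in target:
        cur = [0]
        for j, b in enumerate(pattern, start=1):
            s = 2 if a == b else -1
            h = max(0, prev[j - 1] + s, prev[j] - 1, cur[j - 1] - 1)
            cur.append(h)
            best = max(best, h)
        prev = cur
    return best


def find_similar(conn: sqlite3.Connection, pattern: str, min_score: int = 0):
    """Rank stored proteins by local similarity of their secondary structure to `pattern`."""
    return conn.execute(
        "SELECT id, score FROM ("
        "  SELECT id, sse_score(sse, :pat) AS score FROM proteins"
        ") AS s WHERE score >= :lo ORDER BY score DESC, id",
        {"pat": pattern, "lo": min_score},
    ).fetchall()


conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per protein chain, secondary structure as an H/E/C string.
conn.execute("CREATE TABLE proteins (id TEXT PRIMARY KEY, sse TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO proteins VALUES (?, ?)",
    [("P1", "CCHHHHHHCCEEEEECC"), ("P2", "CCEEEECCEEEECCC"), ("P3", "HHHHHHHHHHHHCC")],
)
# Expose the Python scorer as an SQL function, loosely mimicking a query-language extension.
conn.create_function("sse_score", 2, sse_score)

print(find_similar(conn, "HHHHHHCCEEEE", min_score=10))
# prints [('P1', 24), ('P3', 16), ('P2', 12)]
```

Registering the scorer as an SQL function keeps filtering and ordering in the declarative query, so the database still decides how rows are scanned while the similarity computation runs as user code.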
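The abstract does not state how the alignment is split across threads. One common way to multithread a Smith-Waterman-style local alignment is the wavefront scheme: every cell on an anti-diagonal depends only on the two previous anti-diagonals, so the cells of one anti-diagonal can be filled concurrently, with a barrier before the next one starts. The Python sketch below shows that dependency structure under assumptions: H/E/C strings, hypothetical scores (+2/-1/-1), and the illustrative names `sw_wavefront`, `MATCH`, `MISMATCH` and `GAP`. Because of CPython's global interpreter lock, these threads show the scheduling pattern rather than a real multi-core speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical scoring scheme for H/E/C symbols (not taken from the chapter).
MATCH, MISMATCH, GAP = 2, -1, -1


def sw_wavefront(query: str, target: str, workers: int = 4) -> int:
    """Best Smith-Waterman local alignment score, filled anti-diagonal by anti-diagonal.

    Every cell (i, j) with i + j == d depends only on cells from diagonals
    d - 1 and d - 2, so all cells of one diagonal can be computed concurrently.
    """
    n, m = len(query), len(target)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # row/column 0 stay 0

    def fill(cells):
        # Each worker fills a disjoint slice of the current diagonal.
        local_best = 0
        for i, j in cells:
            s = MATCH if query[i - 1] == target[j - 1] else MISMATCH
            h = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + GAP, H[i][j - 1] + GAP)
            H[i][j] = h
            local_best = max(local_best, h)
        return local_best

    best = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for d in range(2, n + m + 1):
            cells = [(i, d - i) for i in range(max(1, d - m), min(n, d - 1) + 1)]
            size = max(1, -(-len(cells) // workers))  # ceiling division
            parts = [cells[k:k + size] for k in range(0, len(cells), size)]
            # Consuming pool.map acts as a barrier: diagonal d finishes before d + 1 starts.
            best = max([best, *pool.map(fill, parts)])
    return best


if __name__ == "__main__":
    print(sw_wavefront("CCHHHHHHCCEEEEECC", "HHHHHHCCEEEE"))  # prints 24
```

In a compiled language with native threads the same diagonal-by-diagonal barrier applies; in practice, cells are usually grouped into larger blocks so that synchronization cost does not dominate.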


Metadata
Title: Exploration of Protein Secondary Structures in Relational Databases with Multi-threaded PSS-SQL
Author: Dariusz Mrozek
Copyright year: 2018
DOI: https://doi.org/10.1007/978-3-319-98839-9_11