2015 | OriginalPaper | Book chapter

Big Data Analytics and Its Prospects in Computational Proteomics

Written by: Sagnik Banerjee, Subhadip Basu, Mita Nasipuri

Published in: Information Systems Design and Intelligent Applications

Publisher: Springer India

Abstract

The volume and variety of data in biology are increasing at an exponential velocity. Every week new proteins are sequenced and novel structures are discovered. With the emergence of previously unknown diseases, it has become imperative that vaccines and drugs be designed as quickly as possible. This is causing an immense surge of information that is becoming increasingly difficult to process with limited computational resources. The need of the hour is therefore to harness technologies such as Big Data, which help distribute computations over a group of nodes and speed up data analysis. In this paper we explore some techniques for distributing the job of data analysis across several computers that can work in parallel and reach a solution faster.

Metadata
Title
Big Data Analytics and Its Prospects in Computational Proteomics
Written by
Sagnik Banerjee
Subhadip Basu
Mita Nasipuri
Copyright year
2015
Publisher
Springer India
DOI
https://doi.org/10.1007/978-81-322-2247-7_60
