2017 | OriginalPaper | Chapter

Probabilistic and Likelihood-Based Methods for Protein Identification from MS/MS Data

Abstract

Protein identification is the process of identifying peptides from mass spectra and the constituent proteins in a sample. Many approaches to this problem based on tandem mass spectrometry (MS/MS) data have been proposed in the literature. Many of them are two-step procedures that first identify peptides in a separate process and then use those results to identify proteins; in recent years there have also been attempts to develop a one-step solution that identifies proteins and peptides in a sample simultaneously. We briefly introduce the probabilistic and likelihood-based two-step and one-step procedures and report their comparative performance on different MS/MS data sets.
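To make the two-step idea concrete, here is a minimal sketch in Python. It assumes step one has already produced a probability of correct identification for each peptide, and that step two combines the peptides mapped to a protein under an independence assumption, so the protein probability is 1 minus the product of (1 - p) over its peptides. That combination rule is a common simplification chosen for illustration and is not necessarily the model used in the chapter. The function names, peptide labels, and protein labels are hypothetical.

```python
from math import prod


def protein_probability(peptide_probs):
    """Probability that at least one peptide identification is correct,
    assuming the identifications are independent (an illustrative assumption)."""
    return 1.0 - prod(1.0 - p for p in peptide_probs)


def two_step_inference(peptide_probs, peptide_to_proteins):
    """Step two of a two-step procedure.

    peptide_probs: dict mapping peptide -> probability from step one.
    peptide_to_proteins: dict mapping peptide -> list of candidate proteins.
    A shared peptide contributes fully to every protein it maps to,
    which is a deliberate simplification in this sketch.
    """
    by_protein = {}
    for peptide, p in peptide_probs.items():
        for protein in peptide_to_proteins.get(peptide, []):
            by_protein.setdefault(protein, []).append(p)
    return {prot: protein_probability(ps) for prot, ps in by_protein.items()}


# Hypothetical step-one output and peptide-to-protein mapping.
step_one = {"PEPA": 0.9, "PEPB": 0.5, "PEPC": 0.2}
mapping = {"PEPA": ["P1"], "PEPB": ["P1"], "PEPC": ["P2"]}
result = two_step_inference(step_one, mapping)
```

A one-step method would instead estimate the peptide and protein quantities jointly, so that evidence at the protein level can feed back into the peptide-level scores rather than being fixed before protein inference starts.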

Metadata
Title
Probabilistic and Likelihood-Based Methods for Protein Identification from MS/MS Data
Authors
Ryan Gill
Susmita Datta
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-45809-0_4
