
2017 | OriginalPaper | Chapter

Robust and Confident Predictor Selection in Metabolomics

Authors: J. A. Hageman, B. Engel, Ric C. H. de Vos, Roland Mumm, Robert D. Hall, H. Jwanro, D. Crouzillat, J. C. Spadone, F. A. van Eeuwijk

Published in: Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry

Publisher: Springer International Publishing


Abstract

Metabolomics is a proven tool for obtaining information about differences between foodstuffs and for selecting biochemical markers of the sensory quality of food products. A valuable application of untargeted metabolomics is the selection of metabolites that are (highly) predictive of sensory or phenotypical traits, for use as (bio)markers. This chapter demonstrates how to robustly select key metabolites and evaluate their predictive properties. The proposed approach constrains the number of selected metabolites and searches for the optimal number of predictive metabolites by cross-validation. This mitigates the problem of selecting spurious metabolites and enables straightforward use of linear regression. In the present implementation, simple forward selection is used. A second cross-validation assesses the predictive power of the selected set of metabolites, so the proposed method involves two nested leave-one-out cross-validations and is referred to as LOO2CV. The second leave-one-out cross-validation generates a multitude of regression models, one per left-out sample. These models offer additional information that is potentially useful for selecting key metabolites in the spirit of stability selection. The LOO2CV approach is illustrated with sensory and large-scale metabolomics data from a set of 76 different cocoa liquors. It is compared with conventional stepwise regression and with stepwise regression combined with cross-validation for evaluating the predictive power of the model.
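The double leave-one-out scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: the function names (`forward_path`, `choose_k_by_loo`, `loo2cv`), the use of the residual sum of squares as the forward-selection criterion, the summed squared prediction error as the inner criterion, and the default cap `max_k` are all hypothetical choices, since the abstract does not specify them.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns the coefficients."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Predict with coefficients produced by fit_ols."""
    return np.column_stack([np.ones(X.shape[0]), X]) @ coef

def forward_path(X, y, max_k):
    """Greedy forward selection: order in which predictors enter the model.
    Assumption: the entry criterion is the smallest residual sum of squares."""
    selected = []
    for _ in range(max_k):
        best_j, best_rss = None, np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            resid = y - predict_ols(fit_ols(X[:, cols], y), X[:, cols])
            rss = float(resid @ resid)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected

def choose_k_by_loo(X, y, max_k):
    """Inner leave-one-out loop: choose the number of predictors (1..max_k)
    that minimises the summed squared prediction error."""
    n = len(y)
    press = np.zeros(max_k)
    for i in range(n):
        train = np.arange(n) != i
        path = forward_path(X[train], y[train], max_k)
        for k in range(1, max_k + 1):
            cols = path[:k]
            coef = fit_ols(X[train][:, cols], y[train])
            pred = predict_ols(coef, X[i:i + 1, cols])[0]
            press[k - 1] += (y[i] - pred) ** 2
    return int(np.argmin(press)) + 1

def loo2cv(X, y, max_k=3):
    """Outer leave-one-out loop: for each left-out sample, select predictors
    on the remaining samples only, then predict the left-out sample.
    Returns (Q2, selection frequency per predictor)."""
    n, p = X.shape
    preds = np.zeros(n)
    counts = np.zeros(p, dtype=int)
    for i in range(n):
        train = np.arange(n) != i
        X_tr, y_tr = X[train], y[train]
        k = choose_k_by_loo(X_tr, y_tr, max_k)
        cols = forward_path(X_tr, y_tr, k)
        coef = fit_ols(X_tr[:, cols], y_tr)
        preds[i] = predict_ols(coef, X[i:i + 1, cols])[0]
        counts[cols] += 1  # one regression model per outer fold
    q2 = 1.0 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)
    return q2, counts / n
```

Calling `loo2cv(X, y)` on a samples-by-metabolites matrix `X` and a trait vector `y` returns a cross-validated measure of predictive power together with the fraction of outer-fold models in which each metabolite was selected. Predictors with a frequency near 1 are the stable candidates in the sense of stability selection, while predictors that appear in only a few folds are more likely to be spurious. The cost grows roughly with the square of the number of samples, so this plain version is only practical for small data sets.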


Metadata
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-45809-0_13
