
2017 | OriginalPaper | Book chapter

On the Combination of Omics Data for Prediction of Binary Outcomes

Authors: Mar Rodríguez-Girondo, Alexia Kakourou, Perttu Salo, Markus Perola, Wilma E. Mesker, Rob A. E. M. Tollenaar, Jeanine Houwing-Duistermaat, Bart J. A. Mertens

Published in: Statistical Analysis of Proteomics, Metabolomics, and Lipidomics Data Using Mass Spectrometry

Publisher: Springer International Publishing


Abstract

Enriching predictive models with new biomolecular markers is an important task in high-dimensional omic applications. Increasingly, clinical studies collect several sets of omic markers for each patient, each measuring a different level of biological variation. As a result, one of the main challenges in predictive research is integrating different sources of omic biomarkers to predict health traits. We review several approaches for combining omic markers in the context of binary outcome prediction, all based on double cross-validation and regularized regression models. We evaluate their performance in terms of calibration and discrimination and compare it with that of predictions based on a single omic source. We illustrate the methods through the analysis of two real datasets. On the one hand, we consider the combination of two fractions of proteomic mass spectrometry for the calibration of a diagnostic rule for the detection of early-stage breast cancer. On the other hand, we consider transcriptomics and metabolomics as predictors of obesity, using data from the Dietary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome (DILGOM) study, a population-based cohort from Finland.
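The abstract names the main ingredients: double cross-validation, regularized regression, a comparison of single-source and combined predictions, and assessment by discrimination and calibration. The Python sketch below shows only the general idea under stated assumptions and is not the authors' implementation. It fits ridge-penalized logistic regression by plain gradient descent, chooses the penalty in an inner cross-validation loop, and collects out-of-fold probabilities from an outer loop. These probabilities are then scored by AUC (discrimination) and the Brier score (an overall accuracy measure that is sensitive to calibration). The simulated data, the penalty grid, and every function name (`fit_ridge_logistic`, `double_cv`, `simulate`, and so on) are hypothetical. Merging (concatenating) the two sources is just one simple combination strategy and not necessarily one of those reviewed in the chapter.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def linear(beta, xi):
    # beta[0] is the intercept; beta[1:] are the feature coefficients.
    return beta[0] + sum(b * x for b, x in zip(beta[1:], xi))

def fit_ridge_logistic(X, y, lam, iters=100, lr=1.0):
    # Minimize mean negative log-likelihood + (lam / 2) * ||beta[1:]||^2 by
    # plain gradient descent; the intercept is not penalized.
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            r = sigmoid(linear(beta, xi)) - yi  # residual on the probability scale
            grad[0] += r
            for j, x in enumerate(xi):
                grad[j + 1] += r * x
        beta[0] -= lr * grad[0] / n
        for j in range(1, p + 1):
            beta[j] -= lr * (grad[j] / n + lam * beta[j])
    return beta

def predict(beta, X):
    return [sigmoid(linear(beta, xi)) for xi in X]

def log_loss(y, p, eps=1e-12):
    return -sum(yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
                for yi, pi in zip(y, p)) / len(y)

def auc(y, p):
    # Probability that a random case outranks a random control (ties count 1/2).
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def brier(y, p):
    # Mean squared difference between predicted probability and observed outcome.
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def folds(n, k, rng):
    # Random partition of range(n) into k folds.
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def double_cv(X, y, lams, k_outer=5, k_inner=3, seed=1):
    # Out-of-fold probabilities: for each outer fold, the penalty is chosen by
    # inner CV (summed validation log-loss) on the outer training part only,
    # then the model is refit on that part and applied to the held-out fold.
    rng = random.Random(seed)
    n = len(y)
    preds = [0.0] * n
    for test in folds(n, k_outer, rng):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        inner = folds(len(train), k_inner, rng)  # positions within `train`

        def inner_loss(lam):
            total = 0.0
            for val in inner:
                val_set = set(val)
                fit = [train[q] for q in range(len(train)) if q not in val_set]
                beta = fit_ridge_logistic([X[i] for i in fit], [y[i] for i in fit], lam)
                ids = [train[q] for q in val]
                total += log_loss([y[i] for i in ids], predict(beta, [X[i] for i in ids]))
            return total

        best = min(lams, key=inner_loss)
        beta = fit_ridge_logistic([X[i] for i in train], [y[i] for i in train], best)
        for i, pi in zip(test, predict(beta, [X[i] for i in test])):
            preds[i] = pi
    return preds

def simulate(n=80, p=5, seed=0):
    # Hypothetical data: two standardized "omic sources" A and B; the first
    # feature of each carries signal, so no rescaling step is shown here.
    rng = random.Random(seed)
    A = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    B = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(1.5 * a[0] + 1.5 * b[0]) else 0 for a, b in zip(A, B)]
    return A, B, y

if __name__ == "__main__":
    A, B, y = simulate()
    lams = [0.01, 0.1, 1.0]
    sources = {"A only": A, "B only": B, "A+B merged": [a + b for a, b in zip(A, B)]}
    for name, X in sources.items():
        p = double_cv(X, y, lams)
        print(f"{name}: AUC={auc(y, p):.3f}  Brier={brier(y, p):.3f}")
```

Each outer-fold prediction comes from a model whose penalty was tuned without seeing that fold, so the tuning step does not make the resulting AUC and Brier score optimistic. Reusing the same seed keeps the fold splits identical across the single-source and merged runs, which makes the comparison paired. With real omic data, features would normally be standardized inside each training fold rather than assumed to be on a common scale.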


Footnotes
1
The authors “Mar Rodríguez-Girondo” and “Alexia Kakourou” contributed equally to this work.
 
Metadata
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-45809-0_14
