Abstract
Identifying biomarkers that are indicative of a phenotypic state is difficult because of the natural variability that exists in any population. While many algorithms are available for biomarker selection, previous investigation shows that the sensitivity and flexibility of support vector machines (SVM) make them an attractive candidate. Here we evaluate the ability of support vector machine recursive feature elimination (SVM-RFE) to identify potential metabolic biomarkers in untargeted liquid chromatography mass spectrometry metabolite datasets. Two separate experiments are considered: a low-variance (low biological noise) prokaryotic stress experiment and a high-variance (high biological noise) mammalian stress experiment. For each experiment, the phenotypic response to stress is metabolically characterized. SVM-based classification and metabolite ranking are undertaken using a systematically reduced number of biological replicates to evaluate the impact of sample size on biomarker reproducibility and robustness. Our results indicate that the highest-ranked 1% of metabolites, those most predictive of the physiological state, were identified by SVM-RFE even when the number of training examples was small (≥3) and the coefficient of variation was high (>0.5). An accuracy analysis shows that filtering with recursive feature elimination measurably improves SVM classification accuracy, an effect that is pronounced when the number of training examples is small. These results indicate that SVM-RFE can succeed at biomarker identification even in challenging scenarios where the training examples are noisy and the number of biological replicates is low.
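The recursive elimination procedure evaluated in the abstract can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the synthetic data (three replicates per class, twenty features, features 0 and 1 carrying the signal), the simple subgradient solver, and all names and hyperparameters (`train_linear_svm`, `svm_rfe`, `make_data`, `lam`, `lr`, `epochs`, `shift`) are assumptions chosen only for illustration. The sketch trains a linear SVM, removes the feature with the smallest squared weight, and repeats until every feature has been ranked.

```python
import random


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of (lam / 2) * ||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # this sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y):
    """Return feature indices ordered from most to least important."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        # Retrain on the surviving features only.
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        # Ranking criterion: squared weight of each surviving feature.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]  # last feature eliminated is ranked first


def make_data(n_per_class=3, n_features=20, informative=(0, 1),
              shift=4.0, seed=0):
    """Synthetic two-class data (hypothetical, not the study's datasets)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (-1, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in informative:
                row[j] += shift * label  # class-dependent mean shift
            X.append(row)
            y.append(label)
    return X, y


X, y = make_data()
ranking = svm_rfe(X, y)
print("top-ranked features:", ranking[:2])
```

Reducing `n_per_class` or `shift` in this sketch gives a rough feel for how replicate number and noise level affect the stability of the ranking, although it says nothing quantitative about the experiments reported here.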
References
Bertini, I., Calabro, A., De Carli, V., Luchinat, C., Nepi, S., Porfirio, B., et al. (2009). The metabonomic signature of celiac disease. Journal of Proteome Research, 8, 170–177.
Duan, K., Rajapakse, J. C., et al. (2005). Multiple SVM-RFE for gene selection in cancer classification with expression data. IEEE Transactions on Nanobioscience, 4, 228–234.
Guan, W., Zhou, M., Hampton, C. Y., Benigno, B. B., Walker, L. D., Gray, A., et al. (2009). Ovarian cancer detection from metabolomic liquid chromatography/mass spectrometry data by support vector machines. BMC Bioinformatics, 10, 259.
Guyon, I., Weston, J., Barnhill, S., et al. (2002). Gene selection for cancer classification using support vector machines. Machine Learning, 46, 389–422.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., et al. (2010). The WEKA data mining software: An update. ACM SIGKDD Explorations Newsletter, 11(1), 10–18.
Haug, K., Salek, R. M., Conesa, P., Hastings, J., de Matos, P., Rijnbeek, M., et al. (2013). Metabolights—an open-access general-purpose repository for metabolomics studies and associated meta-data. Nucleic Acids Research, 41, D781–D786.
Heinemann, J., Hamerly, T., Maaty, W. S., Movahed, N., Steffens, J. D., Reeves, B. D., et al. (2014). Expanding the paradigm of thiol redox in the thermophilic root of life. Biochimica et Biophysica Acta, 1840, 80–85.
Herder, C., Karakas, M., Koenig, W., et al. (2011). Biomarkers for the prediction of type 2 diabetes and cardiovascular disease. Clinical Pharmacology and Therapeutics, 90(1), 52–66.
Lin, X., Wang, Q., Yin, P., Tang, L., Tan, Y., Li, H., et al. (2011). A method for handling metabonomics data from liquid chromatography/mass spectrometry: Combinational use of support vector machine recursive feature elimination, genetic algorithm and random forest for feature selection. Metabolomics, 7(4), 549–558.
Lusczek, E. R., Nelson, T., Lexcen, D., Witowski, N. E., Mulier, K. E., et al. (2011). Urine metabolomics in hemorrhagic shock: Normalization of urine in the face of changing intravascular fluid volume and perturbations in metabolism. Bioanalysis and Biomedicine, 3(2), 38–48.
Maaty, W. S., Wiedenheft, B., Tarlykov, P., Schaff, N., Heinemann, J., Robison-Cox, J., et al. (2009). Something old, something new, something borrowed; how the thermoacidophilic archaeon Sulfolobus solfataricus responds to oxidative stress. PLoS One, 4(9), e6964.
Mahadevan, S., Shah, S. L., Marrie, T. J., Slupsky, C. M., et al. (2008). Analysis of metabolomic data using support vector machines. Analytical Chemistry, 80(19), 7562–7570.
Mulier, K. E., Beilman, G. J., Conroy, M. J., Taylor, J. H., Skarda, D. E., et al. (2005). Ringer’s ethyl pyruvate in hemorrhagic shock and resuscitation does not improve early hemodynamics or tissue energetics. Shock, 23, 248–252.
Patti, G. J., Tautenhahn, R., Siuzdak, G., et al. (2012a). Meta-analysis of untargeted metabolomic data from multiple profiling experiments. Nature Protocols, 7(3), 508–516.
Patti, G. J., Yanes, O., Shriver, L. P., Courade, J., Tautenhahn, R., Manchester, M., et al. (2012b). Metabolomics implicates altered sphingolipids in chronic pain of neuropathic origin. Nature Chemical Biology, 8(3), 232–234.
R Development Core Team. (2012). R: A language and environment for statistical computing, reference index version 2.15.1. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, http://www.R-project.org. Retrieved 16 May 2013.
Scribner, D. M., Witowski, N. E., Mulier, K. E., Lusczek, E. R., Wasiluk, K. R., Beilman, G. J., et al. (2010). Liver metabolomic changes identify biochemical pathways in hemorrhagic shock. The Journal of Surgical Research, 164, e131–e139.
Serkova, N. J., Standiford, T. J., Stringer, K. A., et al. (2011). The emerging field of quantitative blood metabolomics for biomarker discovery in critical illnesses. American Journal of Respiratory and Critical Care Medicine, 184, 647–655.
Smith, C., O’Maille, G., Want, E. J., Qin, C., Trauger, S., Brandon, T. R., et al. (2005). METLIN: A metabolite mass spectral database. Therapeutic Drug Monitoring, 27(6), 747–751.
Tautenhahn, R., Böttcher, C., Neumann, S., et al. (2008). Highly sensitive feature detection for high resolution LC/MS. BMC Bioinformatics, 9, 504.
Veselkov, K. A., Vingara, L. K., Masson, P., Robinette, S. L., Want, E., Li, J. V., et al. (2011). Optimizing preprocessing of ultra-performance liquid chromatography/mass spectrometry urinary metabolic profiles for improved information recovery. Analytical Chemistry, 83, 5864–5872.
Yanes, O., Tautenhahn, R., Patti, G. J., Siuzdak, G., et al. (2011). Expanding coverage of the metabolome for global metabolite profiling. Analytical Chemistry, 83(6), 2152–2161.
Acknowledgments
This work was supported by the National Science Foundation (MCB0646499 and MCB102248). The mass spectrometry, proteomics and metabolomics core facility is supported by the Murdock Charitable Trust, INBRE MT Grant No. P20 RR-16455-08, and NIH Grant Nos. P20 RR-020185 and P20 RR-024237 from the COBRE Program of the National Center for Research Resources.
Additional information
All data are available online at http://www.ebi.ac.uk/metabolights/ (Haug et al. 2013).
Cite this article
Heinemann, J., Mazurie, A., Tokmina-Lukaszewska, M. et al. Application of support vector machines to metabolomics experiments with limited replicates. Metabolomics 10, 1121–1128 (2014). https://doi.org/10.1007/s11306-014-0651-0