ABSTRACT
In this paper we discuss heterogeneous estimation model ensembles for cancer diagnoses produced using various machine learning algorithms. Based on patients' data records including standard blood parameters, tumor markers, and information about the diagnosis of tumors, the goal is to identify mathematical models for estimating cancer diagnoses. Several machine learning approaches implemented in HeuristicLab and WEKA have been applied for identifying estimators for selected cancer diagnoses: k-nearest neighbor learning, decision trees, artificial neural networks, support vector machines, random forests, and genetic programming. The models produced using these methods have been combined into heterogeneous model ensembles. All models trained during the learning phase are applied during the test phase, and the final classification is annotated with a confidence value that specifies how reliable the models are regarding the presented decision. We calculate the final estimation for each sample via majority voting, and the relative share of a sample's majority vote among all votes is used for calculating the confidence in the final estimation. We use a confidence threshold that specifies the minimum confidence level that has to be reached; if this threshold is not reached for a sample, then no prediction is made for that sample.
As we show in the results section, the accuracies of diagnoses of breast cancer, melanoma, and respiratory system cancer can thereby be increased significantly. Increasing the confidence threshold leads to higher classification accuracies, but it also significantly decreases the ratio of samples for which a classification statement is made.
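The voting scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' HeuristicLab or WEKA implementation. The function names, the treatment of ties (the label seen first wins), and the toy votes at the end are assumptions made only for illustration. The confidence is taken here to be the share of models that voted for the majority class, and a sample is left unclassified when that share is below the threshold.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def ensemble_vote(
    votes: Sequence[Hashable], confidence_threshold: float
) -> Tuple[Optional[Hashable], float]:
    """Return (prediction, confidence) for one sample.

    votes: class labels predicted by the individual models (hypothetical input).
    confidence: share of models that voted for the majority class.
    prediction is None when the confidence does not reach the threshold.
    """
    if not votes:
        raise ValueError("at least one model vote is required")
    # Assumption: ties are resolved in favor of the label encountered first.
    label, count = Counter(votes).most_common(1)[0]
    confidence = count / len(votes)
    if confidence < confidence_threshold:
        return None, confidence
    return label, confidence


def accuracy_and_coverage(all_votes, targets, confidence_threshold):
    """Accuracy on the classified samples, and the share of classified samples."""
    decided = 0
    correct = 0
    for votes, target in zip(all_votes, targets):
        prediction, _ = ensemble_vote(votes, confidence_threshold)
        if prediction is None:
            continue  # no classification statement for this sample
        decided += 1
        correct += prediction == target
    coverage = decided / len(targets) if targets else 0.0
    accuracy = correct / decided if decided else float("nan")
    return accuracy, coverage


# Toy votes of five models for four samples (illustrative, not data from the paper).
toy_votes = [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0], [0, 0, 0, 0, 1], [0, 1, 0, 1, 0]]
toy_targets = [1, 0, 0, 0]

for threshold in (0.5, 0.8):
    print(threshold, accuracy_and_coverage(toy_votes, toy_targets, threshold))
```

On these toy votes, raising the threshold from 0.5 to 0.8 lifts the accuracy on classified samples from 0.75 to 1.0 while the share of classified samples drops from 1.0 to 0.5, which mirrors the trade-off stated above.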
Index Terms
- Data based prediction of cancer diagnoses using heterogeneous model ensembles: a case study for breast cancer, melanoma, and cancer in the respiratory system