DOI: 10.1145/2598394.2609853
technical-note

Data based prediction of cancer diagnoses using heterogeneous model ensembles: a case study for breast cancer, melanoma, and cancer in the respiratory system

Published: 12 July 2014

ABSTRACT

In this paper we discuss heterogeneous estimation model ensembles for cancer diagnoses produced using various machine learning algorithms. Based on patients' data records, which include standard blood parameters, tumor markers, and information about the diagnosis of tumors, the goal is to identify mathematical models for estimating cancer diagnoses. Several machine learning approaches implemented in HeuristicLab and WEKA have been applied to identify estimators for selected cancer diagnoses: k-nearest neighbor learning, decision trees, artificial neural networks, support vector machines, random forests, and genetic programming. The models produced by these methods have been combined into heterogeneous model ensembles. All models trained during the learning phase are applied during the test phase, and the final classification is annotated with a confidence value that specifies how reliable the models are regarding the presented decision. We calculate the final estimation for each sample via majority voting, and the relative ratio of a sample's majority vote is used for calculating the confidence in the final estimation. We use a confidence threshold that specifies the minimum confidence level that has to be reached; if this threshold is not reached for a sample, then no prediction is made for that sample.
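The voting scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' HeuristicLab or WEKA implementation: the function name `ensemble_vote`, its parameters, and the tie handling (the label seen first among equally frequent labels wins) are hypothetical choices made for this example.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def ensemble_vote(
    votes: Sequence[Hashable], threshold: float
) -> Tuple[Optional[Hashable], float]:
    """Return (prediction, confidence) for one sample.

    votes     -- one class label per ensemble member (hypothetical input format)
    threshold -- minimum confidence required to emit a prediction
    """
    if not votes:
        raise ValueError("at least one vote is required")
    # Majority vote: the most frequent label among all ensemble members.
    label, count = Counter(votes).most_common(1)[0]
    # Confidence: relative share of the majority vote.
    confidence = count / len(votes)
    # Assumption: a confidence below the threshold means "no prediction".
    if confidence < threshold:
        return None, confidence
    return label, confidence


# Six models vote on one sample; four of them predict class 1.
print(ensemble_vote([1, 1, 1, 0, 1, 0], threshold=0.6))
print(ensemble_vote([1, 1, 1, 0, 1, 0], threshold=0.8))
```

With four of six votes for class 1, the confidence is 4/6. The first call therefore returns class 1, while the second call abstains and returns `None` as the prediction.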

As we show in the results section, the accuracies of diagnoses of breast cancer, melanoma, and respiratory system cancer can thus be increased significantly. Increasing the confidence threshold leads to higher classification accuracies, but it also significantly decreases the ratio of samples for which a classification statement is made.
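The trade-off between accuracy and the share of classified samples can be made concrete with a small sketch. The function `coverage_and_accuracy` and the toy votes below are invented for illustration only; they are not data or results from the paper.

```python
from collections import Counter
from typing import Sequence, Tuple


def coverage_and_accuracy(
    votes_per_sample: Sequence[Sequence[int]],
    true_labels: Sequence[int],
    threshold: float,
) -> Tuple[float, float]:
    """Return (coverage, accuracy) at a given confidence threshold.

    coverage -- fraction of samples that receive a prediction
    accuracy -- fraction of correct predictions among covered samples
                (0.0 if no sample is covered)
    """
    covered = 0
    correct = 0
    for votes, truth in zip(votes_per_sample, true_labels):
        label, count = Counter(votes).most_common(1)[0]
        # A sample is only classified if its majority share reaches the threshold.
        if count / len(votes) >= threshold:
            covered += 1
            correct += int(label == truth)
    coverage = covered / len(true_labels)
    accuracy = correct / covered if covered else 0.0
    return coverage, accuracy


# Invented toy data: five models voting on four samples (not from the paper).
toy_votes = [
    [1, 1, 1, 1, 1],  # confidence 1.0, majority is correct
    [1, 1, 1, 1, 0],  # confidence 0.8, majority is correct
    [0, 0, 0, 1, 1],  # confidence 0.6, majority is wrong
    [0, 0, 0, 0, 1],  # confidence 0.8, majority is correct
]
toy_truth = [1, 1, 1, 0]

for t in (0.5, 0.7, 0.9):
    cov, acc = coverage_and_accuracy(toy_votes, toy_truth, t)
    print(f"threshold={t:.1f} coverage={cov:.2f} accuracy={acc:.2f}")
```

On this toy input, raising the threshold from 0.5 to 0.7 drops the one low-confidence (and wrong) sample, so accuracy rises from 0.75 to 1.00 while coverage falls from 1.00 to 0.75. At 0.9 only the unanimous sample is classified.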

Published in

GECCO Comp '14: Proceedings of the Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation
July 2014
1524 pages
ISBN: 9781450328814
DOI: 10.1145/2598394

        Copyright © 2014 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        GECCO Comp '14 Paper Acceptance Rate: 180 of 544 submissions, 33%
        Overall Acceptance Rate: 1,669 of 4,410 submissions, 38%
