Performance Improvement with Combining Multiple Approaches to Diagnosis of Thyroid Cancer

Abstract

Many diseases carry a risk of death if early measures are not taken, and thyroid cancer is one of them. In the USA, the number of thyroid cancer deaths in 2013 alone shows the need to fight this disease early. This study aims to improve performance in the diagnosis of thyroid cancer using machine learning techniques. The study consists of three phases. In the first phase, the BayesNet, NaiveBayes, SMO, IBk, and Random Forest classifiers were trained on a thyroid cancer training dataset. In the second phase, the trained classifiers were tested on a thyroid cancer test dataset and their performance results were compared. In the third and final phase, the classifiers named above were combined with the AdaBoostM1 algorithm to show how ensemble classifiers differ from conventional individual classifiers, and the first two phases were repeated. Using the ensemble approach improved performance in the diagnosis of thyroid cancer. In addition, the kappa, accuracy, and MCC values obtained from these classifier models are reported in tables, and their effect on diagnosis of the disease is shown with ROC curves. All of these operations were carried out with the WEKA data mining program.
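The three summary measures named in the abstract (accuracy, kappa, and MCC) can all be derived from a binary confusion matrix. The following is a minimal sketch of their standard definitions, not the computation performed by WEKA in the study; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper's results.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, kappa, mcc) for a 2x2 confusion matrix.

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives, and true negatives (hypothetical inputs).
    """
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")

    # Accuracy: fraction of cases classified correctly.
    accuracy = (tp + tn) / n

    # Cohen's kappa: observed agreement corrected for the agreement
    # expected by chance, computed from the row and column marginals.
    p_pos = ((tp + fp) / n) * ((tp + fn) / n)
    p_neg = ((tn + fn) / n) * ((tn + fp) / n)
    expected = p_pos + p_neg
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    # Matthews correlation coefficient; defined as 0 here when any
    # marginal is zero and the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return accuracy, kappa, mcc


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the call.
    acc, kappa, mcc = binary_metrics(tp=40, fp=10, fn=5, tn=45)
    print(f"accuracy={acc:.3f} kappa={kappa:.3f} mcc={mcc:.3f}")
```

Unlike accuracy, kappa and MCC both account for the class marginals, which is why they are often reported alongside accuracy when the classes in a diagnostic dataset are imbalanced.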

Share and Cite:

Akbaş, A., Turhal, U., Babur, S. and Avci, C. (2013) Performance Improvement with Combining Multiple Approaches to Diagnosis of Thyroid Cancer. Engineering, 5, 264-267. doi: 10.4236/eng.2013.510B055.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.