Comparison of Machine Learning Algorithms in Breast Cancer Prediction Using the Coimbra Dataset

Yolanda D Austria (Adamson University, Philippines); Marie Luvett Goh (FEU Institute of Technology, Philippines); Lorenzo Sta. Maria Jr. (Asian Institute of Management, Philippines); Jay-Ar Lalata (FEU Institute of Technology, Philippines); Joselito Eduard Goh (De La Salle - College of St. Benilde, Philippines); Heintjie Vicente (FEU Institute of Technology, Philippines)

In the medical field, machine learning (ML) techniques play a significant and growing role because of their high potential to help health practitioners make decisions and diagnoses. This study reviews ML models that may predict breast cancer in women and compares their performance. Several clinical features were measured for the 116 participants in the dataset used in this study: insulin, glucose, resistin, adiponectin, homeostasis model assessment (HOMA), leptin, and monocyte chemoattractant protein-1 (MCP-1), along with age and body mass index (BMI). The researchers implemented 11 classification algorithms and their variations, including Logistic Regression (LR), k-Nearest Neighbor (kNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Gradient Boosting Method (GBM), and Naive Bayes (NB), to detect breast cancer on the publicly available Coimbra Breast Cancer Dataset (CBCD). Each classifier uses its own hyperparameter setting, and the classifiers are compared on how well they identify breast cancer. The study concludes that GBM is the best classifier for predicting breast cancer on the CBCD, with an accuracy of 74.14%. The kNN classifier has the fastest training time, at 0.000598 seconds, while the nonlinear SVM classifier has the fastest testing time, reported as 0 seconds. The study also finds that BMI is the top predictor, with 50% of the classifiers ranking it first, and glucose comes second. This suggests that BMI and glucose may be a good pair of variables for predicting breast cancer in women.
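
The comparison described above reports, for each classifier, a test accuracy together with training and testing times. As a rough illustration of how such measurements could be taken, the following pure-Python sketch times a toy k-nearest-neighbour classifier on synthetic data. The class `TinyKNN`, the helpers `evaluate` and `make_synthetic`, the value k = 3, the 50/50 split, and the data itself are all assumptions made for illustration; none of them come from the paper, and the printed numbers will not match the reported results.

```python
import math
import random
import time
from collections import Counter


class TinyKNN:
    """Minimal k-nearest-neighbour classifier (illustrative, not the paper's code)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # This simple kNN is a lazy learner: "training" only stores the data.
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for row in X:
            # Indices of the k training points closest to the query row
            # (Euclidean distance).
            nearest = sorted(
                range(len(self.X)),
                key=lambda i: math.dist(row, self.X[i]),
            )[: self.k]
            # Majority vote among the k nearest labels.
            votes = Counter(self.y[i] for i in nearest)
            preds.append(votes.most_common(1)[0][0])
        return preds


def evaluate(model, X_train, y_train, X_test, y_test):
    """Return (accuracy, training seconds, testing seconds) for one model."""
    t0 = time.perf_counter()
    model.fit(X_train, y_train)
    t1 = time.perf_counter()
    preds = model.predict(X_test)
    t2 = time.perf_counter()
    accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
    return accuracy, t1 - t0, t2 - t1


def make_synthetic(n, rng):
    """Two well-separated Gaussian blobs; a stand-in, NOT the Coimbra data."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        centre = 0.0 if label == 0 else 5.0
        X.append((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)))
        y.append(label)
    return X, y


rng = random.Random(0)
X, y = make_synthetic(116, rng)  # 116 rows only to mirror the dataset size
X_train, y_train = X[:58], y[:58]
X_test, y_test = X[58:], y[58:]  # the 50/50 split is an arbitrary assumption

acc, train_s, test_s = evaluate(TinyKNN(k=3), X_train, y_train, X_test, y_test)
print(f"accuracy={acc:.4f} train={train_s:.6f}s test={test_s:.6f}s")
```

Because this simple form of kNN only stores the training set in `fit`, its training time is typically very small, with the real work deferred to prediction. Any object exposing `fit` and `predict` could be passed to `evaluate` in the same way to compare several classifiers.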

Journal: International Journal of Simulation: Systems, Science and Technology (IJSSST), Vol. 20

Published: Jul 30, 2019

DOI: 10.5013/IJSSST.a.20.S2.23