Published in: Health and Technology 6/2022

04-11-2022 | Original Paper

Breast cancer classification along with feature prioritization using machine learning algorithms

Authors: Abdullah-Al Nahid, Md. Johir Raihan, Abdullah Al-Mamun Bulbul


Abstract

Purpose

Breast Cancer (BC) is one of the most lethal diseases and causes a large number of female deaths around the world. Prevention and diagnosis are the best options to reduce cancer deaths, and both can be supported by regular examination of a few health-related indicators such as the levels of Glucose, Insulin, HOMA, and Leptin. Based on such measurements, this study classifies Breast Cancer patients and non-Breast Cancer patients using state-of-the-art Machine Learning (ML) techniques and analyzes the features that influence the model to predict a given class.

Methods

We used several ML models, namely Gradient Boosting (GB), XGBoost (XGB), CatBoost (CB), and Light Gradient Boosting Machine (LGBM), to classify BC and determine feature importance. To interpret the ML models and quantify each feature's contribution to the BC prediction, we used SHapley Additive exPlanations (SHAP). In addition, several filter-based and wrapper-based feature selection and prioritization algorithms were used to rank the features. To combine these rankings into a single conclusion in a democratic manner, we used the traditional Borda method.
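
The Borda method mentioned above can be illustrated with a minimal sketch. The abstract does not give the paper's exact scoring variant, so this assumes the classic form in which a feature ranked at position i (best first, counting from 0) among n features receives n - 1 - i points. The function name `borda_count` and the three rankings are hypothetical and are not the paper's results.

```python
from collections import defaultdict


def borda_count(rankings):
    """Aggregate several best-first rankings into one consensus ranking.

    Each ranking is a list of feature names, most important first.
    A feature at position i in a ranking of n features earns n - 1 - i points.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] += n - 1 - position
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings, e.g. one per feature selection algorithm.
# These values are made up for illustration only.
rankings = [
    ["Glucose", "Insulin", "HOMA", "Leptin"],
    ["Glucose", "HOMA", "Leptin", "Insulin"],
    ["Insulin", "Glucose", "Leptin", "HOMA"],
]

consensus = borda_count(rankings)
for feature, points in consensus:
    print(f"{feature}: {points}")
```

In this made-up input, no single ranking decides the outcome; the feature with the highest point total across all rankings comes first in the consensus.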

Results

The results show that Gradient Boosting (GB) provides the best performance among the selected gradient boosting-based algorithms, with 82.85% accuracy, 80.00% precision, 88.89% recall, and an 84.21% F1-score. Different algorithms assign different precedence to the features. The traditional Borda method concluded that Glucose is the most influential parameter for distinguishing Breast Cancer patients from non-Breast Cancer patients.
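
As a sanity check on how the four reported metrics relate to each other, the following sketch computes them from confusion-matrix counts. The abstract does not give the confusion matrix; the counts below are hypothetical and were chosen only because they reproduce the reported percentages to within rounding. The function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from raw counts."""
    precision = tp / (tp + fp)            # share of predicted positives that are correct
    recall = tp / (tp + fn)               # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts (not taken from the paper), picked to match the
# reported figures: 16 true positives, 4 false positives,
# 2 false negatives, 13 true negatives.
metrics = classification_metrics(tp=16, fp=4, fn=2, tn=13)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The F1-score follows from precision and recall alone, so the reported 84.21% can be checked directly against the reported 80.00% and 88.89% without knowing the underlying counts.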

Conclusion

In this study, we classified BC and found that the GB classifier achieved higher accuracy than the CB, XGB, and LGBM classifiers. Using feature selection techniques, SHAP, and the Borda method, we found that Glucose is the most influential parameter for the detection of BC. We also presented and analyzed the samples that were misclassified by the GB classifier.

Metadata
Title
Breast cancer classification along with feature prioritization using machine learning algorithms
Authors
Abdullah-Al Nahid
Md. Johir Raihan
Abdullah Al-Mamun Bulbul
Publication date
04-11-2022
Publisher
Springer Berlin Heidelberg
Published in
Health and Technology / Issue 6/2022
Print ISSN: 2190-7188
Electronic ISSN: 2190-7196
DOI
https://doi.org/10.1007/s12553-022-00710-6
