
2024 | OriginalPaper | Chapter

BCSNP-ML: A Novel Breast Cancer Prediction Model Base on LightGBM and Estrogen Metabolic Enzyme Genes

Authors: Tianlei Zheng, Shi Geng, Wei Yan, Fengjun Guan, Na Yang, Lei Zhao, Bei Zhang, Xueyan Zhou, Deqiang Cheng

Published in: Proceedings of the 2nd International Conference on Internet of Things, Communication and Intelligent Technology

Publisher: Springer Nature Singapore


Abstract

Polymorphisms in estrogen-related metabolic enzyme genes have been shown to be associated with breast cancer. This paper develops a novel noninvasive breast cancer prediction model that applies machine learning algorithms to single nucleotide polymorphisms (SNPs) of estrogen metabolic enzyme genes. To predict susceptibility to breast cancer, the coded data of 14 SNPs from enrolled breast cancer patients and normal women were randomly shuffled, with 80% of the data designated as the training set and the remaining 20% reserved as the test set for validation. Single factor analysis was performed to screen independent risk factors, and the Breast Cancer with Single Nucleotide Polymorphisms - Machine Learning (BCSNP-ML) prediction model was then built using the Light Gradient Boosting Machine (LightGBM) algorithm. A total of 14 SNP variables from 280 subjects were used in this study. Single factor analysis indicated a meaningful association between SULT1A1 rs1042028, CYP1A1 rs1048943, CYP1B1 rs1056827, CYP1A1 rs1056836 and the incidence of breast cancer. The model using all 14 variables achieved an area under the receiver operating characteristic curve (AUROC) of 0.809, while the BCSNP-ML model constructed from the four variables achieved an AUROC of 0.831. Additionally, BCSNP-ML is visualized and interpreted in the paper using SHapley Additive exPlanations (SHAP) analysis to further validate that the model exhibits great potential as a robust tool for clinical forecasting of breast cancer.
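The evaluation protocol described in the abstract (random shuffle, 80/20 split, AUROC on the held-out 20%) can be illustrated with a minimal, standard-library-only sketch. Everything below is an assumption made for illustration: the cohort is synthetic, the 0/1/2 genotype coding is a common convention rather than the coding confirmed by the paper, and the scorer is a plain sum of the first four SNP codes standing in for the trained model. It is not the authors' LightGBM pipeline and does not reproduce the reported AUROC values.

```python
import random


def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and return (train, test) in an 80/20 ratio."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 80 // 100  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case gets the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_synthetic_cohort(n_subjects=280, n_snps=14, seed=0):
    """Hypothetical data: each SNP is coded 0, 1 or 2, and only the first
    four SNPs influence the (noisy) case/control label."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_subjects):
        genotype = [rng.choice([0, 1, 2]) for _ in range(n_snps)]
        label = 1 if sum(genotype[:4]) + rng.gauss(0, 2) > 4 else 0
        rows.append((genotype, label))
    return rows


cohort = make_synthetic_cohort()
train, test = split_80_20(cohort)

# Stand-in scorer (NOT LightGBM): sum of the four assumed risk SNP codes.
test_labels = [label for _, label in test]
test_scores = [sum(genotype[:4]) for genotype, _ in test]
test_auroc = auroc(test_labels, test_scores)

print(len(train), len(test))  # 224 56
print(round(test_auroc, 3))
```

With 280 subjects the split yields 224 training and 56 test records. In a real analysis, the stand-in scorer would be replaced by predicted probabilities from a gradient boosting classifier fitted on the training records only, so that the test AUROC reflects performance on unseen subjects.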

Metadata
Copyright Year: 2024
DOI: https://doi.org/10.1007/978-981-97-2757-5_66
