Health and Technology

1 July 2020 | Original Paper

Assessing the impact of parameters tuning in ensemble based breast Cancer classification

Authors: Ali Idri, El Ouassif Bouchra, Mohamed Hosni, Ibtissam Abnane

Published in: Health and Technology | Issue 5/2020

Abstract

Breast cancer is one of the major causes of death among women. Various decision support systems have been proposed to help oncologists diagnose their patients accurately. These systems mainly use classification techniques to categorize tumors as malignant or benign. Since no consensus has been reached on which classifier performs best in all circumstances, ensemble-based classification, which classifies patients by combining more than one single classification technique, has recently been investigated. In this paper, heterogeneous ensembles based on three well-known machine learning techniques (support vector machines, multilayer perceptron, and decision trees) were developed and evaluated by investigating how the parameter values of the ensemble members affect classification performance. In particular, we investigate three parameter-setting approaches: Grid Search (GS), Particle Swarm Optimization (PSO), and the default parameters of the Weka tool, to evaluate whether tuning ensemble parameters yields more accurate breast cancer classification over four datasets obtained from the UCI Machine Learning Repository. The heterogeneous ensembles of this study were built using majority voting as the combination rule. The overall results suggest that: (1) using GS or PSO to tune single techniques yields more accurate classification; (2) in general, ensembles classify more accurately than their single techniques regardless of the optimization technique used; (3) heterogeneous ensembles based on optimized single classifiers outperform the Uniform Configuration of Weka (UC-WEKA) ensembles; and (4) PSO and GS have roughly the same impact on ensemble performance.
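The abstract combines three ingredients: members tuned by Grid Search or PSO, then combined by majority voting. The following is a minimal, dependency-free sketch of those pieces under stated assumptions; it is not the authors' Weka-based setup. The three "members" are hypothetical one-feature threshold rules (`stump`) standing in for SVM, MLP and decision trees, the toy data is invented, the PSO is a basic global-best variant with assumed coefficients (w=0.7, c1=c2=1.5), and accuracy is measured on the training data only to keep the example short.

```python
import itertools
import random
from collections import Counter


def majority_vote(member_predictions):
    """Combine label lists (one per ensemble member) by majority voting.

    Ties go to the tied label seen first, because Counter.most_common
    orders equal counts by first occurrence.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*member_predictions)]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def grid_search(score, grid):
    """Evaluate every combination in `grid` (name -> candidate values);
    return the first best-scoring parameter dict and its score."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score


def pso(score, bounds, n_particles=10, n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximise `score` over a box (list of (lo, hi)) with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_score = [score(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards the particle's own best + pull towards the swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Clamp the particle to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            s = score(pos[i])
            if s > pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], s
                if s > gbest_score:
                    gbest, gbest_score = pos[i][:], s
    return gbest, gbest_score


# Hypothetical toy data: three features per sample, labels B (benign) / M (malignant).
X = [(1, 2, 1), (2, 1, 2), (2, 3, 1), (7, 8, 6), (8, 6, 9), (6, 2, 8)]
y = ["B", "B", "B", "M", "M", "M"]


def stump(feature, threshold):
    """Hypothetical member: predict M when one feature exceeds a threshold."""
    return ["M" if x[feature] > threshold else "B" for x in X]


# Tune each member independently with grid search (in-sample, illustration only).
tuned = []
for f in range(3):
    best, _ = grid_search(lambda p, f=f: accuracy(y, stump(f, p["t"])), {"t": range(10)})
    tuned.append(best["t"])

# Heterogeneous ensemble: one prediction list per tuned member, combined by vote.
ensemble = majority_vote([stump(f, t) for f, t in enumerate(tuned)])
print(tuned, accuracy(y, ensemble))  # [2, 3, 2] 1.0
```

Here the second member still misclassifies the last sample after tuning, but the other two members outvote it. The `pso` function accepts the same kind of score function over continuous bounds, for example `pso(lambda p: accuracy(y, stump(1, p[0])), [(0.0, 10.0)])`; in a real study both searches would score candidates on cross-validation folds rather than on the training data.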

Title: Assessing the impact of parameters tuning in ensemble based breast Cancer classification
Authors: Ali Idri, El Ouassif Bouchra, Mohamed Hosni, Ibtissam Abnane
Publication date: 1 July 2020
Publisher: Springer Berlin Heidelberg
Published in: Health and Technology / Issue 5/2020
Print ISSN: 2190-7188
Electronic ISSN: 2190-7196
DOI: https://doi.org/10.1007/s12553-020-00453-2
