Published in: Arabian Journal for Science and Engineering 2/2021

01.10.2020 | Research Article-Electrical Engineering

A Method of Classification Performance Improvement Via a Strategy of Clustering-Based Data Elimination Integrated with k-Fold Cross-Validation

Authors: Onur Inan, Mustafa Serter Uzer

Abstract

Non-system errors that occur during data entry or data collection create noisy data that reduce the success of classification systems. To eliminate such data, a classification system was developed around a new data reduction method, MKMA-RAC, which consists of a modified k-means algorithm that uses Relief algorithm coefficients. The main theme of this article is the elimination of noisy data and its consistent application within the classification system through k-fold cross-validation. In the developed system, MKMA-RAC-based data reduction is integrated with support vector machine (SVM), linear discriminant analysis (LDA) and decision tree classifiers and applied in every fold, so that the training data are freed from noisy data. The data reduction process was not applied to the test data. The datasets used were Hepatitis, Liver Disorders, SPECT images and Statlog (Heart), all taken from the UCI database. Classification performance values obtained with tenfold CV, both with and without the proposed method, are reported for these datasets. For the Hepatitis, Liver Disorders, SPECT images and Statlog (Heart) datasets, respectively, the classification success rates of the proposed system were 96.88%, 74.56%, 87.24% and 90.00% with the SVM classifier; 94.91%, 69.05%, 82.38% and 88.52% with the LDA classifier; and 96.25%, 77.73%, 88.77% and 89.63% with the decision tree classifier. The test results show that the proposed system generally achieved higher classification performance than other results reported in the literature. The performance is therefore very encouraging for pattern recognition applications.
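The per-fold procedure described in the abstract (clean only the training part of each fold, leave the test part untouched) can be illustrated with a minimal Python sketch. This is not the authors' MKMA-RAC method, whose details are not given here. As stand-ins it assumes plain k-means with a cluster-majority rule as the noise filter (no Relief coefficients) and a nearest-centroid classifier in place of SVM, LDA or a decision tree. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed stand-in); returns a cluster index per row."""
    rng = random.Random(seed)
    centers = rng.sample(X, k)
    assign = [0] * len(X)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(x, centers[c])) for x in X]
        for c in range(k):
            members = [x for x, a in zip(X, assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = mean(members)
    return assign

def drop_suspected_noise(X, y, k):
    """Keep only samples whose label matches the majority label of their cluster."""
    assign = kmeans(X, k)
    majority = {
        c: Counter(l for l, a in zip(y, assign) if a == c).most_common(1)[0][0]
        for c in set(assign)
    }
    keep = [i for i in range(len(X)) if y[i] == majority[assign[i]]]
    return [X[i] for i in keep], [y[i] for i in keep]

def fit_centroids(X, y):
    """Nearest-centroid classifier (assumed stand-in for SVM/LDA/decision tree)."""
    return {label: mean([x for x, l in zip(X, y) if l == label]) for label in set(y)}

def predict(centroids, x):
    return min(centroids, key=lambda label: sq_dist(x, centroids[label]))

def cross_validate(X, y, folds=10, k=2, seed=0):
    """k-fold CV in which only the training part of each fold is filtered."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test = idx[f::folds]
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        # Data reduction sees the training samples only.
        X_tr, y_tr = drop_suspected_noise(
            [X[i] for i in train], [y[i] for i in train], k
        )
        model = fit_centroids(X_tr, y_tr)
        # Test samples are evaluated as-is, including any noisy ones.
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(X)

# Toy data: two well-separated groups plus one mislabeled point at [0.5, 0.5].
X_demo = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
          [10, 10], [10, 11], [11, 10], [11, 11]]
y_demo = [0, 0, 0, 0, 1, 1, 1, 1, 1]

X_clean, y_clean = drop_suspected_noise(X_demo, y_demo, k=2)
acc = cross_validate(X_demo, y_demo, folds=3, k=2)
print(len(X_demo), "->", len(X_clean), "training samples; CV accuracy:", acc)
```

Filtering inside the loop, rather than once on the full dataset, keeps test labels from influencing which training samples are removed. That is the consistency the abstract emphasizes.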

Metadata
Title
A Method of Classification Performance Improvement Via a Strategy of Clustering-Based Data Elimination Integrated with k-Fold Cross-Validation
Authors
Onur Inan
Mustafa Serter Uzer
Publication date
01.10.2020
Publisher
Springer Berlin Heidelberg
Published in
Arabian Journal for Science and Engineering / Issue 2/2021
Print ISSN: 2193-567X
Electronic ISSN: 2191-4281
DOI
https://doi.org/10.1007/s13369-020-04972-y
