Published in: Arabian Journal for Science and Engineering 2/2022

20-09-2021 | Research Article-Computer Engineering and Computer Science

Classification Based on Structural Information in Data

Authors: Bergen Karabulut, Güvenç Arslan, Halil Murat Ünver

Abstract

Clustering provides structural information from unlabeled data. Studies in which the structural information of a dataset is obtained through unsupervised learning approaches such as clustering and then transferred to supervised learning are noteworthy. In this study, we propose a new preprocessing method that extracts structural information expected to represent the most meaningful summary of the training dataset before a supervised learning strategy is applied. The CURE clustering method was used to obtain this summary. Combining the proposed preprocessing method with SVM yields a new classification method, named representative-points-based SVM (RP-SVM). RP-SVM was tested experimentally on various real datasets and compared with standard SVM, KMSVM, KNN and CART. Compared with standard SVM, RP-SVM significantly reduced the training set size and produced fewer support vectors while achieving similar accuracy. Compared with KNN and CART, RP-SVM achieved better accuracy using less training data. Although RP-SVM reduces the training data less than KMSVM does, it is more stable, performing well on all datasets used. The results show that the proposed method can extract structural information that supports high-quality classification.


Metadata
Title: Classification Based on Structural Information in Data
Authors: Bergen Karabulut, Güvenç Arslan, Halil Murat Ünver
Publication date: 20-09-2021
Publisher: Springer Berlin Heidelberg
Published in: Arabian Journal for Science and Engineering / Issue 2/2022
Print ISSN: 2193-567X
Electronic ISSN: 2191-4281
DOI: https://doi.org/10.1007/s13369-021-06177-3