2019 | OriginalPaper | Chapter

Does the Order of Attributes Play an Important Role in Classification?

Authors: Antonio J. Tallón-Ballesteros, Simon Fong, Rocío Leal-Díaz

Published in: Hybrid Artificial Intelligent Systems

Publisher: Springer International Publishing

Abstract

This paper proposes a methodology for feature sorting in the context of supervised machine learning algorithms. Feature sorting is defined as a procedure that reorders the initial arrangement of the attributes: a sorting algorithm assigns an ordinal number to every feature according to its importance, and the features are then arranged from first to last following those ordinal numbers. Feature ranking, a technique from the feature selection area, has been chosen as the representative way to perform the sorting. The contribution is a new methodology in which all attributes are kept in the data mining task and are arranged in different orders by different feature ranking methods. The approach has been assessed on ten binary and multi-class problems with fewer than 37 features and between 106 and 28056 instances; the test-bed includes one challenging data set with 21 labels and 23 attributes on which previous works were not able to reach an accuracy of at least fifty percent. ReliefF is a strong candidate for re-sorting the initial feature space, and the C4.5 algorithm achieved a promising global performance; additionally, PART (a rule-based classifier) and Support Vector Machines obtained acceptable results.
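
The feature sorting procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The scoring function `class_mean_gap` is a deliberately simple stand-in for a feature ranking method (it is not ReliefF), and the helper names `feature_order` and `sort_features` and the toy data are hypothetical. The sketch only shows the core idea: score every attribute, derive an ordering, and rearrange the columns without discarding any of them.

```python
from statistics import mean


def class_mean_gap(column, labels):
    """Stand-in importance score (NOT ReliefF): spread between per-class means."""
    by_class = {}
    for value, label in zip(column, labels):
        by_class.setdefault(label, []).append(value)
    class_means = [mean(values) for values in by_class.values()]
    return max(class_means) - min(class_means)


def feature_order(X, y, score=class_mean_gap):
    """Return column indices ordered from most to least important."""
    n_features = len(X[0])
    scores = [score([row[j] for row in X], y) for j in range(n_features)]
    # sorted() is stable, so tied features keep their original relative order.
    return sorted(range(n_features), key=lambda j: -scores[j])


def sort_features(X, order):
    """Rearrange the attributes of every instance; no attribute is dropped."""
    return [[row[j] for j in order] for row in X]


# Toy data set (hypothetical): 4 instances, 3 attributes, 2 classes.
X = [[5.0, 1.0, 0.2],
     [5.1, 1.1, 0.2],
     [4.9, 3.0, 0.7],
     [5.0, 3.2, 0.9]]
y = ["a", "a", "b", "b"]

order = feature_order(X, y)
X_sorted = sort_features(X, order)
print(order)
print(X_sorted[0])
```

The re-sorted data set keeps the same instances and the same number of attributes; only the column order changes. It would then be passed to a classifier, and the abstract's question is whether that order affects the resulting accuracy.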

Metadata
Title
Does the Order of Attributes Play an Important Role in Classification?
Authors
Antonio J. Tallón-Ballesteros
Simon Fong
Rocío Leal-Díaz
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-29859-3_32
