2019 | OriginalPaper | Chapter

Dataset Weighting via Intrinsic Data Characteristics for Pairwise Statistical Comparisons in Classification

Authors: José A. Sáez, Pablo Villacorta, Emilio Corchado

Published in: Hybrid Artificial Intelligent Systems

Publisher: Springer International Publishing

Abstract

In supervised learning, some data characteristics (e.g. the presence of errors or the degree of class overlap) may negatively influence classifier performance. Many methods are designed to overcome the undesirable effects of these issues. When comparing one of those techniques with existing ones, a proper selection of datasets must be made, based on how well each dataset reflects the characteristic specifically addressed by the proposed algorithm. In this setting, statistical tests are necessary to check the significance of the differences found in the comparison of different methods. Wilcoxon’s signed-ranks test is one of the most well-known statistical tests for pairwise comparisons between classifiers. However, it gives the same importance to every dataset, disregarding how representative each of them is in relation to the concrete issue addressed by the methods compared. This research proposes a hybrid approach which combines measurement techniques for data characterization with statistical tests for decision making in data mining. Thus, each dataset is weighted according to how representative it is of the property of interest before applying Wilcoxon’s test. Our proposal has been successfully compared with the standard Wilcoxon’s test in two scenarios related to the noisy data problem. As a result, this approach more readily highlights properties of the algorithms that may otherwise remain hidden if data characteristics are not considered in the comparison.
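The chapter's exact weighting scheme is given in the full text; as a rough sketch of the underlying idea, a Wilcoxon-style signed-ranks statistic can incorporate per-dataset weights by scaling each dataset's rank by its representativeness before summing. The function below is an illustrative assumption (names, the rank-weighting step, and the normal approximation are not the authors' implementation):

```python
import numpy as np
from scipy import stats


def weighted_wilcoxon(perf_a, perf_b, weights):
    """Illustrative dataset-weighted Wilcoxon signed-ranks test.

    perf_a, perf_b : per-dataset performance of the two classifiers.
    weights        : representativeness of each dataset with respect to the
                     property of interest (e.g. its noise level).
    """
    d = np.asarray(perf_a, dtype=float) - np.asarray(perf_b, dtype=float)
    w = np.asarray(weights, dtype=float)

    nz = d != 0                          # the standard test discards ties
    d, w = d[nz], w[nz]

    ranks = stats.rankdata(np.abs(d))    # average ranks of |d_i|
    wr = ranks * w                       # weight each dataset's rank
    r_plus = wr[d > 0].sum()             # weighted sum of positive ranks
    r_minus = wr[d < 0].sum()            # weighted sum of negative ranks

    # Under H0 each sign is +/-1 with probability 1/2 independently, so
    # E[R+] = sum(wr)/2 and Var[R+] = sum(wr^2)/4 (normal approximation).
    mu = wr.sum() / 2.0
    sigma = np.sqrt((wr ** 2).sum() / 4.0)
    z = (r_plus - mu) / sigma
    p = 2.0 * stats.norm.sf(abs(z))      # two-sided p-value
    return r_plus, r_minus, p
```

Setting all weights to 1 recovers the ordinary signed-ranks statistic, while down-weighting datasets that poorly reflect the property of interest reduces their influence on the decision, which is the effect the chapter pursues.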


Metadata
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-29859-3_6