
2016 | Original Paper | Book Chapter

On the Relation Between kNN Accuracy and Dataset Compression Level

Author: Marcin Blachnik

Published in: Artificial Intelligence and Soft Computing

Publisher: Springer International Publishing


Abstract

The paper discusses how instance selection can be used to assess kNN performance. There is a strong correlation between the compression level of the dataset obtained by instance selection methods and the prediction accuracy of the kNN classifier trained on the full training dataset.
Based on two standard instance selection algorithms, CNN and ENN, which belong to two different groups of methods, the so-called condensation and editing methods, we perform an empirical analysis to verify this relation. The results show that the relation is almost linear, so the level of compression is linearly correlated with the accuracy. In other words, by knowing the compression achieved by instance selection methods we can estimate the accuracy of the final kNN prediction model.
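The two algorithms the abstract names can be illustrated with a minimal, self-contained sketch. This is not the paper's experimental code: Hart's CNN and Wilson's ENN are implemented naively with brute-force 1-NN, and the dataset is a synthetic two-blob stand-in. The compression level is measured as the fraction of training instances discarded by the selection method.

```python
import numpy as np

def nn_predict(X_train, y_train, X):
    # Brute-force 1-NN: label of the closest training instance.
    d = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return y_train[d.argmin(axis=1)]

def cnn_select(X, y):
    # Hart's Condensed NN: grow a subset by adding every instance
    # that the current subset misclassifies, until a stable pass.
    idx = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in idx:
                continue
            if nn_predict(X[idx], y[idx], X[i:i + 1])[0] != y[i]:
                idx.append(i)
                changed = True
    return np.array(idx)

def enn_select(X, y, k=3):
    # Wilson's Edited NN: keep only instances whose k nearest
    # neighbours (majority vote) agree with their own label.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)
    keep = [i for i in range(len(X))
            if np.bincount(y[np.argsort(d[i])[:k]],
                           minlength=y.max() + 1).argmax() == y[i]]
    return np.array(keep)

# Synthetic stand-in data: two Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
perm = rng.permutation(200)
Xtr, ytr = X[perm[:150]], y[perm[:150]]
Xte, yte = X[perm[150:]], y[perm[150:]]

cnn_idx = cnn_select(Xtr, ytr)
enn_idx = enn_select(Xtr, ytr)
acc = (nn_predict(Xtr, ytr, Xte) == yte).mean()
print(f"CNN compression:   {1 - len(cnn_idx) / len(Xtr):.2f}")
print(f"ENN compression:   {1 - len(enn_idx) / len(Xtr):.2f}")
print(f"full 1-NN accuracy: {acc:.2f}")
```

On easy, well-separated data CNN achieves high compression (few instances suffice to preserve the decision boundary) and full-set kNN accuracy is high; on noisy, overlapping data both compression and accuracy drop, which is the correlation the paper studies.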


References
2.
3.
Duch, W., Wieczorek, T., Biesiada, J., Blachnik, M.: Comparison of feature ranking methods based on information entropy. In: Proceedings of the International Joint Conference on Neural Networks, pp. 1415–1420. IEEE Press, Budapest (2004)
4.
Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimensions via hashing. In: Proceedings of the 25th Very Large Database (VLDB) Conference (1999)
5.
Hart, P.: The condensed nearest neighbor rule. IEEE Trans. Inf. Theory 14(3), 515–516 (1968)
6.
Herrera, F.: KEEL, knowledge extraction based on evolutionary learning (2005). http://www.keel.es. Spanish National Projects TIC2002-04036-C05, TIN2005-08386-C05 and TIN2008-06681-C06
7.
Kachel, A., Biesiada, J., Blachnik, M., Duch, W.: Infosel++: information based feature selection C++ library. In: Rutkowski, L., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2010, Part I. LNCS, vol. 6113, pp. 388–396. Springer, Heidelberg (2010)
8.
Liu, H., Motoda, H.: Feature Selection for Knowledge Discovery and Data Mining. Springer, New York (1998)
9.
Omohundro, S.M.: Five balltree construction algorithms. Technical report TR-89-063, International Computer Science Institute, December 1989
10.
García, S., Derrac, J., Cano, J.R., Herrera, F.: Prototype selection for nearest neighbor classification: taxonomy and empirical study. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 417–435 (2012)
11.
Uhlmann, J.K.: Satisfying general proximity/similarity queries with metric trees. Inf. Process. Lett. 40(4), 175–179 (1991)
12.
Wilson, D.: Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans. Syst. Man Cybern. SMC-2(3), 408–421 (1972)
13.
Yao, Y.Y.: Information-theoretic measures for knowledge discovery and data mining. In: Karmeshu (ed.) Entropy Measures, Maximum Entropy Principle and Emerging Applications. STUDFUZZ, vol. 119, pp. 115–136. Springer, Heidelberg (2003)
14.
Perlich, C.: Learning curves in machine learning. In: Sammut, C., Webb, G.I. (eds.) Encyclopedia of Machine Learning, pp. 577–580. Springer, New York (2011)
Metadata
Title
On the Relation Between kNN Accuracy and Dataset Compression Level
Author
Marcin Blachnik
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-39378-0_46