
2016 | Original Paper | Book Chapter

On the Relation Between kNN Accuracy and Dataset Compression Level

Author: Marcin Blachnik

Published in: Artificial Intelligence and Soft Computing

Publisher: Springer International Publishing


Abstract

The paper discusses how instance selection can be used to assess kNN performance. There is a strong correlation between the compression level of the dataset obtained by instance selection methods and the prediction accuracy of the kNN classifier trained on the full training dataset.
Based on two standard instance selection algorithms, CNN and ENN, which belong to two different groups of methods, the so-called condensation and editing methods, we perform an empirical analysis to verify this relation. The results show that the relation is almost linear, so the level of compression is linearly correlated with the accuracy. In other words, by knowing the compression achieved by instance selection methods we can estimate the accuracy of the final kNN prediction model.
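The two algorithms the abstract names can be illustrated with a minimal, self-contained sketch. This is not the paper's experimental code: Hart's CNN and Wilson's ENN are implemented naively with brute-force 1-NN, and the dataset is a synthetic two-blob stand-in. The compression level is measured as the fraction of training instances discarded by the selection method.

```python
import numpy as np

def nn_predict(X_train, y_train, X):
    # Brute-force 1-NN: label of the closest training instance.
    d = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return y_train[d.argmin(axis=1)]

def cnn_select(X, y):
    # Hart's Condensed NN: grow a subset by adding every instance
    # that the current subset misclassifies, until a stable pass.
    idx = [0]
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in idx:
                continue
            if nn_predict(X[idx], y[idx], X[i:i + 1])[0] != y[i]:
                idx.append(i)
                changed = True
    return np.array(idx)

def enn_select(X, y, k=3):
    # Wilson's Edited NN: keep only instances whose k nearest
    # neighbours (majority vote) agree with their own label.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)
    keep = [i for i in range(len(X))
            if np.bincount(y[np.argsort(d[i])[:k]],
                           minlength=y.max() + 1).argmax() == y[i]]
    return np.array(keep)

# Synthetic stand-in data: two Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
perm = rng.permutation(200)
Xtr, ytr = X[perm[:150]], y[perm[:150]]
Xte, yte = X[perm[150:]], y[perm[150:]]

cnn_idx = cnn_select(Xtr, ytr)
enn_idx = enn_select(Xtr, ytr)
acc = (nn_predict(Xtr, ytr, Xte) == yte).mean()
print(f"CNN compression:   {1 - len(cnn_idx) / len(Xtr):.2f}")
print(f"ENN compression:   {1 - len(enn_idx) / len(Xtr):.2f}")
print(f"full 1-NN accuracy: {acc:.2f}")
```

On easy, well-separated data CNN achieves high compression (few instances suffice to preserve the decision boundary) and full-set kNN accuracy is high; on noisy, overlapping data both compression and accuracy drop, which is the correlation the paper studies.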


References
2.
3.
Duch, W., Wieczorek, T., Biesiada, J., Blachnik, M.: Comparison of feature ranking methods based on information entropy. In: Proceedings of the International Joint Conference on Neural Networks, pp. 1415–1420. IEEE Press, Budapest (2004)
4.
Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimensions via hashing. In: Proceedings of the 25th Very Large Database (VLDB) Conference (1999)
5.
Hart, P.: The condensed nearest neighbor rule. IEEE Trans. Inf. Theory 14(3), 515–516 (1968)
6.
Herrera, F.: KEEL, knowledge extraction based on evolutionary learning (2005). http://www.keel.es. Spanish National Projects TIC2002-04036-C05, TIN2005-08386-C05 and TIN2008-06681-C06
7.
Kachel, A., Biesiada, J., Blachnik, M., Duch, W.: Infosel++: information based feature selection C++ library. In: Rutkowski, L., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2010, Part I. LNCS, vol. 6113, pp. 388–396. Springer, Heidelberg (2010)
8.
Liu, H., Motoda, H.: Feature Selection for Knowledge Discovery and Data Mining. Springer, New York (1998)
9.
Omohundro, S.M.: Five balltree construction algorithms. Technical report TR-89-063, International Computer Science Institute, December 1989
10.
García, S., Derrac, J., Cano, J.R., Herrera, F.: Prototype selection for nearest neighbor classification: taxonomy and empirical study. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 417–435 (2012)
11.
Uhlmann, J.K.: Satisfying general proximity/similarity queries with metric trees. Inf. Process. Lett. 40(4), 175–179 (1991)
12.
Wilson, D.: Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans. Syst. Man Cybern. SMC-2(3), 408–421 (1972)
13.
Yao, Y.Y.: Information-theoretic measures for knowledge discovery and data mining. In: Karmeshu (ed.) Entropy Measures, Maximum Entropy Principle and Emerging Applications. STUDFUZZ, vol. 119, pp. 115–136. Springer, Heidelberg (2003)
14.
Perlich, C.: Learning curves in machine learning. In: Sammut, C., Webb, G.I. (eds.) Encyclopedia of Machine Learning, pp. 577–580. Springer, New York (2011)
Metadata
Title
On the Relation Between kNN Accuracy and Dataset Compression Level
Author
Marcin Blachnik
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-39378-0_46