Skip to main content
Erschienen in: Pattern Analysis and Applications 3/2007

01.08.2007 | Theoretical Advances

An analysis of how training data complexity affects the nearest neighbor classifiers

verfasst von: J. S. Sánchez, R. A. Mollineda, J. M. Sotoca

Erschienen in: Pattern Analysis and Applications | Ausgabe 3/2007

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The k-nearest neighbors (k-NN) classifier is one of the most popular supervised classification methods. It is very simple, intuitive and accurate in a great variety of real-world domains. Nonetheless, despite its simplicity and effectiveness, practical use of this rule has been historically limited due to its high storage requirements and the computational costs involved. On the other hand, the performance of this classifier appears strongly sensitive to training data complexity. In this context, by means of several problem difficulty measures, we try to characterize the behavior of the k-NN rule when working under certain situations. More specifically, the present analysis focuses on the use of some data complexity measures to describe class overlapping, feature space dimensionality and class density, and discover their relation with the practical accuracy of this classifier.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Bernadó E, Ho T-K (2004) On classifier domain of competence. In: Proceedings of 17th international conference on pattern recognition, Cambridge, pp 136–139 Bernadó E, Ho T-K (2004) On classifier domain of competence. In: Proceedings of 17th international conference on pattern recognition, Cambridge, pp 136–139
2.
Zurück zum Zitat Beyer KS, Goldstein J, Ramakrishnan R, Shaft U (1999) When is “nearest neighbor” meaningful? In: Proceedings of 7th international conference on database theory, Jerusalem, pp 217–235 Beyer KS, Goldstein J, Ramakrishnan R, Shaft U (1999) When is “nearest neighbor” meaningful? In: Proceedings of 7th international conference on database theory, Jerusalem, pp 217–235
3.
Zurück zum Zitat Cover TM (1968) Estimation by the nearest neighbor rule. IEEE Trans Inf Theory 14:50–55MATHCrossRef Cover TM (1968) Estimation by the nearest neighbor rule. IEEE Trans Inf Theory 14:50–55MATHCrossRef
4.
Zurück zum Zitat Cover TM, Hart PE (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13:21–27MATHCrossRef Cover TM, Hart PE (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13:21–27MATHCrossRef
5.
Zurück zum Zitat Dasarathy BV (1991) Nearest neighbor norms: NN pattern classification techniques. IEEE Computer Society Press, Los Alamos Dasarathy BV (1991) Nearest neighbor norms: NN pattern classification techniques. IEEE Computer Society Press, Los Alamos
6.
Zurück zum Zitat Devijver PA, Kittler J (1992) Pattern recognition: a statistical approach. Prentice Hall, Englewood Cliffs Devijver PA, Kittler J (1992) Pattern recognition: a statistical approach. Prentice Hall, Englewood Cliffs
7.
Zurück zum Zitat Friedman JH (1997) On bias, variance, 0/1-loss, and the curse of dimensionality. Data Mining Knowl Discov 1:55–77CrossRef Friedman JH (1997) On bias, variance, 0/1-loss, and the curse of dimensionality. Data Mining Knowl Discov 1:55–77CrossRef
8.
Zurück zum Zitat Gama J, Brazdil P (1995) Characterization of classification algorithms. In: Progress in artificial intelligence. Springer, Heidelberg, pp 83–102 Gama J, Brazdil P (1995) Characterization of classification algorithms. In: Progress in artificial intelligence. Springer, Heidelberg, pp 83–102
9.
Zurück zum Zitat Hand DJ, Vinciotti V (2003) Choosing k for two-class nearest neighbour classifiers with unbalanced classes. Pattern Recognit Lett 24:1555–1562MATHCrossRef Hand DJ, Vinciotti V (2003) Choosing k for two-class nearest neighbour classifiers with unbalanced classes. Pattern Recognit Lett 24:1555–1562MATHCrossRef
10.
Zurück zum Zitat Ho T-K, Basu M (2002) Complexity measures of supervised classification problems. IEEE Trans Pattern Anal Mach Intell 24:289–300CrossRef Ho T-K, Basu M (2002) Complexity measures of supervised classification problems. IEEE Trans Pattern Anal Mach Intell 24:289–300CrossRef
11.
Zurück zum Zitat Hoekstra A, Duin RPW (1996) On the nonlinearity of pattern classifiers. In: Proceedings of 13th international conference on pattern recognition, Vienna, pp 271–275 Hoekstra A, Duin RPW (1996) On the nonlinearity of pattern classifiers. In: Proceedings of 13th international conference on pattern recognition, Vienna, pp 271–275
12.
Zurück zum Zitat Jain AK, Xu X, Ho T-K, Xiao F (2002) Uniformity testing using minimal spanning tree. In: Proceedings of 16th international conference on pattern recognition, Quebec City, pp 281–284 Jain AK, Xu X, Ho T-K, Xiao F (2002) Uniformity testing using minimal spanning tree. In: Proceedings of 16th international conference on pattern recognition, Quebec City, pp 281–284
13.
Zurück zum Zitat Japkowicz N, Stephen S (2002) The class imbalance problem: a systematic study. Intell Data Anal 6:429–449MATH Japkowicz N, Stephen S (2002) The class imbalance problem: a systematic study. Intell Data Anal 6:429–449MATH
14.
Zurück zum Zitat Kubat M, Chen W-K (1998) Weighted projection in nearest-neighbor classifiers. In: Proceedings of 1st Southern symposium on computing, Hattiesburg, pp 27–34 Kubat M, Chen W-K (1998) Weighted projection in nearest-neighbor classifiers. In: Proceedings of 1st Southern symposium on computing, Hattiesburg, pp 27–34
15.
Zurück zum Zitat Little RJA, Rubin DB (2002) Statistical analysis with missing data. Wiley, New YorkMATH Little RJA, Rubin DB (2002) Statistical analysis with missing data. Wiley, New YorkMATH
16.
Zurück zum Zitat Mollineda RA, Sánchez JS, Sotoca JM (2005) Data characterization for effective prototype selection. In: Proceedings of 2nd Iberian conference on pattern recognition and image analysis, Estoril, pp 27–34 Mollineda RA, Sánchez JS, Sotoca JM (2005) Data characterization for effective prototype selection. In: Proceedings of 2nd Iberian conference on pattern recognition and image analysis, Estoril, pp 27–34
17.
Zurück zum Zitat Okamoto S, Yugami N (2003) Effects of domain characteristics on instance-based learning algorithms. Theor Comput Sci 298:207–233MATHCrossRefMathSciNet Okamoto S, Yugami N (2003) Effects of domain characteristics on instance-based learning algorithms. Theor Comput Sci 298:207–233MATHCrossRefMathSciNet
18.
Zurück zum Zitat Sánchez JS, Barandela R, Marqués AI, Alejo R, Badenas J (2003) Analysis of new techniques to obtain quality training sets. Pattern Recognit Lett 24:1015–1022CrossRef Sánchez JS, Barandela R, Marqués AI, Alejo R, Badenas J (2003) Analysis of new techniques to obtain quality training sets. Pattern Recognit Lett 24:1015–1022CrossRef
19.
Zurück zum Zitat Singh S (2003) PRISM—a novel framework for pattern recognition. Pattern Anal Appl 6:134–149CrossRef Singh S (2003) PRISM—a novel framework for pattern recognition. Pattern Anal Appl 6:134–149CrossRef
20.
Zurück zum Zitat Sohn S-Y (1999) Meta analysis of classification algorithms for pattern recognition. IEEE Trans Pattern Anal Mach Intell 21:1137–1144CrossRef Sohn S-Y (1999) Meta analysis of classification algorithms for pattern recognition. IEEE Trans Pattern Anal Mach Intell 21:1137–1144CrossRef
21.
Zurück zum Zitat Zhang J, Mani I (2003) kNN approach to unbalanced data distributions: a case study involving information extraction. In: Proceedings of workshop on learning from imbalanced datasets, Washington DC Zhang J, Mani I (2003) kNN approach to unbalanced data distributions: a case study involving information extraction. In: Proceedings of workshop on learning from imbalanced datasets, Washington DC
22.
Zurück zum Zitat Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data sets. IEEE Trans Syst Man Cybern 2:408–421MATHCrossRef Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data sets. IEEE Trans Syst Man Cybern 2:408–421MATHCrossRef
Metadaten
Titel
An analysis of how training data complexity affects the nearest neighbor classifiers
verfasst von
J. S. Sánchez
R. A. Mollineda
J. M. Sotoca
Publikationsdatum
01.08.2007
Verlag
Springer-Verlag
Erschienen in
Pattern Analysis and Applications / Ausgabe 3/2007
Print ISSN: 1433-7541
Elektronische ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-007-0061-2

Weitere Artikel der Ausgabe 3/2007

Pattern Analysis and Applications 3/2007 Zur Ausgabe