Identifying predictive hubs to condense the training set of \(k\)-nearest neighbour classifiers

Original Paper, Computational Statistics

Abstract

The \(k\)-Nearest Neighbour classifier is widely used, owing to its inherent simplicity and its avoidance of model assumptions. Although the approach has been shown to yield near-optimal classification performance in the limit of infinitely many samples, a selection of the most decisive data points can improve the classification accuracy considerably in real settings with a limited number of samples. At the same time, selecting a subset of representative training samples reduces the required amount of storage and computational resources. We devised a new approach that selects a representative training subset on the basis of an evolutionary optimization procedure. This method chooses those training samples that have a strong influence on the correct prediction of other training samples, in particular those with uncertain labels. The performance of the algorithm is evaluated on different data sets. Additionally, we provide graphical examples of the selection procedure.
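To illustrate the setting, the following minimal sketch shows what prediction from a condensed reference set means for a \(k\)-NN classifier and how a candidate subset could be scored by how well it predicts the remaining training samples. This is not the paper's implementation; only NumPy is assumed and all names are ours.

```python
# Minimal sketch (not the paper's implementation): a k-NN classifier that
# predicts from a selected reference subset R of the training set T, and a
# simple score for how well that subset predicts the remaining samples.
import numpy as np

def knn_predict(x, ref_X, ref_y, k=3):
    """Majority vote among the k nearest reference samples (Euclidean distance)."""
    d = np.linalg.norm(ref_X - x, axis=1)            # distances to the reference set
    nearest = np.argsort(d)[:k]                      # indices of the k closest points
    labels, counts = np.unique(ref_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def subset_score(mask, X, y, k=3):
    """Accuracy of the subset given by the boolean array `mask` (the reference set)
    when predicting the training samples outside the subset."""
    ref_X, ref_y = X[mask], y[mask]
    rest = ~mask
    preds = np.array([knn_predict(x, ref_X, ref_y, k) for x in X[rest]])
    return float(np.mean(preds == y[rest]))
```

An evolutionary search as described in the abstract would then evolve such masks, rewarding subsets whose members drive correct predictions for the other training samples; the score above is only a simplified stand-in for the paper's actual fitness function.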


References

  • Angiulli F (2005) Fast condensed nearest neighbor rule. In: De Raedt L, Wrobel S (eds) ICML, ACM pp 25–32

  • Bhattacharya B, Kaller D (1998) Reference set thinning for the k-nearest neighbor decision rule. In: Proceedings of the 14th international conference on pattern recognition (ICPR '98), vol 1. IEEE Computer Society pp 238–242

  • Brighton H, Mellish C (2002) Advances in instance selection for instance-based learning algorithms. Data Min Knowl Discov 6(2):153–172

  • Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13:21–27

  • Dasarathy BV (1994) Minimal consistent set (MCS) identification for optimal nearest neighbor decision systems design. IEEE Trans Syst Man Cybern 24(3):511–517

  • Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, Berlin

  • Fix E, Hodges J (1951) Discriminatory analysis: nonparametric discrimination: consistency properties. Technical report, Project 21-49-004, Report No. 4, USAF School of Aviation Medicine, Randolph Field, Texas

  • Frank A, Asuncion A (2010) UCI machine learning repository http://archive.ics.uci.edu/ml

  • Gates G (1972) The reduced nearest neighbor rule. IEEE Trans Inf Theory 18(3):431–433

  • Gil-Pita R, Yao X (2007) Using a genetic algorithm for editing k-nearest neighbor classifiers. In: Yin H, Tiño P, Corchado E, Byrne W, Yao X (eds) IDEAL (Lecture Notes in Computer Science), vol 4881. Springer pp 1141–1150

  • Hart P (1968) The condensed nearest neighbor rule. IEEE Trans Inf Theory 14:515–516

  • Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor

  • Horton P, Nakai K (1996) A probabilistic classification system for predicting the cellular localization sites of proteins. In: States D, Agarwal P, Gaasterland T, Hunter L, Smith R (eds) Proceedings of the fourth international conference on intelligent systems for molecular biology. AAAI Press pp 109–115

  • Kohonen T (1988) Learning vector quantization. Neural Netw 1:303

  • Kohonen T (1995) Learning vector quantization. In: Arbib M (ed) The handbook of brain theory and neural networks. MIT Press, Cambridge, pp 537–540

  • Kuncheva L (1995) Editing for the k-nearest neighbors rule by a genetic algorithm. Pattern Recognit Lett 16(8):809–814

  • Kuncheva L (1997) Fitness functions in editing k-nn reference set by genetic algorithms. Pattern Recognit 30(6):1041–1049

  • Kuncheva L, Jain L (1999) Nearest neighbor classifier: simultaneous editing and feature selection. Pattern Recognit Lett 20(11–13):1149–1156

  • Langford J (2005) Tutorial on practical prediction theory for classification. J Mach Learn Res 6:273–306

  • Li D, Simske S (2011) Training set compression by incremental clustering. J Pattern Recognit Res 6(1):56–63

  • Littlestone N, Warmuth M (1986) Relating data compression and learnability. Unpublished manuscript

  • Schwenker F, Kestler HA (2002) Analysis of support vectors helps to identify borderline patients in classification studies. Comput Cardiol 29:305–308

  • Sigillito VG, Wing SP, Hutton LV, Baker KB (1989) Classification of radar returns from the ionosphere using neural networks. Johns Hopkins APL Tech Dig 10(3):262–266

  • Street WN, Wolberg WH, Mangasarian OL (1993) Nuclear feature extraction for breast tumor diagnosis. In: Acharya R, Goldgof D (eds) International symposium on electronic imaging: science and technology, vol 1905. SPIE Conference Series, pp 861–870

  • Webb A (2002) Statistical pattern recognition, 2nd edn. Wiley, London

  • Wilson D (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybern 2(3):408–421

Acknowledgments

This work is supported by the Graduate School of Mathematical Analysis of Evolution, Information and Complexity at the University of Ulm (CM, HAK) and by the German Federal Ministry of Education and Research (BMBF) within the framework of the program of medical genome research (PaCa-Net; project ID PKB-01GS08) and GerontoSys (Forschungskern SyStaR). AM was supported by the Ministry of Education and Science of the Russian Federation under the government contract 13.G25.31.0017 with the joint-stock company “Academician M.F. Reshetnev Information Satellite Systems”.

Author information

Corresponding author

Correspondence to Hans A. Kestler.

Additional information

Ludwig Lausser and Christoph Müssel contributed equally.

Appendix

The accuracy term \(A\) given in Eq. 10 is inspired by the idea of sample compression bounds (Littlestone and Warmuth 1986). These bounds can be used to give an upper bound on the true classification error probability \(\mathcal{E}\) of a data-dependent classifier \(c\). A classifier is called data-dependent if it can be reconstructed directly from a certain subset of training samples \(\mathcal{R} \subseteq \mathcal{T}\) (called a compression set in this context), i.e. \(c_\mathcal{R} = c_\mathcal{T}\). This clearly holds for \(k\)-NN-like classifiers following Eq. 3, as they can be reconstructed from their reference set. A sample compression bound estimates the true error probability \(\mathcal{E}\) from the empirical error rate \(\text{E}\) calculated on the remaining samples \(\mathcal{T} \setminus \mathcal{R}\).

Theorem 1

(e.g. Langford 2005) For a random sample \(\mathcal{T}\) of iid examples drawn from an arbitrary but fixed distribution \(\mathcal{D}\) and for all \(\delta \in \left(0,1\right]\),

$$\mathop{\Pr}\limits_{\mathcal{T} \sim \mathcal{D}^{|\mathcal{T}|}} \left( \forall\, \mathcal{R} \subseteq \mathcal{T}\ \text{with}\ c = k\text{NN}(x, \mathcal{R}):\ \mathcal{E}_{\mathcal{D}}(c) \le \overline{\mathrm{Bin}}\left( \text{E}(c, \mathcal{T} \setminus \mathcal{R}),\ \frac{\delta}{|\mathcal{T}| \binom{|\mathcal{T}|}{|\mathcal{T} \setminus \mathcal{R}|}} \right) \right) \ge 1-\delta$$

Proof

Theorem 1 is a direct application of the sample compression bound given in Langford (2005). \(\square \)

The term \(\overline{\mathrm{Bin}}\) in Theorem 1 denotes the binomial tail inversion.
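Spelled out (our phrasing of the definition in Langford 2005, with \(m = |\mathcal{T} \setminus \mathcal{R}|\) evaluation samples and \(k\) observed errors), the binomial tail inversion is

$$\overline{\mathrm{Bin}}\left(\tfrac{k}{m}, \delta\right) = \max\left\{ p \in [0,1] \;:\; \sum_{i=0}^{k} \binom{m}{i}\, p^{i} (1-p)^{m-i} \ge \delta \right\},$$

i.e. the largest true error probability under which observing at most \(k\) errors in \(m\) trials still has probability at least \(\delta\).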

Figure 3 shows an application of the sample compression bound to a training set of size \(|\mathcal{T}|=100\). The bound is shown for fixed empirical error rates of 0.01, 0.05 and 0.1. The bound suggests that, of all \(k\)-NN classifiers achieving the same empirical error, the classifier with the smallest reference set is the most reliable.

Fig. 3 The figure shows the sample compression bounds for a \(k\)-NN experiment with \(|\mathcal{T}|=100\) examples. We assume that empirical error rates of 0.01, 0.05 and 0.1 were measured for different compression set sizes \(|\mathcal{R}|\)
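To make the computation behind such curves concrete, the following sketch evaluates the bound of Theorem 1 for \(|\mathcal{T}|=100\). It is our illustration rather than code from the paper; it assumes NumPy/SciPy, and all function names are ours.

```python
# Sketch (ours, not the authors' code): evaluate the sample compression bound of
# Theorem 1 for a training set of size |T| = 100, as in Fig. 3.
import numpy as np
from scipy.stats import binom
from scipy.optimize import brentq
from scipy.special import comb

def bin_tail_inversion(k, m, delta):
    """Binomial tail inversion: the largest p with P(Binomial(m, p) <= k) >= delta."""
    if binom.cdf(k, m, 1.0) >= delta:                   # only happens when k == m
        return 1.0
    # binom.cdf(k, m, p) is continuous and decreasing in p, so the root is bracketed
    return brentq(lambda p: binom.cdf(k, m, p) - delta, 0.0, 1.0)

def compression_bound(n_train, ref_size, emp_error, delta=0.05):
    """Upper bound on the true error of a k-NN classifier with |R| = ref_size."""
    m = n_train - ref_size                              # samples in T \ R
    k = int(round(emp_error * m))                       # observed errors on T \ R
    delta_eff = delta / (n_train * comb(n_train, m, exact=True))
    return bin_tail_inversion(k, m, delta_eff)

if __name__ == "__main__":
    for e in (0.01, 0.05, 0.1):
        bounds = [compression_bound(100, r, e) for r in range(1, 100)]
        best = int(np.argmin(bounds)) + 1               # |R| with the tightest bound
        print(f"empirical error {e}: tightest bound {min(bounds):.3f} at |R| = {best}")
```

This simply restates the message of the figure: among classifiers with the same empirical error, the one built from the smallest reference set comes with the tightest guarantee.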

About this article

Cite this article

Lausser, L., Müssel, C., Melkozerov, A. et al. Identifying predictive hubs to condense the training set of \(k\)-nearest neighbour classifiers. Comput Stat 29, 81–95 (2014). https://doi.org/10.1007/s00180-012-0379-0
