Abstract
The \(k\)-Nearest Neighbour classifier is widely used and popular due to its inherent simplicity and its avoidance of model assumptions. Although the approach has been shown to yield near-optimal classification performance in the limit of infinitely many samples, selecting the most decisive data points can improve classification accuracy considerably in real settings with a limited number of samples. At the same time, selecting a subset of representative training samples reduces the required amount of storage and computational resources. We devised a new approach that selects a representative training subset on the basis of an evolutionary optimization procedure. This method chooses those training samples that have a strong influence on the correct prediction of other training samples, in particular those that have uncertain labels. The performance of the algorithm is evaluated on different data sets. Additionally, we provide graphical examples of the selection procedure.
References
Angiulli F (2005) Fast condensed nearest neighbor rule. In: De Raedt L, Wrobel S (eds) ICML, ACM pp 25–32
Bhattacharya B, Kaller D (1998) Reference set thinning for the k-nearest neighbor decision rule. In: Proceedings of the 14th international conference on pattern recognition, vol 1. ICPR ’98. IEEE Computer Society, pp 238–242
Brighton H, Mellish C (2002) Advances in instance selection for instance-based learning algorithms. Data Min Knowl Discov 6(2):153–172
Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13:21–27
Dasarathy BV (1994) Minimal consistent set (MCS) identification for optimal nearest neighbor decision systems design. IEEE Trans Syst Man Cybern 24(3):511–517
Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, Berlin
Fix E, Hodges J (1951) Discriminatory analysis: nonparametric discrimination: consistency properties. Technical report project 21–49-004, Report number 4, USAF School of Aviation Medicine, Randolf Field, Texas
Frank A, Asuncion A (2010) UCI machine learning repository http://archive.ics.uci.edu/ml
Gates G (1972) The reduced nearest neighbor rule. IEEE Trans Inf Theory 18(3):431–433
Gil-Pita R, Yao X (2007) Using a genetic algorithm for editing k-nearest neighbor classifiers. In: Yin H, Tiño P, Corchado E, Byrne W, Yao X (eds) IDEAL (Lecture Notes in Computer Science), vol 4881. Springer pp 1141–1150
Hart P (1968) The condensed nearest neighbor rule. IEEE Trans Inf Theory 14:515–516
Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor
Horton P, Nakai K (1996) A probabilistic classification system for predicting the cellular localization sites of proteins. In: States D, Agarwal P, Gaasterland T, Hunter L, Smith R (eds) Proceedings of the fourth international conference on intelligent systems for molecular Biology. AAAI Press pp 109–115
Kohonen T (1988) Learning vector quantization. Neural Netw 1:303
Kohonen T (1995) Learning vector quantization. In: Arbib M (ed) The handbook of brain theory and neural networks. MIT Press, Cambridge, pp 537–540
Kuncheva L (1995) Editing for the k-nearest neighbors rule by a genetic algorithm. Pattern Recognit Lett 16(8):809–814
Kuncheva L (1997) Fitness functions in editing k-nn reference set by genetic algorithms. Pattern Recognit 30(6):1041–1049
Kuncheva L, Jain L (1999) Nearest neighbor classifier: Simultaneous editing and feature selection. Pattern Recognit Lett 20(11–13):1149–1156
Langford J (2005) Tutorial on practical prediction theory for classification. J Mach Learn Res 6:273–306
Li D, Simske S (2011) Training set compression by incremental clustering. J Pattern Recognit Res 6(1):56–63
Littlestone N, Warmuth M (1986) Relating data compression and learnability. Unpublished manuscript
Schwenker F, Kestler HA (2002) Analysis of support vectors helps to identify borderline patients in classification studies. Comput Cardiol 29:305–308
Sigillito VG, Wing SP, Hutton LV, Baker KB (1989) Classification of radar returns from the ionosphere using neural networks. Johns Hopkins APL Tech Dig 10(3):262–266
Street WN, Wolberg WH, Mangasarian OL (1993) Nuclear feature extraction for breast tumor diagnosis. In: Acharya R, Goldgof D (eds) International symposium on electronic imaging: science and technology, vol 1905. SPIE Conference Series, pp 861–870
Webb A (2002) Statistical pattern recognition, 2nd edn. Wiley, London
Wilson D (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybern 2(3):408–421
Acknowledgments
This work is supported by the Graduate School of Mathematical Analysis of Evolution, Information and Complexity at the University of Ulm (CM, HAK) and by the German federal ministry of education and research (BMBF) within the framework of the program of medical genome research (PaCa-Net; project ID PKB-01GS08) and GerontoSys (Forschungskern SyStaR). AM was supported by the Ministry of Education and Science of the Russian Federation under the government contract 13.G25.31.0017 with the joint-stock company “Academician M.F. Reshetnev Information Satellite Systems”.
Ludwig Lausser and Christoph Müssel: contributed equally.
Appendix
The accuracy term \(A\) given in Eq. 10 is inspired by the idea of sample compression bounds (Littlestone and Warmuth 1986). These bounds can be used to give an upper limit to the true classification error probability \(\mathcal{E }\) of a data-dependent classifier \(c\). A classifier is called data-dependent if it can be reconstructed directly from a certain subset of training samples \(\mathcal{R } \subseteq \mathcal{T }\) (called a compression set in this context), i.e. \(c_\mathcal{R } = c_\mathcal{T }\). This is clearly true for \(k\)-NN-like classifiers following Eq. 3, as they can be reconstructed from their reference set. A sample compression bound estimates the true error probability \(\mathcal{E }\) from the empirical error rate E calculated on the remaining samples \(\mathcal T {\setminus }\mathcal{R }\).
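The data-dependence property can be made concrete with a minimal sketch (hypothetical toy data, not the paper's Eq. 3 verbatim): a \(k\)-NN classifier is fully specified by its reference set, so a classifier built from the compression set \(\mathcal{R}\) alone makes the same predictions as one carrying the full training set, provided the dropped samples are not among the nearest neighbours.

```python
from collections import Counter
import math

def knn_predict(reference, x, k=3):
    """Majority vote among the k reference samples closest to x.

    `reference` is a list of (features, label) pairs; the classifier
    is fully determined by this set, i.e. it is data-dependent and
    can be reconstructed from the reference set alone.
    """
    nearest = sorted(reference, key=lambda s: math.dist(s[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy training set T and a hypothetical compression set R subset of T:
T = [((0.0, 0.0), 'a'), ((0.1, 0.2), 'a'), ((1.0, 1.0), 'b'),
     ((0.9, 1.1), 'b'), ((0.5, 0.4), 'a')]
R = T[:4]

# Predictions from R match those of the full set for this query point:
print(knn_predict(R, (0.2, 0.1), k=3))  # -> 'a'
print(knn_predict(T, (0.2, 0.1), k=3))  # -> 'a'
```

The empirical error rate \(E\) of such a classifier is then evaluated on the held-out samples \(\mathcal{T}\setminus\mathcal{R}\), which the classifier never "stores".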
Theorem 1
(e.g. Langford 2005) For a random sample \(\mathcal{T}\) of iid examples drawn from an arbitrary, but fixed distribution \(\mathcal{D}\) and for all \(\delta \in \left(0,1\right]\),
$$\mathbb{P}_{\mathcal{T}\sim\mathcal{D}}\left(\forall \mathcal{R}\subseteq\mathcal{T}:\; \mathcal{E} \le \mathrm{{\overline{Bin}}}\left(E,\, |\mathcal{T}\setminus\mathcal{R}|,\, \frac{\delta}{|\mathcal{T}|\binom{|\mathcal{T}|}{|\mathcal{R}|}}\right)\right) \ge 1-\delta.$$
Proof
Theorem 1 is a direct application of the sample compression bound given in Langford (2005). \(\square \)
The term \(\mathrm{{\overline{Bin}}}\) in Theorem 1 denotes the binomial tail inversion, i.e. the largest true error probability \(p\) under which the observed number of empirical errors is still plausible at confidence level \(\delta\): \(\mathrm{{\overline{Bin}}}(E, m, \delta) = \max \left\{ p : \mathrm{Bin}(E, m, p) \ge \delta \right\}\), where \(\mathrm{Bin}(E, m, p)\) is the probability of observing at most \(E \cdot m\) errors in \(m\) Bernoulli trials with error probability \(p\).
Figure 3 shows an application of the sample compression bound to a training set of size \(|\mathcal{T }|=100\). The bound is shown for fixed empirical error rates of 0.01, 0.05 and 0.1. The bound suggests that of all \(k\)-NN classifiers achieving the same empirical error, the classifier with the smallest reference set is most reliable.
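The binomial tail inversion has no closed form, but since the binomial CDF is monotone in \(p\) it can be inverted by bisection. The following sketch (the placement of \(|\mathcal{R}|\) in the confidence term follows the sample compression bound in Langford 2005; the reference-set sizes are illustrative) evaluates the bound for a training set of size 100:

```python
import math

def binom_cdf(m, k, p):
    """P(X <= k) for X ~ Binomial(m, p)."""
    return sum(math.comb(m, i) * p**i * (1 - p)**(m - i)
               for i in range(k + 1))

def bin_inv(m, k, delta, tol=1e-9):
    """Binomial tail inversion: largest p with P(X <= k) >= delta,
    found by bisection (binom_cdf is decreasing in p)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(m, k, mid) >= delta:
            lo = mid
        else:
            hi = mid
    return lo

n_train, delta = 100, 0.05
for emp_err in (0.01, 0.05, 0.1):
    for r in (5, 20, 50):                     # reference-set size |R|
        m = n_train - r                       # held-out samples
        k = round(emp_err * m)                # observed errors
        conf = delta / (n_train * math.comb(n_train, r))
        print(f"|R|={r:2d}  E={emp_err:.2f}  "
              f"bound={bin_inv(m, k, conf):.3f}")
```

The printed bounds grow with \(|\mathcal{R}|\) at fixed empirical error, matching the conclusion drawn from Fig. 3: smaller reference sets yield tighter guarantees.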
Lausser, L., Müssel, C., Melkozerov, A. et al. Identifying predictive hubs to condense the training set of \(k\)-nearest neighbour classifiers. Comput Stat 29, 81–95 (2014). https://doi.org/10.1007/s00180-012-0379-0