New rank methods for reducing the size of the training set using the nearest neighbor rule

https://doi.org/10.1016/j.patrec.2011.07.019

Abstract

Some new rank methods to select the best prototypes from a training set are proposed in this paper, with the aim of setting its size through an external parameter while maintaining the classification accuracy. Traditional methods that filter the training set in a classification task, such as editing or condensing, apply rules to the set in order to remove outliers or to keep the prototypes that help in the classification. In our approach, new voting methods are proposed to compute, for each prototype, the probability that it helps to classify a new sample correctly. This probability is the key to sorting the training set: a relevance factor between 0 and 1 is used to select, for each class, the best candidates whose accumulated probabilities remain below that parameter. This approach makes it possible to select the number of prototypes needed to maintain or even increase the classification accuracy. The results obtained on several high-dimensional databases show that these methods maintain the final error rate while reducing the size of the training set.

Highlights

► New rank methods to select the best prototypes from a training set are proposed. ► They compute the prototype probability and help to classify a new sample correctly. ► A relevance factor from 0 to 1 is used to select the best candidates. ► The results show that the final error rate is maintained while the size of the training set is reduced.

Introduction

In classification problems, a statistical knowledge of the conditional density functions of each class is rarely available, so application of the optimal Bayes classification methods is not possible. The nearest neighbor (NN) rule and its extension (k-nearest neighbors) have been the most widely used non-parametric classifiers in practice.
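For reference, the NN rule itself admits a very small implementation. The sketch below is not taken from the paper; the Euclidean distance and the array layout are assumptions made purely for illustration.

    import numpy as np

    def nn_classify(x, prototypes, labels):
        """Classify x with the 1-NN rule: return the label of the
        closest prototype (Euclidean distance assumed here)."""
        distances = np.linalg.norm(prototypes - x, axis=1)
        return labels[int(np.argmin(distances))]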

The advantage of the NN rule lies in the fact that it combines its conceptual simplicity with an asymptotic error rate that is conveniently bounded in terms of the optimal Bayes error (Cover and Hart, 1967). However, the NN rule also presents some problems when the number of prototypes is large, since it needs to store all the examples in a memory and is sensitive to noisy instances. Many researchers have studied how to reduce the training set and obtain the same classification ability as when the whole training set is used (Wilson and Martinez, 2000, Wilson, 1972, Hart, 1968).

There are two different ways to deal with the instance reduction problem (Li et al., 2005):

  • New prototype generation, which creates a new prototype set (Chang, 1974).

  • Prototype selection, which consists of selecting a particular subset of prototypes from the original training set:

    • Condensing (or reducing) algorithms give the minimal subset of prototypes that leads to the same performance as when the whole training set is used.

    • Editing algorithms eliminate atypical prototypes from the original set and remove overlap among classes.

For condensing algorithms, the key problem is deciding which examples should be retained; condensing algorithms differ in how the typicality of training examples is evaluated. They place more emphasis on minimizing the size of the training set and on its consistency, but noisy examples may be selected for the prototype set and harm the classification accuracy. For editing algorithms, identifying the ‘bad’ training examples that harm the accuracy is the most important challenge.

In this paper, some new rank methods are proposed in order to reduce the size of the training set. The rank is based on an estimate of the probability that an instance participates in a correct classification under the nearest neighbor rule, so with this methodology the user can control the size of the resulting training set.

In the second section, some classical methodologies to reduce the training set are explained. In the third section, our new methodologies are introduced together with their detailed algorithms. In the fourth section, the results obtained when applying different algorithms to reduce the size of the training set, and their associated error rates on some widely used data collections, are shown. Finally, some conclusions and future lines of work are presented.

Section snippets

Condensed Nearest Neighbor Rule

The Condensed Nearest Neighbor Rule (CNN) (Hart, 1968) was one of the first techniques to reduce the size of the training set. This algorithm gives a subset S of the training set T such that every member of T is closer to a member of S of the same class than to a member of a different class.

The algorithm selects a sample randomly from T. If this sample is incorrectly classified using S, it is added to S. The process is repeated until all samples belonging to T have been selected and are correctly classified.
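A minimal sketch of Hart's CNN procedure as described above follows; the random scan order, the Euclidean distance and the NumPy data layout are implementation assumptions rather than details prescribed by the original algorithm.

    import random
    import numpy as np

    def cnn_condense(T, labels, seed=0):
        """Hart's Condensed NN rule (sketch): build a subset S of T such
        that every sample in T is correctly classified by its nearest
        neighbour in S."""
        rng = random.Random(seed)
        order = list(range(len(T)))
        rng.shuffle(order)                        # samples are taken randomly from T
        S = [order[0]]                            # start with a single arbitrary sample
        changed = True
        while changed:                            # repeat until a full pass adds nothing
            changed = False
            for i in order:
                d = np.linalg.norm(T[S] - T[i], axis=1)
                nearest = S[int(np.argmin(d))]
                if labels[nearest] != labels[i]:  # misclassified by S: add the sample
                    S.append(i)
                    changed = True
        return S                                  # indices of the condensed subset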

Our methodology

In order to illustrate the main idea of our proposed methodology, a binary classification problem with a distribution of training samples, T, is considered. In the figures shown in this section, the points of class 1 are indicated by circles and the points of class 2 by rectangles. The main idea is that the prototypes in the training set vote for the other prototypes that help them to be classified correctly, and the method estimates a probability for each prototype that reflects its importance in a correct classification.
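To make the rank-and-select idea concrete, the schematic sketch below uses a simple stand-in vote: each prototype votes for the nearest prototype of its own class. This voting rule is an assumption made here for illustration only, not one of the voting heuristics developed in the paper. The per-class normalisation of the votes and the selection of candidates while their accumulated probability stays below the relevance factor follow the description given in the abstract.

    import numpy as np

    def rank_and_select(T, labels, relevance=0.5):
        """Schematic rank method: accumulate votes for 'helpful' prototypes,
        turn them into per-class probabilities, and keep the best-ranked
        prototypes of each class while their accumulated probability stays
        below the relevance factor."""
        n = len(T)
        votes = np.zeros(n)
        for i in range(n):
            same = [j for j in range(n) if j != i and labels[j] == labels[i]]
            if not same:
                continue
            d = np.linalg.norm(T[same] - T[i], axis=1)
            votes[same[int(np.argmin(d))]] += 1   # stand-in vote for a helper prototype

        selected = []
        for c in set(labels):
            idx = [i for i in range(n) if labels[i] == c]
            total = votes[idx].sum()
            if total == 0:
                continue
            probs = votes[idx] / total            # per-class probability estimate
            acc = 0.0
            for k in np.argsort(-probs):          # best candidates first
                if acc >= relevance:
                    break
                selected.append(idx[int(k)])
                acc += probs[int(k)]
        return selected                           # indices of the reduced training set

With relevance = 1.0 the whole per-class ranking would be kept, while smaller values give progressively more aggressive reductions of the training set.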

Results

The experiments were performed using two well-known isolated handwritten character databases and the UCI Machine Learning Repository (Asuncion and Newman, 2007). The first is a database of uppercase characters (the NIST Special Database 3 of the National Institute of Standards and Technology) and the second contains digits (the US Postal Service digit recognition corpus, USPS). In both cases, the classification task was performed using contour descriptions with Freeman codes (Freeman, 1961) and

Conclusions and future work

In this paper, new algorithms to reduce the size of the training set for use in a classification task have been presented. These algorithms give different estimates of the probability that a new example may be classified correctly by a training set example. The results obtained for accuracy are in most cases better than those obtained with the classical algorithms compared. Note the good performance shown by the 1-FN, 2-FN and 1-NE algorithms with respect to the classification accuracy and the size of the resulting training set.

References (14)

  • Asuncion, A., Newman, D., 2007. UCI Machine Learning Repository. URL...
  • Chang, C., 1974. Finding prototypes for nearest neighbour classifiers. IEEE Trans. Comput.
  • Cover, T., Hart, P., 1967. Nearest neighbor pattern classification. IEEE Trans. Inform. Theory.
  • Dasarathy, B.V., et al., 2000. Nearest neighbour editing and condensing tools-synergy exploitation. Pattern Anal. Appl.
  • Ferri, F.J., et al., 1999. Considerations about sample-size sensitivity of a family of edited nearest-neighbor rules. IEEE Trans. Syst. Man Cybernet. Part B.
  • Freeman, H., 1961. On the encoding of arbitrary geometric configurations. IRE Trans. Electron. Comput.
  • Gates, G., 1972. The reduced nearest neighbor rule. IEEE Trans. Inform. Theory IT-.

Cited by (35)

  • Extensions to rank-based prototype selection in k-Nearest Neighbour classification

    2019, Applied Soft Computing Journal
    Citation excerpt:

    each prototype of the training set votes for the other elements which lead to its correct classification. The rank methods considered in this paper focus on the two voting heuristics proposed so far, to our best knowledge: Farthest Neighbour (FN) and Nearest to Enemy (NE) [15]. Both strategies are based on the idea that a prototype can only vote for another element, and the point is to decide which prototype the vote is given to.

  • Weighted Reward-Punishment Editing

    2016, Pattern Recognition Letters
    Citation excerpt:

    In [5] a novel clustering approach called supervised clustering is introduced that is based on the idea of replacing a dataset with a set of cluster prototypes that is determined by a supervised clustering procedure. Two prototype selection methods based on ranking are proposed in [26] and [27]. The first prototype method is based on new voting methods that compute a prototype probability that aids in the correct classification of a new sample.
