Neighborhood classifiers

https://doi.org/10.1016/j.eswa.2006.10.043

Abstract

The K-nearest neighbor classifier (K-NN) is widely discussed and applied in pattern recognition and machine learning. However, little has been reported on the neighborhood classifier, a similar lazy classifier that uses local information to recognize new test samples. In this paper, we introduce the neighborhood rough set model as a uniform framework for understanding and implementing neighborhood classifiers. The algorithm integrates attribute reduction with classification learning. We study the influence of three norms on attribute reduction and classification, and compare the neighborhood classifier with K-NN, CART and SVM. The experimental results show that the neighborhood-based feature selection algorithm is able to delete most of the redundant and irrelevant features. The classification accuracies of the neighborhood classifier are superior to those of K-NN and CART in both the original feature spaces and the reduced feature subspaces, and only slightly weaker than those of SVM.

Introduction

Given a set of samples U, described by some input variables C (also called condition attributes, or features) and an output D (the decision), the task of classification learning is to construct a mapping from the conditions to the decision labels based on the set of training samples. One of the most popular learning and classification techniques is the nearest neighbor rule, introduced by Fix and Hodges (1951). It has proven to be a simple yet powerful recognition algorithm. Cover and Hart (1967) showed, under some continuity assumptions on the underlying distributions, that the asymptotic error rate of the 1-NN rule is bounded from above by twice the Bayes error (the error of the best possible rule). What is more, a key feature of this decision rule is that it performs remarkably well considering that no explicit knowledge of the underlying distributions is used. Furthermore, a simple generalization of this method, called the K-NN rule, in which a new pattern is classified into the class most represented among its K nearest neighbors, can be used to obtain good estimates of the Bayes error, and its probability of error asymptotically approaches the Bayes error (Duda & Hart, 1973). However, K-NN classifiers require computing the distances between each test sample and all training samples, which is time-consuming when the training set is very large. Besides, when the number of prototypes in the training set is not large enough, the K-NN rule is no longer optimal; this problem becomes more relevant when there are few prototypes compared to the intrinsic dimensionality of the feature space. Over the past half century, a wide variety of algorithms have been developed to deal with these problems (Anil, 2006, Fu et al., 2000, Fukunaga and Narendra, 1975, Hart, 1968, Kuncheva and Lakhmi, 1999, Kushilevitz et al., 2000, Lindenbaum et al., 2004, Short and Fukunaga, 1981, Vidal, 1986, Wilson and Martinez, 2000, Zhou et al., 2006).
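
As a minimal illustration of the K-NN rule just described, the following NumPy sketch (our own, not taken from the paper; names and data are illustrative) classifies a query point by majority vote among its K nearest training samples under the 2-norm:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training samples (2-norm)."""
    dists = np.linalg.norm(X_train - x, axis=1)        # distance to every training sample
    nearest = np.argsort(dists)[:k]                    # indices of the k closest samples
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                   # most frequent class among the neighbors

# Toy usage on four labeled points
X = np.array([[0.1, 0.2], [0.0, 0.1], [0.9, 0.8], [1.0, 0.9]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.95, 0.85]), k=3))  # -> 1
```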

From another viewpoint, several classification algorithms based on neighborhoods have been proposed, in which a new sample is associated with a neighborhood rather than with a fixed number of nearest neighbors. Owen (1984) developed a classifier that uses information from all data points in a neighborhood to classify the point at the center of that neighborhood; this neighborhood-based classifier was shown to outperform linear discriminant analysis on some LANDSAT data. Salzberg (1991) proposed a family of learning algorithms based on nested generalized exemplars (NGE), where an exemplar is a single training example and a generalized exemplar is an axis-parallel hyperrectangle that may cover several training examples. Once the generalized exemplars are learned, a test example is classified by computing the Euclidean distance between the example and each generalized exemplar; if an example is contained in a generalized exemplar, the distance to that exemplar is zero. The class of the nearest generalized exemplar is output as the predicted class of the test example. Wettschereck and Dietterich (1995) compared NGE with K-NN algorithms and found that in most cases K-NN outperforms NGE. Several improved versions of NGE, called NONGE, BNGE and OBNGE, were then developed. NONGE disallows overlapping rectangles while retaining nested rectangles and the same search procedure, and is uniformly superior to NGE. OBNGE is a batch algorithm that incorporates an improved search procedure and disallows nested rectangles (but still permits overlapping ones); it is superior to NGE in only one domain and worse in two. BNGE is a batch version of NONGE that is very efficient and requires no user tuning of parameters. The authors also pointed out that further research is needed to develop an NGE-like algorithm that remains robust in situations where axis-parallel hyperrectangles are inappropriate. Intuitively, the concept of neighborhood should be such that the neighbors are as close to a sample as possible, but also lie as homogeneously around that sample as possible. Sanchez, Pla, and Ferri (1997) showed that the geometrical placement of the neighbors can be much more important than their actual distances for appropriately characterizing a sample by its neighborhood. Since the nearest-neighbor rule takes only the first property into account, the nearest neighbors may not be placed symmetrically around the sample if the training set is not spatially homogeneous around it; in fact, it has been shown that the use of local distance measures can significantly improve the behavior of the classifier in the finite-sample case. They therefore proposed alternative neighborhood definitions, the surrounding neighborhoods (SN), in which the neighbors of a sample are selected not only in terms of proximity but also in terms of their spatial distribution with respect to that sample. More recently, Wang (2006) presented a nonparametric technique for pattern recognition, named neighborhood counting (NC), in which neighborhoods of data points are used to measure the similarity between two data points: considering all neighborhoods that cover both points, he proposed the number of such neighborhoods as a generic similarity measure. However, most of this work focuses on the 2-norm neighborhood; few studies compare the influence of different norms, such as the 1-norm and the infinity-norm. What is more, there is no uniform framework in which to understand, analyze and compare these algorithms.
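
To make the NGE classification rule concrete, the following sketch (our illustration, not Salzberg's implementation) uses the standard point-to-hyperrectangle distance obtained by clipping: the distance to a generalized exemplar is zero when the test point lies inside its hyperrectangle, and the Euclidean distance to the nearest face otherwise:

```python
import numpy as np

def distance_to_box(x, lo, hi):
    """Euclidean distance from x to the axis-parallel hyperrectangle [lo, hi];
    zero when x lies inside the rectangle."""
    nearest = np.clip(x, lo, hi)                 # closest point of the box to x
    return np.linalg.norm(x - nearest)

def nge_predict(boxes, x):
    """boxes: list of (lo, hi, label) generalized exemplars; return the label
    of the exemplar nearest to x."""
    dists = [distance_to_box(x, lo, hi) for lo, hi, _ in boxes]
    return boxes[int(np.argmin(dists))][2]

# Toy usage: two generalized exemplars, one per class
boxes = [(np.array([0.0, 0.0]), np.array([1.0, 1.0]), 'A'),
         (np.array([2.0, 2.0]), np.array([3.0, 3.0]), 'B')]
print(nge_predict(boxes, np.array([0.5, 0.6])))  # inside the first box  -> 'A'
print(nge_predict(boxes, np.array([2.4, 1.9])))  # closest to the second -> 'B'
```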

In fact, neighborhoods and neighborhood relations are a class of important concepts in topology. Lin (1988, 1997) pointed out that neighborhood spaces are more general topological spaces than equivalence spaces and introduced the neighborhood relation into rough set methodology, which has been shown to be a powerful tool for attribute reduction, feature selection, rule extraction and reasoning with uncertainty (Hu et al., 2006, Hu et al., 2006, Jensen and Shen, 2004, Swiniarski and Skowron, 2003). Yao (1998) and Wu and Zhang (2002) discussed the properties of neighborhood approximation spaces. However, few applications of the model have been reported in recent years. In this paper, we review the basic concepts of neighborhoods and neighborhood rough sets and show some properties of the model. We then use the model to build a uniform theoretical framework for neighborhood-based classifiers. This framework integrates feature selection with classifier construction, and classifies a test sample in the selected subspace based on the majority class within the neighborhood of the test sample. The proposed technique combines the advantages of feature subset selection and neighborhood-based classification; it is conceptually simple and straightforward to implement. Experimental analysis is conducted on UCI data sets. Three norms, the 1-norm, 2-norm and infinity-norm, are tried. The results show that the proposed classification systems outperform the popular CART learning algorithm and the K-NN classifier, and are only slightly weaker than SVM, for all three norms.
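
The neighborhood classification rule described above admits a very short sketch. The version below is a minimal illustration, not the paper's exact implementation; the fall-back to the single nearest neighbor when the neighborhood is empty is our assumption. It takes the majority class among all training samples whose p-norm distance to the test sample is at most a threshold delta, with p set to 1, 2 or infinity:

```python
import numpy as np

def nec_predict(X_train, y_train, x, delta=0.3, p=2):
    """Neighborhood classifier sketch: majority class among the training samples
    whose p-norm distance to x is at most delta (p = 1, 2 or np.inf)."""
    dists = np.linalg.norm(X_train - x, ord=p, axis=1)
    neighborhood = np.where(dists <= delta)[0]
    if neighborhood.size == 0:                   # assumption: fall back to the nearest neighbor
        neighborhood = np.array([np.argmin(dists)])
    labels, counts = np.unique(y_train[neighborhood], return_counts=True)
    return labels[np.argmax(counts)]
```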

The remainder of the paper is organized as follows. The basic concepts of neighborhood rough set models are presented in Section 2. The neighborhood classifier algorithm is introduced in Section 3. Section 4 presents the experimental analysis. The conclusion is given in Section 5.

Section snippets

Neighborhood-based rough set model

Formally, the structural data for classification learning can be written as a tuple IS = 〈U, A, V, f〉, where U is the nonempty set of samples {x1, x2, …, xn}, called a universe or sample space; A is the nonempty set of variables (also called features, inputs or attributes) {a1, a2, …, am} that characterize the samples; Va is the value domain of attribute a; and f is an information function f: U × A → V. More specially, 〈U, A, V, f〉 is also called a decision table if A = C ∪ D, where C is the set of condition
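
Under the usual definitions of neighborhood rough sets, the δ-neighborhood of a sample collects all samples within distance δ of it under a chosen norm, and the positive region (the lower approximation of the decision) gathers the samples whose entire neighborhood carries a single decision label. The sketch below is a schematic reading of these notions with our own function names; the threshold δ and the norm p are parameters:

```python
import numpy as np

def delta_neighborhood(X, i, delta, p=2):
    """Indices of the samples whose p-norm distance to sample i is at most delta."""
    dists = np.linalg.norm(X - X[i], ord=p, axis=1)
    return np.where(dists <= delta)[0]

def positive_region(X, y, delta, p=2):
    """Samples whose whole delta-neighborhood carries one decision label,
    i.e. the positive region of the decision in this feature space."""
    pos = [i for i in range(len(X))
           if np.unique(y[delta_neighborhood(X, i, delta, p)]).size == 1]
    return np.array(pos, dtype=int)
```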

Classification learning algorithm

Usually we hope to recognize patterns in a relatively low-dimensional space, to avoid the curse of dimensionality, reduce the cost of measuring and processing information, and enhance the interpretability of the learned models. However, with the development of information techniques, more and more samples and features are acquired and stored, and classification algorithms can be overwhelmed by a large number of features. Therefore, feature subset selection is implicitly or explicitly conducted in some learning systems (Muni
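
In the neighborhood rough set literature, one common way to perform such a feature subset selection is a forward greedy search driven by the dependency function γ_B(D) = |POS_B(D)|/|U|: starting from an empty subset, the feature yielding the largest increase in dependency is added until the gain falls below a small threshold. The sketch below is our reading of that scheme, not a verbatim transcription of this paper's algorithm; it reuses positive_region from the previous snippet:

```python
import numpy as np
# assumes positive_region(X, y, delta, p) from the previous sketch is in scope

def dependency(X, y, features, delta, p=2):
    """gamma_B(D): fraction of samples in the positive region of the subspace B."""
    if not features:
        return 0.0
    return positive_region(X[:, features], y, delta, p).size / len(X)

def forward_selection(X, y, delta=0.2, p=2, eps=1e-3):
    """Greedily add the feature that most increases dependency until the gain drops below eps."""
    selected, remaining, current = [], list(range(X.shape[1])), 0.0
    while remaining:
        gains = [dependency(X, y, selected + [f], delta, p) for f in remaining]
        best = int(np.argmax(gains))
        if gains[best] - current <= eps:
            break
        current = gains[best]
        selected.append(remaining.pop(best))
    return selected
```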

Experimental analysis

In order to test the proposed classification model, some data sets are downloaded from the machine learning data repository, University of California at Irvine. The data sets are outlined in Table 1.

There are two objectives in conducting the experiments. The first is to compare the classification performance of K-NN, the neighborhood classifier (NEC), CART and SVM in the original feature spaces and the reduced feature spaces. The second is to derive an empirical rule for specifying the parameter w used in NEC.

Conclusion and future work

K-NN classifiers are widely discussed and applied; however, as another classification technique based on local information, neighborhood classifiers have not been studied carefully. In this paper, we introduce the neighborhood rough set model as a basic theoretical framework, which provides a conceptually simple and easy-to-implement way to understand and construct neighborhood-based attribute reduction techniques and classifiers.

Experiments with UCI data sets show both in the original feature

References (34)

  • R.O. Duda et al. Pattern classification and scene analysis (1973).
  • Fix, E., & Hodges, J. (1951). Discriminatory analysis. Nonparametric discrimination: Consistency properties. Tech....
  • A.W. Fu et al. Dynamic vp-tree indexing for n-nearest neighbor search given pair-wise distances. VLDB Journal (2000).
  • K. Fukunaga et al. A branch and bound algorithm for computing k-nearest neighbors. IEEE Transactions on Computers (1975).
  • M. Girolami et al. Probability density estimation from optimally condensed data samples. IEEE Transactions on Pattern Analysis and Machine Intelligence (2003).
  • P.E. Hart. The condensed nearest neighbor. IEEE Transactions on Information Theory (1968).
  • Q.H. Hu et al. Fuzzy probabilistic approximation spaces and their information measures. IEEE Transactions on Fuzzy Systems (2006).