Nearest neighbor classification from multiple feature subsets

https://doi.org/10.1016/S1088-467X(99)00018-9

Abstract

Combining multiple classifiers is an effective technique for improving accuracy. There are many general combining algorithms, such as Bagging, Boosting, or Error Correcting Output Coding, that significantly improve classifiers like decision trees, rule learners, or neural networks. Unfortunately, these combining methods do not improve the nearest neighbor classifier. In this paper, we present MFS, a combining algorithm designed to improve the accuracy of the nearest neighbor (NN) classifier. MFS combines multiple NN classifiers each using only a random subset of features. The experimental results are encouraging: On 25 datasets from the UCI repository, MFS significantly outperformed several standard NN variants and was competitive with boosted decision trees. In additional experiments, we show that MFS is robust to irrelevant features, and is able to reduce both bias and variance components of error.

Introduction

The nearest neighbor (NN) classifier is one of the oldest and simplest methods for performing general, non-parametric classification. It can be stated as the following rule: to classify an unknown pattern, choose the class of the nearest example in the training set, as measured by a distance metric. A common extension is to choose the most common class among the k nearest neighbors (kNN).
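As a concrete, minimal illustration of this rule, the sketch below implements kNN with a Euclidean distance metric; the function name and the choice of metric are ours for illustration, since the rule itself leaves the metric open.

```python
import numpy as np
from collections import Counter

def knn_classify(query, X_train, y_train, k=1):
    """Classify `query` by majority vote among its k nearest training
    examples; k=1 gives the plain NN rule described above."""
    # Euclidean distance from the query to every training pattern
    dists = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    # Indices of the k closest training examples
    nearest = np.argsort(dists)[:k]
    # Most common class label among those neighbors
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]
```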

Despite its simplicity, the NN classifier has many advantages over other methods. For example, it can learn from a small set of examples, can incrementally incorporate new information at runtime, and can perform competitively with more modern methods such as decision trees and neural networks.

Since its introduction by Fix and Hodges [25], researchers have investigated many methods for improving the NN classifier, but most work has concentrated on changing the distance metric or manipulating the patterns in the training set [4], [19]. Recently, researchers have begun experimenting with general algorithms that improve classification accuracy by combining multiple versions of a single classifier, also known as a multiple model or ensemble approach. The outputs of several classifiers are combined in the hope that the accuracy of the whole is greater than that of its parts. Unfortunately, many combining methods do not improve the NN classifier at all. For example, neither Bagging nor Error Correcting Output Coding (ECOC) improves the NN classifier [9], [44].

In this paper, we present a new method of combining NN classifiers with the goal of improving classification accuracy. Our approach gives each individual classifier access to only a random subset of the features. In contrast, other combining algorithms usually manipulate the training patterns (Bagging, Boosting) or the class labels (ECOC).

In the next section, we discuss current ensemble approaches and their limitations with respect to the NN classifier. In Section 3 we describe the algorithm and investigate its time and space complexity. In Section 4 we evaluate the algorithm on datasets from the UCI repository for accuracy, computational complexity, and robustness to irrelevant features. In Section 5 we analyze the algorithm's bias and variance components of error. In Section 6 we discuss related work, and in Section 7 we present conclusions and future directions.

Section snippets

Limitations of existing ensemble methods

An ensemble of classifiers must be both diverse and accurate in order to improve the accuracy of the whole. We require diversity to ensure that the classifiers do not all make the same errors; if they make identical errors, those errors propagate through voting to the whole ensemble. We require accuracy to ensure that poor classifiers do not drown out the votes of correct classifiers in the ensemble vote.

Hansen and Salamon [28] formulated these diversity and accuracy
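Hansen and Salamon's argument can be illustrated with a back-of-the-envelope calculation (ours, not the paper's): if several two-class classifiers each err with probability p and their errors are independent, the majority vote errs only when more than half of them are wrong, a binomial tail that shrinks rapidly when p < 0.5. Perfectly correlated errors, by contrast, leave the ensemble error stuck at p.

```python
from math import comb

def majority_vote_error(p, n_classifiers):
    """Error rate of a majority vote over n independent two-class
    classifiers, each with individual error rate p: the probability
    that more than half of the votes are wrong (a binomial tail)."""
    k_min = n_classifiers // 2 + 1  # fewest wrong votes that flips the decision
    return sum(comb(n_classifiers, k) * p**k * (1 - p)**(n_classifiers - k)
               for k in range(k_min, n_classifiers + 1))

# e.g. 15 independent classifiers at 30% individual error -> roughly 5% ensemble error
print(majority_vote_error(0.3, 15))
```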

Classification from multiple feature subsets

The algorithm for NN classification from multiple feature subsets (MFS) is simple and can be stated as:

Using simple voting, combine the outputs from multiple NN classifiers, each having access only to a random subset of features.

We select the random subset of features by sampling from the original set of features. We use two different sampling functions: sampling with replacement, and sampling without replacement. In sampling with replacement, a feature can be selected more than once, which we
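A minimal sketch of this procedure appears below, assuming 1-NN base classifiers with Euclidean distance; the number of classifiers, the subset size, and the sampling mode are illustrative parameters rather than the paper's experimental settings.

```python
import numpy as np
from collections import Counter

def mfs_predict(query, X_train, y_train, n_classifiers=10,
                n_features=None, with_replacement=False, seed=None):
    """MFS sketch: each NN classifier sees only a random subset of the
    features, and the final label is chosen by simple (unweighted) voting."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    if n_features is None:
        n_features = d // 2  # illustrative default subset size
    votes = []
    for _ in range(n_classifiers):
        # Sample a feature subset, with or without replacement
        subset = rng.choice(d, size=n_features, replace=with_replacement)
        # 1-NN restricted to the selected features (Euclidean distance assumed)
        dists = np.sqrt(((X_train[:, subset] - query[subset]) ** 2).sum(axis=1))
        votes.append(y_train[np.argmin(dists)])
    return Counter(votes).most_common(1)[0][0]
```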

Experiments

Classification algorithms are complex programs that interact with data in many unpredictable ways. It is generally not possible to determine many properties of an algorithm simply by looking at its description. Therefore, we resort to experimental analysis.

Bias-variance analysis of error

The expected error of an algorithm can be divided into two components: bias, which is the consistent error that the algorithm makes over many different runs, and variance, which is the error that fluctuates from run to run. This decomposition is useful for explaining how changes to an algorithm affect the final error rates: it lets us separate the error into meaningful components and see how each component changes with variations in the algorithm.
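For concreteness, one standard way to make this precise for 0-1 loss is Kohavi and Wolpert's decomposition, reproduced below in general form; we quote it only as an illustrative framework, since the article's exact variant may differ, and in empirical studies the noise term (which cannot be measured directly) is commonly folded into the bias. Here Y_F denotes the class given by the (possibly noisy) target at a point x, and Y_H the prediction of a classifier trained on a randomly drawn training set.

```latex
% Kohavi-Wolpert style decomposition of expected 0-1 loss (illustrative).
% Y_F: target class at x; Y_H: prediction of the learner trained on a random training set.
\begin{align*}
  \mathrm{E}[\text{error}]
    &= \sum_x P(x)\,\bigl(\sigma_x^2 + \mathrm{bias}_x^2 + \mathrm{variance}_x\bigr),\\
  \mathrm{bias}_x^2
    &= \tfrac{1}{2}\sum_{y}\bigl[P(Y_F = y \mid x) - P(Y_H = y \mid x)\bigr]^2,\\
  \mathrm{variance}_x
    &= \tfrac{1}{2}\Bigl(1 - \sum_{y} P(Y_H = y \mid x)^2\Bigr),\\
  \sigma_x^2
    &= \tfrac{1}{2}\Bigl(1 - \sum_{y} P(Y_F = y \mid x)^2\Bigr).
\end{align*}
```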

Several researchers have used

Related work

Although there is a large body of research on multiple model methods for classification, very little of it deals specifically with combining NN classifiers. We are aware only of Skalak's [47], [48] work on combining NN classifiers with small prototype sets, Alpaydin's [6] work with condensed nearest neighbor (CNN) classifiers [29], and Aha and Bankert's [3] initial work on combining NN, feature selection, and ECOC, which was later expanded on by Ricci and Aha [44].

Skalak and Alpaydin approach the

Conclusions and future directions

In this article, we introduced MFS, an algorithm for combining multiple NN classifiers. In MFS, each NN classifier has access to all the patterns in the original training set, but only to a random subset of the features.

Our experiments showed that MFS was effective in improving accuracy, and was even competitive with boosted decision trees, the current state of the art. But beyond accuracy improvements, MFS is a significant advance because it allows us to incorporate many desirable properties

Acknowledgements

I thank Michael Pazzani for his support and encouragement. I also thank Cathy Blake, Yang Wang, and the anonymous reviewers for providing many comments that improved this paper. This article represents research conducted both at the University of California, Irvine, and at the University of Waterloo, Department of Systems Design Engineering. It was partially supported by an NSERC PGS A scholarship.

References (53)

  • L. Miclet et al., Approximative fast nearest-neighbour recognition, Pattern Recognition Letters (1983)
  • A. Agresti, Categorical Data Analysis, Wiley, New York,...
  • D.W. Aha, R.L. Bankert, Feature selection for case-based classification of cloud types: An empirical comparison, in:...
  • D.W. Aha et al., Cloud classification using error-correcting output codes, Artificial Intelligence Applications: Natural Resources, Agriculture, and Environmental Science (1997)
  • D.W. Aha et al., Instance-based learning algorithms, Machine Learning (1991)
  • K.M. Ali et al., Error reduction through learning multiple descriptions, Machine Learning (1996)
  • E. Alpaydin, Voting over multiple condensed nearest neighbors, Artificial Intelligence Review (1997)
  • S.D. Bay, Nearest neighbour classification from multiple data representations, Master's thesis, University of Waterloo,...
  • C. Blake, E. Keogh, C.J. Merz, UCI repository of machine learning databases,...
  • L. Breiman, Bagging predictors, Machine Learning (1996)
  • L. Breiman, Bias, variance, and arcing classifiers, Tech. Rep. 460, Statistics Department, University of California,...
  • L. Breiman, Heuristics of instability and stabilizations in model selection, The Annals of Statistics (1996)
  • J. Catlett, Megainduction: machine learning on very large databases, Ph.D. thesis, Faculty of Arts, University of...
  • P. Chan, S. Stolfo, A comparative evaluation of voting and meta-learning on partitioned data, in: Proceedings of the...
  • K.J. Cherkauer, Human expert-level performance on a scientific image analysis task by a system using combined...
  • P. Clark, R. Boswell, Rule induction with CN2: Some recent improvements, in: Machine Learning: Proceedings of the Fifth...
  • P. Clark et al., The CN2 induction algorithm, Machine Learning (1989)
  • J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement (1960)
  • T.M. Cover et al., Nearest neighbor pattern classification, IEEE Transactions on Information Theory (1967)
  • B.V. Dasarathy, Nearest Neighbor, NN Norms: NN Pattern Classification Techniques, IEEE Computer Society Press, Los...
  • T.G. Dietterich, Machine learning research: four current directions, AI Magazine (1997)
  • T.G. Dietterich, An experimental comparison of three methods for constructing ensembles of decision trees: Bagging,...
  • T.G. Dietterich et al., Solving multiclass learning problems via error-correcting output codes, Journal of Artificial Intelligence Research (1995)
  • R.O. Duda et al., Pattern Classification and Scene Analysis (1973)
  • R.A. Fisher, The use of multiple measurements in taxonomic problems, Annual Eugenics (1936)
  • E. Fix, J.L. Hodges, Discriminatory analysis: Nonparametric discrimination: Consistency properties, Tech. Rep. Project...