Nearest neighbor classification from multiple feature subsets
Introduction
The nearest neighbor (NN) classifier is one of the oldest and simplest methods for performing general, non-parametric classification. It can be represented by the following rule: to classify an unknown pattern, choose the class of the nearest example in the training set as measured by a distance metric. A common extension is to choose the most common class in the k nearest neighbors (kNN).
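The NN and kNN rules described above can be sketched in a few lines; this is a minimal illustration with Euclidean distance (the function and variable names are ours, not from the paper):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=1):
    """Classify x by majority vote among its k nearest training
    examples under Euclidean distance; k=1 gives the basic NN rule."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to each example
    nearest = np.argsort(dists)[:k]               # indices of k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]              # most common class

# Toy training set: two well-separated classes
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([4.5, 5.0]), k=3))  # -> 1
```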
Despite its simplicity, the NN classifier has many advantages over other methods. For example, it can learn from a small set of examples, can incrementally add new information at runtime, and can give competitive performance with more modern methods such as decision trees or neural networks.
Since its introduction by Fix and Hodges [25], researchers have investigated many methods for improving the NN classifier, but most work has concentrated on changing the distance metric or manipulating the patterns in the training set [4], [19]. Recently, researchers have begun experimenting with general algorithms for improving classification accuracy by combining multiple versions of a single classifier, also known as a multiple model or ensemble approach. The outputs of several classifiers are combined in the hope that the accuracy of the whole is greater than that of its parts. Unfortunately, many combining methods do not improve the NN classifier at all. For example, neither Bagging nor Error Correcting Output Coding (ECOC) improves the NN classifier [9], [44].
In this paper, we present a new method of combining NN classifiers with the goal of improving classification accuracy. Our approach uses random feature subsets for the individual classifiers. In contrast, other combining algorithms usually manipulate the training patterns (Bagging, Boosting) or the class labels (ECOC).
In the next section, we discuss current ensemble approaches and their limitations with respect to the NN classifier. In Section 3 we describe the algorithm and investigate its time and space complexity. In Section 4 we evaluate the algorithm on datasets from the UCI repository for accuracy, computational complexity, and robustness to irrelevant features. In Section 5 we analyze the algorithm's bias and variance components of error. In Section 6 we discuss related work, and in Section 7 we present conclusions and future directions.
Limitations of existing ensemble methods
An ensemble of classifiers must be both diverse and accurate in order to improve the accuracy of the whole. We require diversity to ensure that the classifiers do not all make the same errors; if they make identical errors, those errors propagate through voting to the whole ensemble. We require accuracy to ensure that poor classifiers do not drown out the votes of correct classifiers in the ensemble vote.
Hansen and Salamon [28] formulated these diversity and accuracy
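The accuracy requirement has a classical justification: if the members err independently, each with individual error rate below 0.5, the majority-vote error shrinks as the ensemble grows. A small sketch (our illustration, not code from the paper) computing the binomial majority-vote error for an odd-sized ensemble:

```python
from math import comb

def majority_vote_error(p, n):
    """Probability that a majority of n independent classifiers,
    each with individual error rate p, is wrong (n odd)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# With p = 0.3, the ensemble error drops as n grows
for n in (1, 5, 21):
    print(n, majority_vote_error(0.3, n))
```

With p = 0.3, a single classifier errs 30% of the time, but a vote of five independent classifiers errs about 16% of the time, and the error continues to fall with larger ensembles.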
Classification from multiple feature subsets
The algorithm for NN classification from multiple feature subsets (MFS) is simple and can be stated as:
Using simple voting, combine the outputs from multiple NN classifiers, each having access only to a random subset of features.
We select the random subset of features by sampling from the original set of features. We use two different sampling functions: sampling with replacement, and sampling without replacement. In sampling with replacement, a feature can be selected more than once which we
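The voting scheme and both sampling functions can be sketched as follows; this is a minimal illustration of the idea, with parameter names of our own choosing (ensemble size, subset size, and random seed are hypothetical defaults, not values from the paper):

```python
import numpy as np

def mfs_predict(X_train, y_train, x, n_classifiers=25, n_features=None,
                replace=True, seed=None):
    """MFS sketch: each 1-NN classifier votes using only a random
    subset of the features, sampled with or without replacement.
    With replacement, a feature can appear more than once, which
    effectively reweights it in the distance computation."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    n_features = n_features or d          # size of each random subset
    votes = []
    for _ in range(n_classifiers):
        feats = rng.choice(d, size=n_features, replace=replace)
        dists = np.linalg.norm(X_train[:, feats] - x[feats], axis=1)
        votes.append(y_train[np.argmin(dists)])   # 1-NN on the subset
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]              # simple voting

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 1, 1])
print(mfs_predict(X, y, np.array([4.5, 5.0]), n_classifiers=11,
                  n_features=2, seed=0))  # -> 1
```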
Experiments
Classification algorithms are complex programs that interact with data in many unpredictable ways. It is generally not possible to determine many properties of an algorithm simply by looking at its description. Therefore, we resort to experimental analysis.
Bias-variance analysis of error
The expected error of an algorithm can be divided into two components: bias, the consistent error that the algorithm makes over many different runs, and variance, the error that fluctuates from run to run. This decomposition is a useful method for explaining how changes to an algorithm affect the final error rates: it breaks the error into meaningful components and shows how each component changes with variations in the algorithm.
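For 0/1 loss, such a decomposition can be estimated empirically from repeated runs of an algorithm on different training samples. The sketch below follows one common style of 0/1-loss estimator (bias as the error of the modal "main" prediction, variance as average disagreement with it); it is our illustration, not necessarily the specific estimator used in the paper:

```python
import numpy as np

def bias_variance_01(predictions, y_true):
    """Estimate bias and variance under 0/1 loss from a matrix of
    predictions of shape (n_runs, n_test), each row produced by one
    run of the algorithm on a different training sample."""
    predictions = np.asarray(predictions)
    # Main prediction: most frequent label per test point across runs
    main = np.array([np.bincount(col).argmax() for col in predictions.T])
    bias = np.mean(main != y_true)            # consistent error
    variance = np.mean(predictions != main)   # run-to-run fluctuation
    return bias, variance

# Three runs, two test points with true labels [1, 1]
preds = [[0, 1],
         [0, 1],
         [0, 0]]
print(bias_variance_01(preds, [1, 1]))  # -> (0.5, 0.1666...)
```

The first test point is misclassified on every run (pure bias), while the third run's disagreement on the second point shows up as variance.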
Several researchers have used
Related work
Although there is a large body of research on multiple model methods for classification, very little specifically deals with combining NN classifiers. We are only aware of Skalak's [47], [48] work on combining NN classifiers with small prototype sets, Alpaydin's [6] work with condensed nearest neighbor (CNN) classifiers [29], and Aha and Bankert's [3] initial work on combining NN, feature selection, and ECOC which was later expanded on by Ricci and Aha [44].
Skalak and Alpaydin approach the
Conclusions and future directions
In this article, we introduced MFS, an algorithm for combining multiple NN classifiers. In MFS, each NN classifier has access to all the patterns in the original training set, but only to a random subset of the features.
Our experiments showed that MFS was effective in improving accuracy, and was even competitive with boosted decision trees, the current state of the art. But beyond accuracy improvements, MFS is a significant advance because it allows us to incorporate many desirable properties
Acknowledgements
I thank Michael Pazzani for his support and encouragement. I also thank Cathy Blake, Yang Wang, and the anonymous reviewers for providing many comments that improved this paper. This article represents research conducted both at the University of California, Irvine, and at the University of Waterloo, Department of Systems Design Engineering. It was partially supported by an NSERC PGS A scholarship.
References (53)
- Approximative fast nearest-neighbour recognition, Pattern Recognition Letters (1983)
- A. Agresti, Categorical Data Analysis, Wiley, New York, ...
- D.W. Aha, R.L. Bankert, Feature selection for case-based classification of cloud types: An empirical comparison, in: ...
- Cloud classification using error-correcting output codes, Artificial Intelligence Applications: Natural Resources, Agriculture, and Environmental Science (1997)
- Instance-based learning algorithms, Machine Learning (1991)
- Error reduction through learning multiple descriptions, Machine Learning (1996)
- Voting over multiple condensed nearest neighbors, Artificial Intelligence Review (1997)
- S.D. Bay, Nearest neighbour classification from multiple data representations, Master's thesis, University of Waterloo, ...
- C. Blake, E. Keogh, C.J. Merz, UCI repository of machine learning databases, ...
- Bagging predictors, Machine Learning (1996)
- Heuristics of instability and stabilization in model selection, The Annals of Statistics
- The CN2 induction algorithm, Machine Learning
- A coefficient of agreement for nominal scales, Educational and Psychological Measurement
- Nearest neighbor pattern classification, IEEE Transactions on Information Theory
- Machine learning research: four current directions, AI Magazine
- Solving multiclass learning problems via error-correcting output codes, Journal of Artificial Intelligence Research
- Pattern Classification and Scene Analysis
- The use of multiple measurements in taxonomic problems, Annals of Eugenics