Nearest neighbor classification from multiple feature subsets

https://doi.org/10.1016/S1088-467X(99)00018-9

Abstract

Combining multiple classifiers is an effective technique for improving accuracy. There are many general combining algorithms, such as Bagging, Boosting, or Error Correcting Output Coding, that significantly improve classifiers like decision trees, rule learners, or neural networks. Unfortunately, these combining methods do not improve the nearest neighbor classifier. In this paper, we present MFS, a combining algorithm designed to improve the accuracy of the nearest neighbor (NN) classifier. MFS combines multiple NN classifiers each using only a random subset of features. The experimental results are encouraging: On 25 datasets from the UCI repository, MFS significantly outperformed several standard NN variants and was competitive with boosted decision trees. In additional experiments, we show that MFS is robust to irrelevant features, and is able to reduce both bias and variance components of error.

Introduction

The nearest neighbor (NN) classifier is one of the oldest and simplest methods for performing general, non-parametric classification. It can be stated as the following rule: to classify an unknown pattern, choose the class of the nearest example in the training set, as measured by a distance metric. A common extension is to choose the most common class among the k nearest neighbors (kNN).
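As a concrete, minimal illustration of this rule, the sketch below implements kNN with a Euclidean distance metric; the function name and the choice of metric are ours for illustration, since the rule itself leaves the metric open.

```python
import numpy as np
from collections import Counter

def knn_classify(query, X_train, y_train, k=1):
    """Classify `query` by majority vote among its k nearest training
    examples; k=1 gives the plain NN rule described above."""
    # Euclidean distance from the query to every training pattern
    dists = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    # Indices of the k closest training examples
    nearest = np.argsort(dists)[:k]
    # Most common class label among those neighbors
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]
```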

Despite its simplicity, the NN classifier has many advantages over other methods. For example, it can learn from a small set of examples, can incrementally incorporate new information at runtime, and can perform competitively with more modern methods such as decision trees and neural networks.

Since its introduction by Fix and Hodges [25], researchers have investigated many methods for improving the NN classifier, but most work has concentrated on changing the distance metric or manipulating the patterns in the training set [4], [19]. Recently, researchers have begun experimenting with general algorithms that improve classification accuracy by combining multiple versions of a single classifier, also known as a multiple model or ensemble approach. The outputs of several classifiers are combined in the hope that the accuracy of the whole is greater than that of its parts. Unfortunately, many combining methods do not improve the NN classifier at all. For example, neither Bagging nor Error Correcting Output Coding (ECOC) improves the NN classifier [9], [44].

In this paper, we present a new method of combining NN classifiers with the goal of improving classification accuracy. Our approach gives each individual classifier access to only a random subset of the features. In contrast, other combining algorithms usually manipulate the training patterns (Bagging, Boosting) or the class labels (ECOC).

In the next section, we discuss current ensemble approaches and their limitations with respect to the NN classifier. In Section 3 we describe the algorithm and investigate its time and space complexity. In Section 4 we evaluate the algorithm on datasets from the UCI repository for accuracy, computational complexity, and robustness to irrelevant features. In Section 5 we analyze the algorithm's bias and variance components of error. In Section 6 we discuss related work, and in Section 7 we present conclusions and future directions.

Section snippets

Limitations of existing ensemble methods

An ensemble of classifiers must be both diverse and accurate in order to improve the accuracy of the whole. We require diversity to ensure that the classifiers do not all make the same errors; if they make identical errors, those errors propagate through voting to the whole ensemble. We require accuracy to ensure that poor classifiers do not drown out the votes of correct classifiers in the ensemble vote.

Hansen and Salamon [28] formulated these diversity and accuracy
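Hansen and Salamon's argument can be illustrated with a back-of-the-envelope calculation (ours, not the paper's): if several two-class classifiers each err with probability p and their errors are independent, the majority vote errs only when more than half of them are wrong, a binomial tail that shrinks rapidly when p < 0.5. Perfectly correlated errors, by contrast, leave the ensemble error stuck at p.

```python
from math import comb

def majority_vote_error(p, n_classifiers):
    """Error rate of a majority vote over n independent two-class
    classifiers, each with individual error rate p: the probability
    that more than half of the votes are wrong (a binomial tail)."""
    k_min = n_classifiers // 2 + 1  # fewest wrong votes that flips the decision
    return sum(comb(n_classifiers, k) * p**k * (1 - p)**(n_classifiers - k)
               for k in range(k_min, n_classifiers + 1))

# e.g. 15 independent classifiers at 30% individual error -> roughly 5% ensemble error
print(majority_vote_error(0.3, 15))
```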

Classification from multiple feature subsets

The algorithm for NN classification from multiple feature subsets (MFS) is simple and can be stated as:

Using simple voting, combine the outputs from multiple NN classifiers, each having access only to a random subset of features.

We select the random subset of features by sampling from the original set of features. We use two different sampling functions: sampling with replacement, and sampling without replacement. In sampling with replacement, a feature can be selected more than once, which we
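A minimal sketch of this procedure appears below, assuming 1-NN base classifiers with Euclidean distance; the number of classifiers, the subset size, and the sampling mode are illustrative parameters rather than the paper's experimental settings.

```python
import numpy as np
from collections import Counter

def mfs_predict(query, X_train, y_train, n_classifiers=10,
                n_features=None, with_replacement=False, seed=None):
    """MFS sketch: each NN classifier sees only a random subset of the
    features, and the final label is chosen by simple (unweighted) voting."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    if n_features is None:
        n_features = d // 2  # illustrative default subset size
    votes = []
    for _ in range(n_classifiers):
        # Sample a feature subset, with or without replacement
        subset = rng.choice(d, size=n_features, replace=with_replacement)
        # 1-NN restricted to the selected features (Euclidean distance assumed)
        dists = np.sqrt(((X_train[:, subset] - query[subset]) ** 2).sum(axis=1))
        votes.append(y_train[np.argmin(dists)])
    return Counter(votes).most_common(1)[0][0]
```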

Experiments

Classification algorithms are complex programs that interact with data in many unpredictable ways. It is generally not possible to determine many properties of an algorithm simply by looking at its description. Therefore, we resort to experimental analysis.

Bias-variance analysis of error

The expected error of an algorithm can be divided into two components: bias, which is the consistent error that the algorithm makes over many different runs, and variance, which is the error that fluctuates from run to run. This decomposition is useful for explaining how changes to an algorithm affect the final error rates: it lets us separate the error into meaningful components and see how each component changes with variations in the algorithm.
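For concreteness, one standard way to make this precise for 0-1 loss is Kohavi and Wolpert's decomposition, reproduced below in general form; we quote it only as an illustrative framework, since the article's exact variant may differ, and in empirical studies the noise term (which cannot be measured directly) is commonly folded into the bias. Here Y_F denotes the class given by the (possibly noisy) target at a point x, and Y_H the prediction of a classifier trained on a randomly drawn training set.

```latex
% Kohavi-Wolpert style decomposition of expected 0-1 loss (illustrative).
% Y_F: target class at x; Y_H: prediction of the learner trained on a random training set.
\begin{align*}
  \mathrm{E}[\text{error}]
    &= \sum_x P(x)\,\bigl(\sigma_x^2 + \mathrm{bias}_x^2 + \mathrm{variance}_x\bigr),\\
  \mathrm{bias}_x^2
    &= \tfrac{1}{2}\sum_{y}\bigl[P(Y_F = y \mid x) - P(Y_H = y \mid x)\bigr]^2,\\
  \mathrm{variance}_x
    &= \tfrac{1}{2}\Bigl(1 - \sum_{y} P(Y_H = y \mid x)^2\Bigr),\\
  \sigma_x^2
    &= \tfrac{1}{2}\Bigl(1 - \sum_{y} P(Y_F = y \mid x)^2\Bigr).
\end{align*}
```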

Several researchers have used

Related work

Although there is a large body of research on multiple model methods for classification, very little of it deals specifically with combining NN classifiers. We are aware only of Skalak's [47], [48] work on combining NN classifiers with small prototype sets, Alpaydin's [6] work with condensed nearest neighbor (CNN) classifiers [29], and Aha and Bankert's [3] initial work on combining NN, feature selection, and ECOC, which was later expanded on by Ricci and Aha [44].

Skalak and Alpaydin approach the

Conclusions and future directions

In this article, we introduced MFS, an algorithm for combining multiple NN classifiers. In MFS, each NN classifier has access to all the patterns in the original training set, but only to a random subset of the features.

Our experiments showed that MFS was effective in improving accuracy, and was even competitive with boosted decision trees, the current state of the art. But beyond accuracy improvements, MFS is a significant advance because it allows us to incorporate many desirable properties

Acknowledgements

I thank Michael Pazzani for his support and encouragement. I also thank Cathy Blake, Yang Wang, and the anonymous reviewers for providing many comments that improved this paper. This article represents research conducted both at the University of California, Irvine, and at the University of Waterloo, Department of Systems Design Engineering. It was partially supported by an NSERC PGS A scholarship.

References (53)

  • L. Miclet et al., Approximative fast nearest-neighbour recognition, Pattern Recognition Letters (1983)
  • A. Agresti, Categorical Data Analysis, Wiley, New York,...
  • D.W. Aha, R.L. Bankert, Feature selection for case-based classification of cloud types: An empirical comparison, in:...
  • D.W. Aha et al., Cloud classification using error-correcting output codes, Artificial Intelligence Applications: Natural Resources, Agriculture, and Environmental Science (1997)
  • D.W. Aha et al., Instance-based learning algorithms, Machine Learning (1991)
  • K.M. Ali et al., Error reduction through learning multiple descriptions, Machine Learning (1996)
  • E. Alpaydin, Voting over multiple condensed nearest neighbors, Artificial Intelligence Review (1997)
  • S.D. Bay, Nearest neighbour classification from multiple data representations, Master's thesis, University of Waterloo,...
  • C. Blake, E. Keogh, C.J. Merz, UCI repository of machine learning databases,...
  • L. Breiman, Bagging predictors, Machine Learning (1996)
  • L. Breiman, Bias, variance, and arcing classifiers, Tech. Rep. 460, Statistics Department, University of California,...
  • L. Breiman, Heuristics of instability and stabilizations in model selection, The Annals of Statistics (1996)
  • J. Catlett, Megainduction: machine learning on very large databases, Ph.D. thesis, Faculty of Arts, University of...
  • P. Chan, S. Stolfo, A comparative evaluation of voting and meta-learning on partitioned data, in: Proceedings of the...
  • K.J. Cherkauer, Human expert-level performance on a scientific image analysis task by a system using combined...
  • P. Clark, R. Boswell, Rule induction with CN2: Some recent improvements, in: Machine Learning: Proceedings of the Fifth...
  • P. Clark et al., The CN2 induction algorithm, Machine Learning (1989)
  • J. Cohen, A coefficient of agreement for nominal scales, Educational and Psychological Measurement (1960)
  • T.M. Cover et al., Nearest neighbor pattern classification, IEEE Transactions on Information Theory (1967)
  • B.V. Dasarathy, Nearest Neighbor, NN Norms: NN Pattern Classification Techniques, IEEE Computer Society Press, Los...
  • T.G. Dietterich, Machine learning research: four current directions, AI Magazine (1997)
  • T.G. Dietterich, An experimental comparison of three methods for constructing ensembles of decision trees: Bagging,...
  • T.G. Dietterich et al., Solving multiclass learning problems via error-correcting output codes, Journal of Artificial Intelligence Research (1995)
  • R.O. Duda et al., Pattern Classification and Scene Analysis (1973)
  • R.A. Fisher, The use of multiple measurements in taxonomic problems, Annual Eugenics (1936)
  • E. Fix, J.L. Hodges, Discriminatory analysis: Nonparametric discrimination: Consistency properties, Tech. Rep. Project...