BeeOWA: A novel approach based on ABC algorithm and induced OWA operators for constructing one-class classifier ensembles
Introduction
Combining multiple classifiers, also known as classifier ensemble, is an effective technique for solving classification problems using an ensemble of individual base classifiers. It has been theoretically and empirically demonstrated that classifier ensembles can substantially improve the classification accuracy of their constituent members [1], [2], [3].
In addition to the accuracy of the base classifiers, the success of a classifier ensemble also relies on the diversity among them [4], [5]. Diversity means that the base classifiers differ from each other as much as possible and therefore tend to make uncorrelated errors. To achieve an initial diversity, a common technique is to train a set of homogeneous or heterogeneous base classifiers on the same or different training datasets using techniques such as bagging [6], boosting [7], or random subspacing [8]. Bagging generates several different training datasets by bootstrap sampling from the original training dataset and then trains a base classifier on each of them. Boosting generates a sequence of base classifiers whose training datasets differ and are determined by the accuracy of the preceding base classifiers. Random subspacing trains base classifiers independently on the same training dataset using different random subsets of features.
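As a concrete illustration, bagging's bootstrap-and-train loop can be sketched in a few lines of Python. The base learner and dataset below are illustrative placeholders only, not part of the original paper:

```python
import random
from collections import Counter

def bootstrap_sample(dataset, rng):
    """Draw len(dataset) examples with replacement (one bootstrap replicate)."""
    n = len(dataset)
    return [dataset[rng.randrange(n)] for _ in range(n)]

class MajorityLabelClassifier:
    """Toy base learner: always predicts its most frequent training label."""
    def fit(self, data):
        self.label = Counter(label for _, label in data).most_common(1)[0][0]
        return self
    def predict(self, x):
        return self.label

def bagging(dataset, n_classifiers=5, seed=0):
    """Train one base classifier per bootstrap replicate of the training set."""
    rng = random.Random(seed)
    return [MajorityLabelClassifier().fit(bootstrap_sample(dataset, rng))
            for _ in range(n_classifiers)]

# Tiny illustrative dataset: (features, label) pairs.
data = [([0.1], 'target'), ([0.2], 'target'), ([0.9], 'outlier')]
ensemble = bagging(data, n_classifiers=3)
print([clf.predict([0.15]) for clf in ensemble])
```

Because each classifier sees a different bootstrap replicate, the ensemble members are trained on different data even though the learning algorithm is identical, which is the source of bagging's diversity.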
Using a subset, or sub-ensemble, of base classifiers can provide higher diversity and accuracy than using the whole set, or ensemble. Thus, one of the most important issues in constructing a classifier ensemble is to decide which of the base classifiers to include [9], [10]. This process, also known as classifier pruning or ensemble pruning [11], can be considered an optimization problem with two objectives, classification accuracy and diversity, both of which need to be maximized. When the size of a classifier ensemble is relatively large, exhaustive classifier pruning is computationally expensive or even prohibitive. One solution to this problem is to use meta-heuristic algorithms, such as artificial bee colony (ABC) [12], which can find near-optimal solutions in a short time. ABC is a swarm-based meta-heuristic algorithm that was initially proposed for solving numerical optimization problems. It is as simple as particle swarm optimization (PSO) [13] and differential evolution (DE) [14], and uses only common control parameters, such as population size and maximum cycle number. ABC has shown promising results in the field of optimization [15], [16].
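The two-objective formulation can be made concrete with a sketch. The fitness function below scalarizes accuracy and a pairwise-disagreement diversity term over a bitmask of selected classifiers; random search over bitmasks stands in for the ABC food-source exploration (the weighting `alpha` and all data are illustrative assumptions, not the paper's BeePruner):

```python
import itertools
import random

def majority_vote(preds):
    """Majority vote over the base classifiers' labels for one example."""
    return max(set(preds), key=preds.count)

def sub_ensemble_accuracy(mask, all_preds, labels):
    chosen = [p for p, keep in zip(all_preds, mask) if keep]
    if not chosen:
        return 0.0
    hits = sum(majority_vote([c[i] for c in chosen]) == y
               for i, y in enumerate(labels))
    return hits / len(labels)

def mean_pairwise_disagreement(mask, all_preds):
    """Average fraction of examples on which selected classifier pairs disagree."""
    chosen = [p for p, keep in zip(all_preds, mask) if keep]
    if len(chosen) < 2:
        return 0.0
    n = len(chosen[0])
    pairs = list(itertools.combinations(chosen, 2))
    return sum(sum(a[i] != b[i] for i in range(n)) / n for a, b in pairs) / len(pairs)

def fitness(mask, all_preds, labels, alpha=0.7):
    """Scalarize the two pruning objectives: accuracy and diversity."""
    return (alpha * sub_ensemble_accuracy(mask, all_preds, labels)
            + (1 - alpha) * mean_pairwise_disagreement(mask, all_preds))

def prune(all_preds, labels, n_iter=200, seed=0):
    """Random search over bitmasks -- a minimal stand-in for the ABC search."""
    rng = random.Random(seed)
    k = len(all_preds)
    candidates = ([rng.random() < 0.5 for _ in range(k)] for _ in range(n_iter))
    return max(candidates, key=lambda m: fitness(m, all_preds, labels))

# 3 classifiers x 4 examples of 0/1 predictions, plus ground-truth labels.
all_preds = [[1, 1, 0, 1], [1, 0, 0, 1], [0, 1, 1, 1]]
labels = [1, 1, 0, 1]
print(prune(all_preds, labels))
```

A metaheuristic such as ABC replaces the blind random search with guided neighborhood moves (employed, onlooker, and scout bee phases), but it optimizes the same kind of fitness landscape over sub-ensemble bitmasks.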
Furthermore, it has been shown that increasing the coverage of a classifier ensemble through classifier pruning is not by itself enough to increase classification accuracy [17], [18]. Hence, another important step in constructing a classifier ensemble is to choose a good strategy for combining the outputs of the base classifiers, a process known as classifier fusion. In the literature, several classifier fusion strategies have been proposed; they can be categorized according to the level of classifier outputs into abstract level, rank level, and measurement level [19], [20]. Measurement-level outputs provide more information than the other types, and a number of aggregation functions or fusion rules, such as mean, max, and product, are employed for combining them [21].
Aggregating pieces of information obtained from different sources is a common aspect of any fusion system. A particularly powerful class of aggregation operators is the ordered weighted averaging (OWA) operators [22]. An OWA operator takes multiple values as input and returns a single value: a weighted sum computed over an order ranking of the input values. Classifier fusion using OWA operators appears more robust than simple weighted averaging, in which the coefficients are derived from classifier accuracy [23].
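The standard OWA definition can be sketched directly: weights attach to rank positions rather than to particular sources, so mean and max arise as special cases of the weight vector. This is the basic OWA operator only, not the paper's EIOWA; the scores below are illustrative:

```python
def owa(values, weights):
    """Ordered weighted averaging: sort the inputs, then take a weighted sum
    in which each weight is bound to a rank position, not to a source."""
    assert abs(sum(weights) - 1.0) < 1e-9 and len(weights) == len(values)
    ordered = sorted(values, reverse=True)  # descending order ranking
    return sum(w * v for w, v in zip(weights, ordered))

# Hypothetical per-classifier support values for the target class.
scores = [0.9, 0.2, 0.6, 0.7]
print(owa(scores, [1.0, 0.0, 0.0, 0.0]))      # weight vector for max
print(owa(scores, [0.25, 0.25, 0.25, 0.25]))  # weight vector for mean
print(owa(scores, [0.0, 0.5, 0.5, 0.0]))      # "discard the extremes" weighting
```

Because the weights follow rank order, the third weighting averages the two middle-ranked scores and ignores the largest and smallest, a behavior that no fixed per-classifier weighting can express.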
In recent years, a substantial amount of research has been conducted in the field of one-class classification, resulting in different one-class classifiers, including the one-class SVM (OCSVM) [24], support vector data description (SVDD) [25], and so on. The goal of one-class classification is to distinguish a set of target objects from all other possible objects [26]. Since one-class classification problems provide information about the target class only, constructing highly accurate one-class classifier ensembles is more challenging than constructing multi-class ones.
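To make the target-only training setting concrete, here is a deliberately simple one-class model: it learns a centroid and an acceptance radius from target examples alone and flags everything outside the radius as an outlier. This toy is not OCSVM or SVDD; the quantile parameter and dataset are illustrative assumptions:

```python
def fit_one_class(targets, quantile=0.95):
    """Toy one-class model: accept points whose distance to the target
    centroid is within a radius learned from the target class alone."""
    d = len(targets[0])
    centroid = [sum(x[i] for x in targets) / len(targets) for i in range(d)]
    dists = sorted(sum((xi - ci) ** 2 for xi, ci in zip(x, centroid)) ** 0.5
                   for x in targets)
    # Radius set at a quantile of the training distances (assumes p in (0, 1]).
    radius = dists[min(int(quantile * len(dists)), len(dists) - 1)]
    return centroid, radius

def predict_one_class(model, x):
    centroid, radius = model
    dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centroid)) ** 0.5
    return 1 if dist <= radius else -1  # +1 = target, -1 = outlier

targets = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
model = fit_one_class(targets)
print(predict_one_class(model, (0.05, 0.05)))  # -> 1 (inside target region)
print(predict_one_class(model, (5.0, 5.0)))    # -> -1 (flagged as outlier)
```

Note that no outlier examples appear anywhere in training; the decision boundary is shaped by the target class alone, which is exactly what makes accuracy estimation and ensemble construction harder in the one-class setting.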
As mentioned before, to construct a classifier ensemble, we need to address two issues: how to generate a set of accurate and diverse base classifiers, and how to combine their outputs using an effective fusion rule. Although these issues have been adequately addressed for multi-class classifier ensembles, relatively little work has been reported in the literature on addressing them for one-class classifier ensembles. In this paper, we present BeeOWA, a novel approach to constructing highly accurate one-class classifier ensembles. BeeOWA uses a novel binary artificial bee colony algorithm, called BeePruner, to prune an initial one-class classifier ensemble and find a near-optimal sub-ensemble of base classifiers. More precisely, the goal of BeePruner is to exclude the non-diverse base classifiers from the initial ensemble while preserving classification accuracy. In the subsequent fusion step, no benefit arises from combining classifiers if the fusion rule does not properly exploit the ensemble diversity. Considering this fact, BeeOWA uses a novel exponential induced OWA operator, called EIOWA, to combine the outputs of the base classifiers in the sub-ensemble.
The major contributions of this paper are listed as follows:
- We present a novel artificial bee colony algorithm for one-class classifier pruning that simultaneously utilizes two measures: an exponential consistency measure and a non-pairwise diversity measure based on the Kappa inter-rater agreement.
- To the best of our knowledge, the most widely used fusion rules in one-class classification problems are fixed rules, such as majority voting, mean, max, and product. We propose a novel exponential induced OWA operator for one-class classifier fusion and experimentally show that it can outperform the fixed rules.
- We conduct extensive experiments on benchmark datasets to evaluate the performance of BeeOWA and show that it performs significantly better than state-of-the-art approaches in the literature.
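The paper's exponential consistency measure is not reproduced here, but the Kappa-based non-pairwise diversity idea can be illustrated with the classic inter-rater agreement statistic over a correctness matrix (this follows the widely used interrater-agreement formulation from the ensemble-diversity literature, under the assumption that mean accuracy lies strictly between 0 and 1):

```python
def kappa_diversity(correct):
    """Interrater agreement kappa over an L x N 0/1 matrix, where
    correct[c][j] == 1 iff classifier c labels example j correctly.
    Lower kappa means lower agreement, i.e. higher ensemble diversity.
    Assumes mean accuracy p_bar is strictly between 0 and 1."""
    L, N = len(correct), len(correct[0])
    p_bar = sum(sum(row) for row in correct) / (L * N)  # mean accuracy
    # l_j = number of classifiers that are correct on example j
    ls = [sum(correct[c][j] for c in range(L)) for j in range(N)]
    numerator = sum(l * (L - l) for l in ls) / L
    denominator = N * (L - 1) * p_bar * (1 - p_bar)
    return 1 - numerator / denominator

# Three identical classifiers: full agreement, kappa = 1 (zero diversity).
print(kappa_diversity([[1, 1, 0, 0]] * 3))
# Two classifiers that are never correct together: kappa < 0 (high diversity).
print(kappa_diversity([[1, 0], [0, 1]]))
```

A pruning criterion can then *minimize* kappa among the selected classifiers while keeping accuracy high, which is the intuition behind combining a consistency measure with a non-pairwise diversity measure.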
The rest of this paper is organized as follows. Section 2 provides background on classifier ensembles, OWA operators, and the ABC algorithm. Section 3 presents the main steps of BeeOWA. Section 4 provides an overview of current techniques for constructing one-class classifier ensembles. The experimental results are described in Section 5. Finally, conclusions are given in Section 6.
Background
In this section, we give a brief introduction to some basic concepts used throughout this paper.
BeeOWA
In this section, we present BeeOWA, a novel approach for constructing one-class classifier ensembles. BeeOWA assumes that an initial ensemble of base one-class classifiers is given; it then proceeds in two main steps: one-class classifier pruning and one-class classifier fusion.
Related work
Over the last few years, one-class classifier ensembles have been used in various domains, such as information security [58], [59], signature verification [60], [61], image retrieval [62], and so on. As previously mentioned, the two challenging steps in constructing classifier ensembles are classifier pruning and classifier fusion. These steps have proven to be promising research directions for one-class classifier ensembles. Hence, in the following, we give a brief overview of the
Experiments
In this section, we evaluate the performance of BeeOWA using several benchmark datasets and compare it with that of state-of-the-art one-class classifier ensemble approaches in the literature.
All experiments were implemented in MATLAB using PRTools [69] and the Data Description toolbox [70]. All reported results were averaged over five repetitions of 10-fold cross-validation. For the comparison, we used the Friedman test [71] with the Nemenyi post hoc test [72], as recommended in [73]. The
Conclusions
It has been shown that using the best classifier and discarding the classifiers with poorer performance might waste valuable information [45]. For this reason, we commonly use classifier ensembles to improve the classification accuracy. In general, the process of constructing a classifier ensemble consists of three main steps: classifier generation, classifier pruning, and classifier fusion. Although classifier pruning and classifier fusion have been sufficiently studied in multi-class
Elham Parhizkar received the M.Sc. degree in computer engineering, with first class honors, from Tarbiat Modares University in 2014, where she worked on anomaly detection in web traffic as her master thesis. Her main research interests are in the field of machine learning, particularly in the areas of one-class classification and outlier detection.
References (75)
- et al., Boosting with pairwise constraints, Neurocomputing (2010)
- et al., Dynamic classifier ensemble using classification confidence, Neurocomputing (2013)
- et al., LibD3C: ensemble classifiers with a clustering and dynamic selection strategy, Neurocomputing (2014)
- et al., Ensembling neural networks: many could be better than all, Artif. Intell. (2002)
- et al., Optimal selection of ensemble classifiers using measures of competence and diversity of base classifiers, Neurocomputing (2014)
- et al., Optimization of stacking ensemble configurations through artificial bee colony algorithm, Swarm Evol. Comput. (2013)
- et al., Diversity in search strategies for ensemble feature selection, Inf. Fusion (2005)
- et al., Iterative boolean combination of classifiers in the ROC space: an application to anomaly detection with HMMs, Pattern Recognit. (2010)
- et al., Selection-fusion approach for classification of datasets with missing values, Pattern Recognit. (2010)
- et al., A probabilistic model of classifier competence for dynamic ensemble selection, Pattern Recognit. (2011)
- On the evolutionary design of heterogeneous bagging models, Neurocomputing
- Classification by fuzzy integral: performance and tests, Fuzzy Sets Syst.
- Low resolution face recognition based on support vector data description, Pattern Recognit.
- Classifier selection for majority voting, Inf. Fusion
- Diversity measures for one-class classifier ensembles, Neurocomputing
- Analytic properties of maximum entropy OWA operators, Inf. Sci.
- Intrusion detection in computer networks by a modular ensemble of one-class classifiers, Inf. Fusion
- Experimental comparison of one-class classifiers for online signature verification, Neurocomputing
- Ensemble one-class support vector machines for content-based image retrieval, Expert Syst. Appl.
- The use of the area under the ROC curve in the evaluation of machine learning algorithms, Pattern Recognit.
- Random bands: a novel ensemble for fingerprint matching, Neurocomputing
- Rotation forest: a new classifier ensemble method, IEEE Trans. Pattern Anal. Mach. Intell.
- Ensemble Methods: Foundations and Algorithms
- Bagging predictors, Mach. Learn.
- The boosting approach to machine learning: an overview
- The random subspace method for constructing decision forests, IEEE Trans. Pattern Anal. Mach. Intell.
- Optimization algorithms for one-class classification ensemble pruning
- A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm, J. Global Optim.
- Particle swarm optimization
- Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces, J. Global Optim.
- Methods of combining multiple classifiers and their applications to handwriting recognition, IEEE Trans. Syst. Man Cybern.
- On ordered weighted averaging aggregation operators in multicriteria decision making, IEEE Trans. Syst. Man Cybern.
Mahdi Abadi received the B.Sc. degree in computer engineering from Ferdowsi University of Mashhad in 1998. He also received the M.Sc. and Ph.D. degrees from Tarbiat Modares University in 2001 and 2008, respectively. Since 2009, he has been an assistant professor in the Faculty of Electrical and Computer Engineering at Tarbiat Modares University. His main research interests are network security, evolutionary computation, and data mining.