BeeOWA: A novel approach based on ABC algorithm and induced OWA operators for constructing one-class classifier ensembles
Introduction
Combining multiple classifiers, also known as classifier ensemble, is an effective technique for solving classification problems using an ensemble of individual base classifiers. It has been theoretically and empirically demonstrated that classifier ensembles can substantially improve the classification accuracy of their constituent members [1], [2], [3].
In addition to the accuracy of the base classifiers, the success of a classifier ensemble also relies on the diversity among them [4], [5]. Diversity means that the base classifiers differ from each other as much as possible and therefore tend to make uncorrelated errors. To achieve an initial diversity, a common technique is to train a set of homogeneous or heterogeneous base classifiers on the same or different training datasets using techniques such as bagging [6], boosting [7], or random subspacing [8]. Bagging generates several different training datasets by bootstrap sampling from the original training dataset and then trains a base classifier on each of them. Boosting generates a sequence of base classifiers whose training datasets differ and are determined by the accuracy of the preceding base classifiers. Random subspacing trains base classifiers independently on the same training dataset using different random subsets of features.
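As a concrete illustration, bagging's bootstrap-and-train loop can be sketched in a few lines of Python. The base learner and dataset below are illustrative placeholders only, not part of the original paper:

```python
import random
from collections import Counter

def bootstrap_sample(dataset, rng):
    """Draw len(dataset) examples with replacement (one bootstrap replicate)."""
    n = len(dataset)
    return [dataset[rng.randrange(n)] for _ in range(n)]

class MajorityLabelClassifier:
    """Toy base learner: always predicts its most frequent training label."""
    def fit(self, data):
        self.label = Counter(label for _, label in data).most_common(1)[0][0]
        return self
    def predict(self, x):
        return self.label

def bagging(dataset, n_classifiers=5, seed=0):
    """Train one base classifier per bootstrap replicate of the training set."""
    rng = random.Random(seed)
    return [MajorityLabelClassifier().fit(bootstrap_sample(dataset, rng))
            for _ in range(n_classifiers)]

# Tiny illustrative dataset: (features, label) pairs.
data = [([0.1], 'target'), ([0.2], 'target'), ([0.9], 'outlier')]
ensemble = bagging(data, n_classifiers=3)
print([clf.predict([0.15]) for clf in ensemble])
```

Because each classifier sees a different bootstrap replicate, the ensemble members are trained on different data even though the learning algorithm is identical, which is the source of bagging's diversity.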
Using a subset, or sub-ensemble, of base classifiers can provide higher diversity and accuracy than using the whole set, or ensemble. Thus, one of the most important issues in constructing a classifier ensemble is to decide which of the base classifiers to include [9], [10]. This process, also known as classifier pruning or ensemble pruning [11], can be considered an optimization problem with two objectives, classification accuracy and diversity, both of which need to be maximized. When the size of a classifier ensemble is relatively large, exhaustive classifier pruning is computationally expensive or even prohibitive. One solution to this problem is to use meta-heuristic algorithms, such as artificial bee colony (ABC) [12], which can find near-optimal solutions in a short time. ABC is a swarm-based meta-heuristic algorithm that was initially proposed for solving numerical optimization problems. It is as simple as particle swarm optimization (PSO) [13] and differential evolution (DE) [14], and uses only common control parameters, such as population size and maximum cycle number. ABC has shown promising results in the field of optimization [15], [16].
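The two-objective formulation can be made concrete with a sketch. The fitness function below scalarizes accuracy and a pairwise-disagreement diversity term over a bitmask of selected classifiers; random search over bitmasks stands in for the ABC food-source exploration (the weighting `alpha` and all data are illustrative assumptions, not the paper's BeePruner):

```python
import itertools
import random

def majority_vote(preds):
    """Majority vote over the base classifiers' labels for one example."""
    return max(set(preds), key=preds.count)

def sub_ensemble_accuracy(mask, all_preds, labels):
    chosen = [p for p, keep in zip(all_preds, mask) if keep]
    if not chosen:
        return 0.0
    hits = sum(majority_vote([c[i] for c in chosen]) == y
               for i, y in enumerate(labels))
    return hits / len(labels)

def mean_pairwise_disagreement(mask, all_preds):
    """Average fraction of examples on which selected classifier pairs disagree."""
    chosen = [p for p, keep in zip(all_preds, mask) if keep]
    if len(chosen) < 2:
        return 0.0
    n = len(chosen[0])
    pairs = list(itertools.combinations(chosen, 2))
    return sum(sum(a[i] != b[i] for i in range(n)) / n for a, b in pairs) / len(pairs)

def fitness(mask, all_preds, labels, alpha=0.7):
    """Scalarize the two pruning objectives: accuracy and diversity."""
    return (alpha * sub_ensemble_accuracy(mask, all_preds, labels)
            + (1 - alpha) * mean_pairwise_disagreement(mask, all_preds))

def prune(all_preds, labels, n_iter=200, seed=0):
    """Random search over bitmasks -- a minimal stand-in for the ABC search."""
    rng = random.Random(seed)
    k = len(all_preds)
    candidates = ([rng.random() < 0.5 for _ in range(k)] for _ in range(n_iter))
    return max(candidates, key=lambda m: fitness(m, all_preds, labels))

# 3 classifiers x 4 examples of 0/1 predictions, plus ground-truth labels.
all_preds = [[1, 1, 0, 1], [1, 0, 0, 1], [0, 1, 1, 1]]
labels = [1, 1, 0, 1]
print(prune(all_preds, labels))
```

A metaheuristic such as ABC replaces the blind random search with guided neighborhood moves (employed, onlooker, and scout bee phases), but it optimizes the same kind of fitness landscape over sub-ensemble bitmasks.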
Furthermore, it has been shown that increasing the coverage of a classifier ensemble through classifier pruning is not by itself enough to increase classification accuracy [17], [18]. Hence, another important step in constructing a classifier ensemble is to choose a good strategy for combining the outputs of the base classifiers, a process known as classifier fusion. In the literature, several classifier fusion strategies have been proposed; they can be categorized according to the level of classifier outputs into abstract level, rank level, and measurement level [19], [20]. Measurement-level outputs provide more information than the other types, and a number of aggregation functions or fusion rules, such as mean, max, and product, are employed for combining them [21].
Aggregating pieces of information obtained from different sources is a common aspect of any fusion system. A particularly powerful class of aggregation operators is the ordered weighted averaging (OWA) operators [22]. An OWA operator takes multiple values as input and returns a single value: a weighted sum computed over an order ranking of the input values. Classifier fusion using OWA operators appears more robust than simple weighted averaging, in which the coefficients are derived from classifier accuracy [23].
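The standard OWA definition can be sketched directly: weights attach to rank positions rather than to particular sources, so mean and max arise as special cases of the weight vector. This is the basic OWA operator only, not the paper's EIOWA; the scores below are illustrative:

```python
def owa(values, weights):
    """Ordered weighted averaging: sort the inputs, then take a weighted sum
    in which each weight is bound to a rank position, not to a source."""
    assert abs(sum(weights) - 1.0) < 1e-9 and len(weights) == len(values)
    ordered = sorted(values, reverse=True)  # descending order ranking
    return sum(w * v for w, v in zip(weights, ordered))

# Hypothetical per-classifier support values for the target class.
scores = [0.9, 0.2, 0.6, 0.7]
print(owa(scores, [1.0, 0.0, 0.0, 0.0]))      # weight vector for max
print(owa(scores, [0.25, 0.25, 0.25, 0.25]))  # weight vector for mean
print(owa(scores, [0.0, 0.5, 0.5, 0.0]))      # "discard the extremes" weighting
```

Because the weights follow rank order, the third weighting averages the two middle-ranked scores and ignores the largest and smallest, a behavior that no fixed per-classifier weighting can express.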
In recent years, a substantial amount of research has been conducted in the field of one-class classification, resulting in different one-class classifiers, including the one-class SVM (OCSVM) [24], support vector data description (SVDD) [25], and so on. The goal of one-class classification is to distinguish a set of target objects from all other possible objects [26]. Since one-class classification problems provide information about the target class only, constructing highly accurate one-class classifier ensembles is more challenging than constructing multi-class ones.
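To make the target-only training setting concrete, here is a deliberately simple one-class model: it learns a centroid and an acceptance radius from target examples alone and flags everything outside the radius as an outlier. This toy is not OCSVM or SVDD; the quantile parameter and dataset are illustrative assumptions:

```python
def fit_one_class(targets, quantile=0.95):
    """Toy one-class model: accept points whose distance to the target
    centroid is within a radius learned from the target class alone."""
    d = len(targets[0])
    centroid = [sum(x[i] for x in targets) / len(targets) for i in range(d)]
    dists = sorted(sum((xi - ci) ** 2 for xi, ci in zip(x, centroid)) ** 0.5
                   for x in targets)
    # Radius set at a quantile of the training distances (assumes p in (0, 1]).
    radius = dists[min(int(quantile * len(dists)), len(dists) - 1)]
    return centroid, radius

def predict_one_class(model, x):
    centroid, radius = model
    dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centroid)) ** 0.5
    return 1 if dist <= radius else -1  # +1 = target, -1 = outlier

targets = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
model = fit_one_class(targets)
print(predict_one_class(model, (0.05, 0.05)))  # -> 1 (inside target region)
print(predict_one_class(model, (5.0, 5.0)))    # -> -1 (flagged as outlier)
```

Note that no outlier examples appear anywhere in training; the decision boundary is shaped by the target class alone, which is exactly what makes accuracy estimation and ensemble construction harder in the one-class setting.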
As mentioned before, to construct a classifier ensemble, we need to address two issues: how to generate a set of accurate and diverse base classifiers, and how to combine their outputs using an effective fusion rule. Although these issues have been adequately addressed for multi-class classifier ensembles, relatively little work has been reported in the literature on addressing them for one-class classifier ensembles. In this paper, we present BeeOWA, a novel approach to constructing highly accurate one-class classifier ensembles. BeeOWA uses a novel binary artificial bee colony algorithm, called BeePruner, to prune an initial one-class classifier ensemble and find a near-optimal sub-ensemble of base classifiers. More precisely, the goal of BeePruner is to exclude the non-diverse base classifiers from the initial ensemble while preserving classification accuracy. In the subsequent fusion step, no benefit arises from combining classifiers if the fusion rule does not properly exploit the ensemble diversity. Considering this fact, BeeOWA uses a novel exponential induced OWA operator, called EIOWA, to combine the outputs of the base classifiers in the sub-ensemble.
The major contributions of this paper are listed as follows:
- We present a novel artificial bee colony algorithm for one-class classifier pruning that simultaneously utilizes two measures: an exponential consistency measure and a non-pairwise diversity measure based on the Kappa inter-rater agreement.
- To the best of our knowledge, the most widely used fusion rules in one-class classification problems are fixed rules, such as majority voting, mean, max, and product. We propose a novel exponential induced OWA operator for one-class classifier fusion and experimentally show that it can outperform the fixed rules.
- We conduct extensive experiments on benchmark datasets to evaluate the performance of BeeOWA and show that it performs significantly better than state-of-the-art approaches in the literature.
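The paper's exponential consistency measure is not reproduced here, but the Kappa-based non-pairwise diversity idea can be illustrated with the classic inter-rater agreement statistic over a correctness matrix (this follows the widely used interrater-agreement formulation from the ensemble-diversity literature, under the assumption that mean accuracy lies strictly between 0 and 1):

```python
def kappa_diversity(correct):
    """Interrater agreement kappa over an L x N 0/1 matrix, where
    correct[c][j] == 1 iff classifier c labels example j correctly.
    Lower kappa means lower agreement, i.e. higher ensemble diversity.
    Assumes mean accuracy p_bar is strictly between 0 and 1."""
    L, N = len(correct), len(correct[0])
    p_bar = sum(sum(row) for row in correct) / (L * N)  # mean accuracy
    # l_j = number of classifiers that are correct on example j
    ls = [sum(correct[c][j] for c in range(L)) for j in range(N)]
    numerator = sum(l * (L - l) for l in ls) / L
    denominator = N * (L - 1) * p_bar * (1 - p_bar)
    return 1 - numerator / denominator

# Three identical classifiers: full agreement, kappa = 1 (zero diversity).
print(kappa_diversity([[1, 1, 0, 0]] * 3))
# Two classifiers that are never correct together: kappa < 0 (high diversity).
print(kappa_diversity([[1, 0], [0, 1]]))
```

A pruning criterion can then *minimize* kappa among the selected classifiers while keeping accuracy high, which is the intuition behind combining a consistency measure with a non-pairwise diversity measure.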
The rest of this paper is organized as follows. Section 2 provides background on classifier ensembles, OWA operators, and the ABC algorithm. Section 3 presents the main steps of BeeOWA. Section 4 provides an overview of current techniques for constructing one-class classifier ensembles. The experimental results are described in Section 5. Finally, conclusions are given in Section 6.
Background
In this section, we give a brief introduction to some basic concepts used throughout this paper.
BeeOWA
In this section, we present BeeOWA, a novel approach for constructing one-class classifier ensembles. BeeOWA assumes that an initial ensemble of base one-class classifiers is given; it then proceeds in two main steps: one-class classifier pruning and one-class classifier fusion.
Related work
Over the last few years, one-class classifier ensembles have been used in various domains, such as information security [58], [59], signature verification [60], [61], image retrieval [62], and so on. As previously mentioned, the two challenging steps in constructing classifier ensembles are classifier pruning and classifier fusion. These steps have proven to be promising research directions for one-class classifier ensembles. Hence, in the following, we give a brief overview of the
Experiments
In this section, we evaluate the performance of BeeOWA using several benchmark datasets and compare it with that of state-of-the-art one-class classifier ensemble approaches in the literature.
All experiments were implemented in MATLAB using PRTools [69] and the Data Description toolbox [70]. All reported results were averaged over five repetitions of 10-fold cross-validation. For the comparison, we used the Friedman test [71] with the Nemenyi post hoc test [72], as recommended in [73]. The
Conclusions
It has been shown that using the best classifier and discarding the classifiers with poorer performance might waste valuable information [45]. For this reason, we commonly use classifier ensembles to improve the classification accuracy. In general, the process of constructing a classifier ensemble consists of three main steps: classifier generation, classifier pruning, and classifier fusion. Although classifier pruning and classifier fusion have been sufficiently studied in multi-class
Elham Parhizkar received the M.Sc. degree in computer engineering, with first class honors, from Tarbiat Modares University in 2014, where she worked on anomaly detection in web traffic as her master thesis. Her main research interests are in the field of machine learning, particularly in the areas of one-class classification and outlier detection.
References (75)
- et al., Boosting with pairwise constraints, Neurocomputing (2010)
- et al., Dynamic classifier ensemble using classification confidence, Neurocomputing (2013)
- et al., LibD3C: ensemble classifiers with a clustering and dynamic selection strategy, Neurocomputing (2014)
- et al., Ensembling neural networks: many could be better than all, Artif. Intell. (2002)
- et al., Optimal selection of ensemble classifiers using measures of competence and diversity of base classifiers, Neurocomputing (2014)
- et al., Optimization of stacking ensemble configurations through artificial bee colony algorithm, Swarm Evol. Comput. (2013)
- et al., Diversity in search strategies for ensemble feature selection, Inf. Fusion (2005)
- et al., Iterative boolean combination of classifiers in the ROC space: an application to anomaly detection with HMMs, Pattern Recognit. (2010)
- et al., Selection-fusion approach for classification of datasets with missing values, Pattern Recognit. (2010)
- et al., A probabilistic model of classifier competence for dynamic ensemble selection, Pattern Recognit. (2011)
- On the evolutionary design of heterogeneous bagging models, Neurocomputing
- Classification by fuzzy integral: performance and tests, Fuzzy Sets Syst.
- Low resolution face recognition based on support vector data description, Pattern Recognit.
- Classifier selection for majority voting, Inf. Fusion
- Diversity measures for one-class classifier ensembles, Neurocomputing
- Analytic properties of maximum entropy OWA operators, Inf. Sci.
- Intrusion detection in computer networks by a modular ensemble of one-class classifiers, Inf. Fusion
- Experimental comparison of one-class classifiers for online signature verification, Neurocomputing
- Ensemble one-class support vector machines for content-based image retrieval, Expert Syst. Appl.
- The use of the area under the ROC curve in the evaluation of machine learning algorithms, Pattern Recognit.
- Random bands: a novel ensemble for fingerprint matching, Neurocomputing
- Rotation forest: a new classifier ensemble method, IEEE Trans. Pattern Anal. Mach. Intell.
- Ensemble Methods: Foundations and Algorithms
- Bagging predictors, Mach. Learn.
- The boosting approach to machine learning: an overview
- The random subspace method for constructing decision forests, IEEE Trans. Pattern Anal. Mach. Intell.
- Optimization algorithms for one-class classification ensemble pruning
- A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm, J. Global Optim.
- Particle swarm optimization
- Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces, J. Global Optim.
- Methods of combining multiple classifiers and their applications to handwriting recognition, IEEE Trans. Syst. Man Cybern.
- On ordered weighted averaging aggregation operators in multicriteria decision making, IEEE Trans. Syst. Man Cybern.
Mahdi Abadi received the B.Sc. degree in computer engineering from Ferdowsi University of Mashhad in 1998. He also received the M.Sc. and Ph.D. degrees from Tarbiat Modares University in 2001 and 2008, respectively. Since 2009, he has been an assistant professor in the Faculty of Electrical and Computer Engineering at Tarbiat Modares University. His main research interests are network security, evolutionary computation, and data mining.