
Neurocomputing

Volume 149, Part A, 3 February 2015, Pages 316-329

Ensemble of subset online sequential extreme learning machine for class imbalance and concept drift

https://doi.org/10.1016/j.neucom.2014.03.075

Abstract

In this paper, a computationally efficient framework, referred to as ensemble of subset online sequential extreme learning machine (ESOS-ELM), is proposed for class imbalance learning from a concept-drifting data stream. The proposed framework comprises a main ensemble representing short-term memory, an information storage module representing long-term memory and a change detection mechanism to promptly detect concept drifts. In the main ensemble of ESOS-ELM, each OS-ELM network is trained with a balanced subset of the data stream. Using ELM theory, a computationally efficient storage scheme is proposed to leverage the prior knowledge of recurring concepts. A distinctive feature of ESOS-ELM is that it can learn from new samples sequentially in both the chunk-by-chunk and one-by-one modes. ESOS-ELM can also be effectively applied to imbalanced data without concept drift. On most of the datasets used in our experiments, ESOS-ELM performs better than the state-of-the-art methods for both stationary and non-stationary environments.

Introduction

The class imbalance problem has been studied extensively [1], [2], [3], [4], [5] in the past decade or so. Imbalanced datasets are common in real-world applications such as medical diagnosis, fraud detection, spam filtering, bioinformatics, text classification, etc. [1]. Recently, the class imbalance problem in sequential learning has also attracted the attention of researchers from various application domains [6], [7], [8], [9].

Sequential or incremental learners are computationally efficient compared to batch learners [10], [11], [12], [13], since batch learners require retraining with the complete dataset whenever new samples arrive. Sequential learners store the previously learnt information and update themselves only with the newly arrived data samples. However, since the statistical characteristics of training data streams may change over time, class concepts tend to drift in non-stationary environments. Concept drift learning raises the so-called stability-plasticity dilemma: a sequential learning framework should achieve a meaningful balance between retaining previously acquired information (stability) and learning new information (plasticity) [12]. The class imbalance problem further complicates sequential learning from drifting data streams.

Class imbalance and concept drift are two challenging problems which can occur in the same data stream. Recently, the class imbalance problem in drifting data streams has received attention for chunk-by-chunk learning [14], [15], [16], [17], [18], [19]. Learn++.NIE [16] is a state-of-the-art method in this area. The Learn++ family of methods is based on an ensemble learning framework in which a new classifier is trained with each arriving chunk of data and added to the ensemble. Learn++.NIE works better than most of the competing methods in recurring environments since it does not remove old classifiers from the ensemble. For imbalanced datasets, the prior knowledge of recurring concepts can be particularly useful since minority class samples are usually rare. In general, incremental learning methods should meet the single-pass requirement, i.e., once samples are learnt, they are discarded [12]. However, some methods store previously learnt minority class samples, thus violating the single-pass requirement [17], [18], [19]. Gao [17] proposed to collect the minority class samples from all previous chunks, while SERA [18] and REA [19] select the minority class samples from previous chunks which are similar to samples in the current chunk. Moreover, most of the existing methods assume that a full chunk of data is always available for training [14]. If the samples are arriving continuously or one-by-one, updating of the classification model is delayed until a full chunk is completed. Hence, the need for a class imbalance learning (CIL) method for non-stationary environments, which can learn in both chunk-by-chunk and one-by-one modes, is timely.

Extreme learning machine (ELM) [11], [20] is becoming popular in large dataset and online learning applications due to its fast learning speed. ELM provides a single-step least square estimation (LSE) method for training a single hidden layer feedforward network (SLFN), instead of using iterative gradient descent methods such as backpropagation algorithms. Very recently, a weighted online sequential extreme learning machine (WOS-ELM) was proposed for class imbalance learning [9]. WOS-ELM has been shown to effectively tackle the class imbalance problem in both chunk-by-chunk and one-by-one learning. However, WOS-ELM was proposed only for stationary environments and may not be appropriate for concept drift learning. Moreover, OS-ELM [11] related methods, with random hidden node parameters, may not always adapt well to new data [21]. Thus, ensemble methods [21], [22], [23] are generally preferred over single OS-ELM methods [6], [9], [11].
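The single-step LSE training described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the activation function, weight ranges and all variable names are assumptions.

```python
import numpy as np

def elm_train(X, T, n_hidden, seed=None):
    """Train an ELM: random hidden-node parameters, then a one-step
    least-squares solution for the output weights beta."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Input weights and biases are assigned randomly and never tuned
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = np.tanh(X @ W + b)          # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T    # Moore-Penrose least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

Because only `beta` is solved for, training cost is dominated by one pseudoinverse, which is what makes ELM attractive for large datasets.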

In this paper, a computationally efficient framework, referred to as ensemble of subset online sequential extreme learning machine (ESOS-ELM), is proposed for class imbalance learning from a concept-drifting data stream. In ESOS-ELM, a minority class sample is processed by ‘m’ classifiers (‘m’ is the imbalance ratio) while a majority class sample is processed by a single classifier. The majority class samples are processed in a round-robin fashion, i.e., the first majority class sample is processed by the first classifier, the second sample by the second classifier, and so on. In this way, classifiers in the ensemble are trained with balanced subsets of the original imbalanced dataset. Note that the proposed framework tackles class imbalance and concept drift problems in both the one-by-one and chunk-by-chunk modes.
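The routing rule above (every minority sample to all ‘m’ classifiers, each majority sample to one classifier in turn) can be sketched as below. This is an illustrative sketch only; the function and parameter names are not from the paper.

```python
def route_sample(y, minority_label, m, counter):
    """Decide which ensemble members learn the incoming sample.

    y              -- class label of the incoming sample
    minority_label -- label of the minority class
    m              -- (rounded) imbalance ratio = ensemble size
    counter        -- running count of majority samples seen so far

    Returns (classifier indices to update, updated counter).
    """
    if y == minority_label:
        # A minority sample is learnt by all m classifiers
        return list(range(m)), counter
    # A majority sample goes to exactly one classifier, round robin
    return [counter % m], counter + 1
```

Over a stream with imbalance ratio m, each classifier thus sees roughly one majority sample for every minority sample, i.e., a balanced subset.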

Ensemble learning methods are widely used in concept drift learning. Compared to single classifier methods, ensemble methods tend to cope better with the concept drift problem, particularly with gradual drifts [14]. Dynamic weighted majority (DWM) [24] is a state-of-the-art ensemble method for concept drift learning with balanced datasets. In DWM, voting weights are decreased in proportion to the error rate of the classifier. ESOS-ELM also uses dynamic weighted majority voting for concept drift learning. For tackling the class imbalance problem, ESOS-ELM processes incoming samples in such a way that each OS-ELM network is trained with an approximately equal number of minority and majority class samples. Unlike DWM, voting weights are updated in proportion to an appropriate performance measure for CIL. In recurring environments [12], [16], [25], DWM may not be able to leverage prior knowledge since old concepts are usually forgotten. To avoid this problem, we propose a novel information storage mechanism, using ELM theory, which is efficient both in terms of memory and computation. A change detection mechanism is also employed in the learning framework to promptly react to both sudden and gradual drifts.
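A dynamic weighted majority scheme of the kind described above can be sketched as follows. Note this sketch penalizes members that misclassify the latest sample, in the spirit of DWM [24]; the paper's actual update uses a CIL-appropriate performance measure rather than raw error, and all names and constants here are assumptions.

```python
def update_weights(weights, correct, penalty=0.5, floor=0.01):
    """Multiplicatively down-weight ensemble members that erred on
    the latest sample, then renormalize. 'correct' is a list of
    booleans, one per member."""
    w = [wi if ok else max(wi * penalty, floor)
         for wi, ok in zip(weights, correct)]
    total = sum(w)
    return [wi / total for wi in w]

def weighted_vote(preds, weights, labels=(0, 1)):
    """Weighted majority vote over the members' predicted labels."""
    score = {c: 0.0 for c in labels}
    for p, w in zip(preds, weights):
        score[p] += w
    return max(score, key=score.get)
```

Members whose accuracy degrades after a drift quickly lose voting influence, while newly accurate members dominate the vote.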

The new framework achieves better performance than Learn++.NIE(gm) on most of the datasets used in this paper, and does so with fewer hypotheses. The new method also performs better than DWM in recurring environments; this superiority is due to the ELM-Store module, which helps leverage the prior knowledge of old concepts. ESOS-ELM is also applied to benchmark imbalanced datasets without concept drift, where it outperformed WOS-ELM, OTER and SMOTE on all 15 imbalanced datasets used in [9].

This paper is organized as follows: Section 2 discusses the preliminaries. Section 3 presents the details of the ESOS-ELM method. This is followed by experiments for validating the performance of the proposed framework in Section 4. Finally, Section 5 concludes the paper.


ELM and OS-ELM

Extreme learning machine (ELM) [20] is a single-step least-squares solution originally proposed for single hidden layer feedforward networks and later extended to non-neuron-like networks. The input weights and biases connecting the input layer to the hidden layer (hidden node parameters) are assigned randomly, and the weights connecting the hidden layer to the output layer (output weights) are determined analytically. Compared to the traditional iterative gradient descent methods
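OS-ELM [11] extends this batch solution to sequential data via a recursive least-squares update of the output weights. A minimal sketch of one such update step is below, assuming the standard RLS form of OS-ELM; the variable names are illustrative.

```python
import numpy as np

def oselm_update(P, beta, H, T):
    """One OS-ELM recursive least-squares step.

    P    -- running inverse of the accumulated H^T H (L x L)
    beta -- current output weights (L x n_outputs)
    H    -- hidden-layer outputs for the new chunk (or a single row)
    T    -- targets for the new chunk

    Returns the updated (P, beta); works for both one-by-one and
    chunk-by-chunk arrival, since H may have any number of rows.
    """
    K = np.eye(H.shape[0]) + H @ P @ H.T
    P = P - P @ H.T @ np.linalg.solve(K, H @ P)
    beta = beta + P @ H.T @ (T - H @ beta)
    return P, beta
```

Because only the small L x L matrix P is carried between updates, old samples can be discarded after they are learnt, satisfying the single-pass requirement.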

Method

In this section, an ensemble of subset online sequential extreme learning machine (ESOS-ELM) is proposed for class imbalance learning from a drifting data stream. As shown in Fig. 1, the proposed ESOS-ELM method consists of three blocks: the main ensemble block, the ELM-Store block and the change detector block. These blocks are discussed in detail as follows.

Experiments

The performance of ESOS-ELM is first evaluated for class imbalance learning with concept drift in Section 4.1. It is then evaluated for class imbalance learning without concept drift in Section 4.2.

Conclusions

In this paper, we have proposed an ensemble of subset online sequential extreme learning machine (ESOS-ELM) for class imbalance learning from drifting data streams. ESOS-ELM consists of a main ensemble for classification in the current imbalanced environment, an ELM-Store module for storing information of old concepts and a change detector for promptly detecting concept drifts. In ESOS-ELM, the main ensemble is trained with balanced subsets of the data stream. Similar to WOS-ELM, the new method

Acknowledgments

The authors would like to thank the anonymous reviewers whose insightful and helpful comments greatly improved this paper.

Bilal Mirza received the M.Sc. (signal processing) degree from Nanyang Technological University (NTU) in 2010. He is currently working towards the Ph.D. degree from NTU. His research interests include machine learning and its application in bio-signal processing, class imbalance and online sequential learning.

References (36)

  • H. Nguyen, E. Cooper, K. Kamei, Online learning from imbalanced data streams, in: Proceedings of the International...
  • B. Mirza et al., Weighted online sequential extreme learning machine for class imbalance learning, Neural Process. Lett. (2013)
  • S. Ozawa et al., Incremental learning of chunk data for online pattern classification, IEEE Trans. Neural Netw. (2008)
  • N.Y. Liang et al., A fast and accurate online sequential learning algorithm for feedforward networks, IEEE Trans. Neural Netw. (2006)
  • R. Elwell et al., Incremental learning of concept drift in nonstationary environments, IEEE Trans. Neural Netw. (2011)
  • A. Shilton et al., Incremental training of support vector machines, IEEE Trans. Neural Netw. (2005)
  • T.R. Hoens et al., Learning from streaming data with concept drift and imbalance: an overview, Prog. Artif. Intell. (2012)
  • G. Ditzler, R. Polikar, N.V. Chawla, An incremental learning algorithm for non-stationary environments and class...


    Zhiping Lin received the B.Eng. degree in control engineering from South China Institute of Technology, Canton, China in 1982 and the Ph.D. degree in information engineering from the University of Cambridge, England in 1987. He was with the University of Calgary, Canada for 1987–1988, with Shantou University, China for 1988–1993, and with DSO National Laboratories, Singapore for 1993–1999. Since February, 1999, he has been an Associate Professor at Nanyang Technological University (NTU), Singapore. He is also the Program Director of Bio-Signal Processing, Valens Centre of Excellence, NTU. Dr. Lin is currently serving as the Editor-in-Chief of Multidimensional Systems and Signal Processing after serving as an editorial board member for 1993–2004 and a CO-Editor for 2005–2010. He was an Associate Editor of Circuits, Systems and Signal Processing for 2000–2007 and an Associate Editor of IEEE Transactions on Circuits and Systems, Part II, for 2010–2011. He also serves as a reviewer for Mathematical Reviews. He is General Chair of the 9th International Conference on Information, Communications and Signal Processing (ICICS), 2013. His research interests include multidimensional systems and signal processing, statistical and biomedical signal processing, and more recently machine learning. He is the co-author of the 2007 Young Author Best Paper Award from the IEEE Signal Processing Society, Distinguished Lecturer of the IEEE Circuits and Systems Society for 2007–2008, and the Chair of the IEEE Circuits and Systems Singapore Chapter for 2007–2008.

    Nan Liu received the B.Eng. degree in electrical engineering from University of Science and Technology Beijing, China, and the Ph.D. degree in electrical engineering from Nanyang Technological University, Singapore. He is currently a Senior Research Scientist at the Department of Emergency Medicine, Singapore General Hospital. His research interests include pattern recognition, machine learning, and biomedical signal processing.
