Neurocomputing, Volume 225, 15 February 2017, Pages 205-213

Monotonic classification extreme learning machine

https://doi.org/10.1016/j.neucom.2016.11.021

Abstract

In monotonic classification problems, both feature values and class labels are ordered, and monotonicity relationships exist between some features and the decision label. The Extreme Learning Machine (ELM) is a single-hidden-layer feedforward neural network with a fast training rate and good generalization capability, but because of its training error, ELM cannot be directly used to handle monotonic classification problems. This work proposes a generalization of ELM for monotonic classification, named the Monotonic Classification Extreme Learning Machine (MCELM), in which monotonicity constraints are imposed on the original ELM model. Mathematically, MCELM is a quadratic programming problem in which the monotonicity relationships are treated as constraints and the training error is the objective to be minimized. The mathematical model of MCELM not only makes the generated classifier monotonic but also minimizes the classification error. MCELM does not need to tune parameters iteratively and therefore keeps the advantage of extremely fast training that is the essential characteristic of ELM. MCELM does not require the monotonic relationships between features and the output to be consistent, which essentially relaxes the assumption of consistent monotonicity used in most existing approaches to monotonic classification. In comparison with existing approaches to monotonic classification, MCELM can indeed generate a monotonicity-preserving classifier, which experimentally shows much better generalization capability on both artificial and real-world datasets.

Introduction

As a fundamental task of supervised learning, classification aims to learn a classifier from a set of labeled samples and then to predict the class label of an unseen sample with the trained classifier. The literature offers many algorithms for solving classification problems, such as decision tree induction [1], Bayesian classifiers [2], kernel methods [3], support vector machines [4], artificial neural networks [5], etc. They can be used to handle different kinds of data, such as uncertain data [6], [7] and ordinal data.

Ordinal classification is a generalization of traditional classification. In traditional classification problems the class labels are nominal, while in ordinal classification the class labels are ordered; in other words, there is an order relationship among the class labels. For example, the damage after a typhoon can be classified into three levels: slight, moderate, and serious. Clearly there is an order relationship among these class labels. In comparison with traditional classification, ordinal classification fully exploits the order information among the labels.

Monotonic classification is essentially ordinal classification with monotonicity constraints, in which both feature values and class labels are ordered and monotonic relationships exist between them. The monotonicity can be either increasing or decreasing: if the decision value increases (decreases) as a particular feature value increases (decreases), the monotonicity between the decision attribute and that feature is increasing; otherwise it is decreasing. For example, the grade of a scholarship is monotonically increasing with respect to a student's academic performance, and the satisfaction degree of a car is monotonically decreasing with respect to its maintenance cost. Monotonic classification problems arise in many real application fields such as disease diagnosis, bankruptcy risk assessment, and employee selection.
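To make the constraint concrete, monotonic consistency can be phrased as a pairwise condition: whenever one sample is at least as good as another on every feature, its class label must not be lower. Below is a minimal Python sketch of such a check; the function name and the `directions` encoding (+1 for an increasing feature, -1 for a decreasing one) are illustrative choices, not notation from the paper.

```python
import numpy as np

def count_monotonicity_violations(X, y, directions):
    """Count ordered pairs (i, j) that violate monotonic consistency.

    X          : (n, d) array of feature values.
    y          : (n,) array of ordered class labels (integers).
    directions : length-d sequence with +1 for an increasing feature
                 and -1 for a decreasing one.
    """
    Xs = X * np.asarray(directions)  # orient every feature as increasing
    violations = 0
    n = len(y)
    for i in range(n):
        for j in range(n):
            # If x_j dominates x_i on all features, its label must be >= y_i.
            if np.all(Xs[i] <= Xs[j]) and y[i] > y[j]:
                violations += 1
    return violations
```

A dataset with zero violations is monotonically consistent. Note that most existing methods further require every feature to share the same direction, which is the consistency assumption that MCELM relaxes.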

In recent decades, monotonic classification has attracted considerable attention from researchers. The following is an incomplete survey of work on monotonic classification.

In 1989, Ben-David et al. introduced OLM [8], the first algorithm for ordinal classification with monotonicity constraints in the machine learning community. This method consists of two components: it first chooses a subset of training objects and then classifies samples using a function defined on that subset. Unfortunately, this approach may produce non-monotonic results. Greco et al. proposed a dominance-based rough set approach [9] for multi-criteria sorting, which extends the classical rough set approach [10]. The approach handles inconsistencies arising from violations of the monotonicity constraints by substituting the dominance relation for the indiscernibility relation. It is the first approach to apply a comprehensive theory to monotonic classification tasks in the domain of knowledge discovery. Further, in [11], [12], the dominance rough set model was used to extract rules for monotonic classification. Similar work can be found in many other studies, such as [13], [14], [15], [16], [17], [18]. Among the existing approaches to learning with monotonicity constraints, the nonparametric approach is the most general one. In [19], the authors introduced a probabilistic model for ordinal classification with monotonicity constraints based on the concept of stochastic dominance and investigated possible loss functions. Its main contribution is an analysis of the nonparametric approach from a statistical point of view; the analysis suggested that convex losses are suitable for monotonic classification and provided a necessary foundation for nonparametric methods.

Decision tree algorithms such as CART [20] and C4.5 [21] are widely used in the machine learning community to solve classification problems, and many variants of these algorithms have been developed, the segment-based decision tree [22] being one instance. To solve ordinal classification with monotonicity constraints, many decision tree studies on monotonic classification have been carried out. The following is a brief summary.

Ben-David [23] introduced a tree induction algorithm called Monotone Induction of Decision trees (MID). His main contribution was to modify the conditional entropy by adding a term called the order-ambiguity score, which made the splitting strategy monotonic. In 1999, Makino et al. provided two algorithms, named Positive Decision Trees (PDT) and Quasi-Positive Decision Trees (Q-PDT) [24], based on earlier decision tree algorithms; however, both can only be applied to binary-class monotonic classification problems. In [25], [26], [27], [28], [29], these two algorithms were extended to handle K-class monotonic classification problems. In 2000, R. Potharst and J.C. Bioch developed an order-preserving tree-generation algorithm and provided a technique for repairing non-monotonic decision trees for multi-attribute classification problems with several linearly ordered classes [30]. In 2003, A.J. Feelders and Martijn Pardoel proposed a collection of pruning techniques to make a non-monotonic tree monotonic [31]. In the same year, a tree construction algorithm was proposed by Cao-Van and De Baets to avoid violating the monotonicity of the data [32], [33]. In 2008, Xia et al. extended the Gini impurity used in CART to ordinal classification, calling it ranking impurity [34]. In 2009, van de Kamp, Feelders and Barile proposed a new algorithm for learning isotonic classification trees [35].

The algorithms mentioned above have enhanced the capability of extracting ordinal information. However, they share a common disadvantage: they cannot guarantee that a monotonic tree is generated even when the training data themselves are monotone (increasing or decreasing).

To generate a monotonic classifier for monotonic classification problems, Qinghua Hu et al. introduced in 2011 a new measure of feature quality called rank mutual information [36]. It not only inherits the robustness of Shannon's entropy but also extracts ordinal structure from monotonic datasets. Based on rank mutual information, the authors designed a decision tree algorithm named REMT that produces a consistently monotonic decision tree. However, REMT does not take the classification error into account, so it cannot guarantee the classification accuracy of the generated classifier. Moreover, the classifier is obtained under the restriction that all monotonic relationships between features and the decision attribute are consistent; otherwise the data must be preprocessed to meet this restriction, which may cause information loss. Many other existing methods, such as those proposed in [19], [36], also require all monotonic relationships to be consistent.

Conventional artificial neural networks have great approximation capability and almost always separate the training data well, but the behavior of the training process depends heavily on the training set: for a given training sample, the classification boundary generated by a neural network is relatively unpredictable, especially with small sample sizes. There are many extensions of artificial neural networks, such as the polygonal fuzzy neural network proposed to handle polygonal fuzzy data [37], weighted networks [38], [39], and discrete-time stochastic neural networks [40]. In addition, researchers have proposed several methods based on artificial neural networks to handle monotonic classification problems. To solve two-group classification problems with monotonicity constraints, Norman P. Archer [41] developed a modification of the existing back-propagation neural network algorithm that preprocesses the training samples with a linear classification function. In [42], C. Li et al. studied the monotonic type-2 fuzzy neural network (T2FNN); under the monotonicity constraints, a hybrid algorithm was provided to optimize the parameters of the monotonic T2FNN. In [43], C. Li et al. studied the incorporation of monotonicity into interval type-2 fuzzy logic systems, showing that such systems are monotonic if the antecedent and consequent parts of their fuzzy rules are arranged according to the proposed monotonicity conditions. These approaches tend to be much more efficient than standard back-propagation learning algorithms. However, all of their parameters still need to be tuned iteratively by gradient descent, which is time consuming and easily gets stuck in local minima.

In the past two decades, the support vector machine (SVM) [44] and its variants [45], [46], [47], [48] have been used extensively in classification applications due to their remarkable classification capability, but their optimization constraints are strict and their generalization ability is not always satisfactory. The ELM method proposed in [49] overcomes these shortcomings of SVM. ELM [49] is a single-hidden-layer feedforward neural network with a fast training rate and good generalization capability, but due to the existence of training error, it cannot be directly used to handle monotonic classification problems. Motivated by extending ELM to monotonic classification and by overcoming the above-mentioned shortcomings of existing approaches, this paper proposes a generalization of ELM named the Monotonic Classification Extreme Learning Machine (MCELM), in which monotonicity constraints are imposed on the original ELM model. The mathematical model of MCELM is a quadratic programming problem in which both the classification error and monotonicity are taken into account. The model performs well with very little classification error and, more importantly, ensures that the generated classifier is monotonic. Moreover, unlike other existing approaches, MCELM does not require all monotonic relationships between features and the decision attribute to be consistent, so the related data preprocessing is unnecessary and the associated information loss is avoided. As in ELM, the gradient descent method is not adopted and the parameters do not need to be tuned iteratively, which keeps ELM's essential advantage of extremely fast training.
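For orientation, the following is a minimal sketch of standard ELM training as proposed in [49]: input weights and hidden biases are drawn at random and never tuned, and the output weights are computed in closed form from the Moore-Penrose pseudoinverse of the hidden-layer output matrix. The sigmoid activation and the single real-valued output neuron are illustrative choices.

```python
import numpy as np

def train_elm(X, t, n_hidden, seed=0):
    """Standard ELM: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((d, n_hidden))  # random input weights, never tuned
    b = rng.standard_normal(n_hidden)       # random hidden biases, never tuned
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # hidden-layer output matrix (sigmoid)
    beta = np.linalg.pinv(H) @ t            # Moore-Penrose least-squares solution
    return W, b, beta

def predict_elm(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta  # real-valued score; round/threshold it for an ordinal label
```

MCELM keeps this random hidden layer unchanged; only the computation of the output weights differs, as developed in Section 3.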

The rest of the paper is organized as follows. Section 2 introduces the basic knowledge of monotonic classification and ELM. Section 3 develops our MCELM model and describes the framework of MCELM. Experimental results are presented in Section 4. Finally, Section 5 provides the concluding remarks.


Monotonic classification

Suppose that $U=\{x_1,\ldots,x_n\}$ is the set of objects, $A$ is the set of features describing the objects, and $D$ is the decision attribute, which represents the class label in classification problems. The value of sample $x_i$ on attribute $a \in A$ or on $D$ is denoted by $v(x_i,a)$ or $v(x_i,D)$, respectively. The ordinal relationship between samples with respect to attribute $a$ or $D$ is denoted by $\preceq_a$ or $\preceq_D$. We say $x_j$ is not worse than $x_i$ in terms of $a$ or $D$ if $v(x_i,a) \le v(x_j,a)$ or $v(x_i,D) \le v(x_j,D)$, denoted by $x_i \preceq_a x_j$ or $x_i \preceq_D x_j$.
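In code, the dominance relation between samples (the relation $\preceq$ holding simultaneously on every feature) can be enumerated explicitly; these comparable pairs are exactly what the monotonicity constraints range over. A sketch under the same illustrative `directions` encoding used earlier:

```python
import numpy as np

def dominance_pairs(X, directions):
    """Enumerate pairs (i, j) such that x_i is dominated by x_j, i.e.
    v(x_i, a) <= v(x_j, a) for every feature a (after orienting each
    feature by its monotone direction)."""
    Xs = X * np.asarray(directions)
    pairs = []
    n = len(Xs)
    for i in range(n):
        for j in range(n):
            if i != j and np.all(Xs[i] <= Xs[j]):
                pairs.append((i, j))  # a monotonic classifier needs f(x_i) <= f(x_j)
    return pairs
```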

The framework of MCELM

Similar to ELM, the framework of MCELM has three layers. The first layer is the input layer; suppose it contains $n$ neurons, each of which takes the real value of one feature, so the number of input neurons equals the number of features describing the samples. The second layer is the hidden layer, which contains $\tilde{N}$ neurons. The last layer is the output layer, which in MCELM contains a single neuron representing the classification result.
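Combining the pieces, the output weights of MCELM solve a constrained least-squares problem: minimize the training error $\|H\beta - T\|^2$ subject to the network output respecting every dominance pair. The sketch below states this quadratic program with the generic solver cvxpy purely for illustration; the solution procedure in the paper itself may take a different form.

```python
import cvxpy as cp

def train_mcelm_weights(H, t, pairs):
    """Sketch of the MCELM output-weight problem: minimize the training
    error subject to one monotonicity constraint per dominance pair.

    H     : (n, n_hidden) hidden-layer output matrix (random and fixed, as in ELM).
    t     : (n,) ordered class labels encoded as real numbers.
    pairs : list of (i, j) with x_i dominated by x_j.
    """
    beta = cp.Variable(H.shape[1])
    objective = cp.Minimize(cp.sum_squares(H @ beta - t))        # training error
    constraints = [(H[j] - H[i]) @ beta >= 0 for i, j in pairs]  # monotonic outputs
    cp.Problem(objective, constraints).solve()
    return beta.value
```

Since $H$ is fixed once the random hidden layer is drawn, the problem is convex in $\beta$ and requires no iterative tuning of network parameters, which preserves the fast training of ELM.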

Performance evaluation

In this section, we experimentally compare MCELM with CART [20], Rank Tree [34], OLM [8], OSDL [58], REMT [36], and ELM [49] on both artificial and real-world datasets. OLM is the ordinal learning model introduced by Ben-David et al., while OSDL is an ordinal stochastic dominance learner based on associated cumulative distributions [58].

In traditional classification problems, we usually use the misclassification rate (the number of misclassified samples divided by the total number of samples) as the evaluation criterion.
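Because the class labels are ordered, the magnitude of an error also matters, and the mean absolute error between predicted and true labels is a common companion criterion in ordinal classification; its use here is an assumption rather than a quotation from the paper. Minimal implementations of both criteria:

```python
import numpy as np

def misclassification_rate(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true one."""
    return np.mean(np.asarray(y_true) != np.asarray(y_pred))

def mean_absolute_error(y_true, y_pred):
    """Average absolute label distance; larger ordinal mistakes cost more."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
```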

Concluding remarks

Monotonic classification is essentially ordinal classification with monotonicity constraints. The monotonicity between a feature and the decision attribute may be increasing or decreasing. Several methods have been proposed to solve monotonic classification problems, such as Rank Tree, OLM, and REMT. Although these methods enhance the capability of extracting ordinal information, they suffer from several shortcomings: most of them cannot guarantee that the generated classifiers are monotonic ones with small classification errors.

Acknowledgement

This work is supported by the Macao Science and Technology Development Fund (100/2013/A3 and 081/2015/A3), the China Postdoctoral Science Foundation (2015M572361 and 2016T90799), the Basic Research Project of the Knowledge Innovation Program in Shenzhen (JCYJ20150324140036825), and the National Natural Science Foundation of China (61503252, 61473194, and 71371063).


References (59)

  • Xizhao Wang, Learning from big data with uncertainty - Editorial, Journal of Intelligent & Fuzzy Systems, 2015, 28(5):...
  • Xizhao Wang, R.A.R. Ashfaq, Aimin Fu, Fuzziness based sample categorization for classifier performance improvement,...
  • A. Ben-David et al., Learning and classification of monotonic ordinal concepts, Comput. Intell. (1989)
  • S. Greco, B. Matarazzo, R. Slowinski, Rule-based decision support in multicriteria choice and ranking, symbolic and...
  • Z. Pawlak, Rough sets, Int. J. Comput. Inf. Sci. (1982)
  • J.C. Bioch, V. Popova, Rough sets and ordinal classification, algorithmic learning theory, in: Proceedings of the 11th...
  • V. Popova, Knowledge discovery and monotonicity, Erasmus Research Institute of Management (ERIM), Rotterdam, The Netherlands,...
  • S. Greco et al., Rough approximation by dominance relations, Int. J. Intell. Syst. (2002)
  • J.W.T. Lee et al., Rough sets and ordinal reducts, Soft Comput. (2005)
  • W. Kotlowski et al., On nonparametric ordinal classification with monotonicity constraints, IEEE Trans. Knowl. Data Eng. (2013)
  • L. Breiman et al., Classification and Regression Trees (1984)
  • J.R. Quinlan, C4.5: Programs for Machine Learning (1993)
  • Ran Wang, Sam Kwong, Xizhao Wang, Qingshan Jiang, Segment based decision tree induction with continuous...
  • A. Ben-David, Monotonicity maintenance in information-theoretic machine learning algorithms, Mach. Learn. (1995)
  • K. Makino et al., Data analysis by positive decision trees, IEICE Trans. Inf. Syst. (1999)
  • R. Potharst, J.C. Bioch, T.C. Petter, Monotone decision trees, Technical report, Erasmus University Rotterdam,...
  • R. Potharst, Classification using decision trees and neural networks, Ph.D. dissertation, Erasmus University...
  • R. Potharst et al., Decision trees for ordinal classification, Intell. Data Anal. (2000)
  • R. Potharst et al., Classification trees for problems with monotonicity constraints, SIGKDD Explor. (2002)

    Hong Zhu received the B.Sc. and M.Sc. degrees from Hebei University, Baoding, China, in 2012 and 2015, respectively. She is currently a Ph.D. candidate of the Faculty of Information Technology, Macao University of Science and Technology. Her main research interests include decision tree and neural networks.

    Eric C.C. Tsang received his B.Sc. degree in Computer Studies from the City University of Hong Kong in 1990 and Ph.D. degree in computing at the Hong Kong Polytechnic University in 1996. He is an associate professor of the Faculty of Information Technology, Macao University of Science and Technology. His main research interests include fuzzy systems, rough sets, fuzzy rough sets and multiple classifier systems.

Xi-Zhao Wang (M'03, SM'04, F'12) received the Ph.D. degree in computer science from the Harbin Institute of Technology, Harbin, China, in 1998. He is currently a Professor with the College of Computer Science and Software Engineering, Shenzhen University, Guangdong, China. From September 1998 to September 2001, he was a Research Fellow with the Department of Computing, Hong Kong Polytechnic University, Hong Kong. His current research interests include learning from examples with fuzzy representation, fuzzy measures and integrals, neuro-fuzzy systems and genetic algorithms, feature extraction, multi-classifier fusion, and applications of machine learning. Dr. Wang has been the PI/Co-PI of 16 research projects supported partially by the National Natural Science Foundation of China and the Research Grants Council of the Hong Kong Government. He was a BoG Member of the IEEE Systems, Man, and Cybernetics Society (IEEE SMCS) in 2005, 2007-2009, and 2012-2014; the Chair of the IEEE SMC Technical Committee on Computational Intelligence; an Associate Editor of IEEE Transactions on Cybernetics; an Associate Editor of Pattern Recognition and Artificial Intelligence; a Member of the Editorial Board of Information Sciences; and an Executive Member of the Chinese Association of Artificial Intelligence. He was the recipient of the IEEE SMCS Outstanding Contribution Award in 2004 and of the IEEE SMCS Best Associate Editor Award in 2006. He is the General Co-Chair of the 2002-2014 International Conferences on Machine Learning and Cybernetics, co-sponsored by IEEE SMCS. He is a Distinguished Lecturer of the IEEE SMCS.

    Rana Aamir Raza Ashfaq received his Master degree in Computer Science from Blekinge Tekniska Hogskola (BTH), Sweden. He also received his Bachelor and Master Degrees in Computer Science from Bahauddin Zakariya University, Multan, Pakistan. Since 2010 he is working as Assistant Professor in Department of Computer Science, Bahauddin Zakariya University, Multan, Pakistan. He is currently a Ph.D. candidate in College of Computer Science & Software Engineering, Shenzhen University, Shenzhen, Guangdong, China. His main research interests include machine learning and data mining.
