
Neurocomputing

Volume 251, 16 August 2017, Pages 106-114

A cost-sensitive semi-supervised learning model based on uncertainty

https://doi.org/10.1016/j.neucom.2017.04.010

Abstract

Aiming at reducing the total cost in cost-sensitive learning, this paper introduces a semi-supervised learning model based on the uncertainty of sample outputs. Its central idea is (1) to categorize the samples which are not in the training set into several groups according to the magnitude of the uncertainty of their outputs, (2) to add the group of samples with the least uncertainty, together with their predicted labels, to the original training set, and (3) to retrain a new classifier for total cost reduction. The ratio of costs between classes and its impact on the improvement of the learning system is discussed. Theoretical analysis and experimental demonstration show that the model can effectively improve the performance of a cost-sensitive learning algorithm for a certain type of classifiers.

Introduction

Classification, a central topic of machine learning, has been widely applied in many domains such as pattern recognition, knowledge discovery, intelligent control, network security, gene engineering, bioinformatics, and so on. The literature offers many approaches to designing classifiers, such as decision tree induction [1], [2], hypothesis version spaces [3], Bayesian networks [4], [5], evolutionary computing [6], logistic regression [7], [8], support vector machines [9], [10], neural networks [11], and deep learning techniques [12], [13]. The most important index for evaluating a designed classifier is its generalization ability, i.e., the rate of correctly classifying samples which are not in the training set.

The concept of cost-sensitive classification can be introduced into the process of classifier design [14], [15], [16], [17], [18]. "Cost-sensitive" means that one class (feature or object) carries a different cost from another class (feature or object, respectively) in the classification process. For example, the cost of wrongly classifying a cancer patient as non-cancer is much larger than the cost of wrongly classifying a non-cancer patient as cancer. From the viewpoint of loss functions, cost-sensitive classifier learning minimizes a cost-weighted loss function, whereas cost-insensitive learning maximizes the correct classification rate. Under certain conditions, these two optimization objectives can be equivalent.
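As a concrete illustration (the notation below is ours, not taken from the paper), let $C(i,j)$ denote the cost of predicting class $c_i$ for a sample whose true class is $c_j$. A cost-sensitive classifier assigns a sample $x$ to the class minimizing the expected cost

$$R(c_i \mid x) = \sum_{j=1}^{K} C(i,j)\, P(c_j \mid x).$$

When $C(i,j)$ is the same constant for all $i \neq j$ and zero on the diagonal, minimizing this risk reduces to choosing the most probable class, i.e., the cost-insensitive objective; this is the sense in which the two objectives can coincide.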

The following is an incomplete survey of cost-sensitive learning for classification problems. Reference [14] proposed a principled method for making a general classifier cost-sensitive by wrapping a cost-minimizing procedure called MetaCost around it. Reference [15] confirmed that class imbalance often affects the performance of cost-sensitive classifiers. While most studies of cost-sensitive learning have focused only on how to handle misclassification costs, article [16] addressed the equally important issue of the test costs associated with querying missing values in a test case. A theoretical analysis of F-measures, which are non-linear performance measures, was presented in [17] for binary, multiclass, and multi-label classification. Paper [18] proposed a cost-sensitive rotation forest algorithm for gene expression data classification, in which three classification costs, namely misclassification cost, test cost, and rejection cost, are embedded into the rotation forest algorithm.

Under the framework of semi-supervised learning, this paper introduces a cost-sensitive learning model based on the uncertainty magnitude of sample outputs. Suppose we have selected a classifier training algorithm and trained a basic classifier. The model then consists of three steps. The first step categorizes the testing samples into several groups based on the uncertainty of the outputs given by the basic classifier; the second step selects the group of samples with the smallest uncertainty and adds these samples, together with the labels predicted by the basic classifier, to the original training set; and the third step retrains a classifier on the enlarged training set with the selected training algorithm. We expect the new classifier to have a reduced total cost in comparison with the basic classifier when a class-cost matrix for misclassification among the classes is given.
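A minimal sketch of these three steps is given below. The callables `fit`, `predict_proba`, and `uncertainty`, and the equal-size split into groups, are our assumptions standing in for the selected training algorithm, its output mapping, and the chosen uncertainty measure; this illustrates the workflow of the model rather than the paper's exact implementation.

```python
import numpy as np

def semi_supervised_retrain(train_X, train_y, pool_X,
                            fit, predict_proba, uncertainty, n_groups=3):
    """Sketch of the three-step model:
    (1) score pool samples by the uncertainty of the basic classifier's outputs,
    (2) pseudo-label the lowest-uncertainty group,
    (3) retrain on the enlarged training set with the same algorithm."""
    base = fit(train_X, train_y)                        # basic classifier
    outputs = np.asarray(predict_proba(base, pool_X))   # one output vector per pool sample
    scores = np.array([uncertainty(o) for o in outputs])
    order = np.argsort(scores)                          # ascending uncertainty
    low = order[: len(order) // n_groups]               # least-uncertain group
    pseudo_y = outputs[low].argmax(axis=1)              # labels predicted by the basic classifier
    new_X = np.vstack([train_X, pool_X[low]])
    new_y = np.concatenate([train_y, pseudo_y])
    return fit(new_X, new_y)                            # retrained classifier
```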

In this paper the algorithm for training the basic classifier is the Extreme Learning Machine (ELM), a recently popular scheme for training single-hidden-layer feedforward neural networks. The reason is its non-iterative training mechanism, which has been studied intensively and extensively over the past decade. ELM was first proposed in [19]; it randomly chooses the weights of the hidden nodes and analytically determines the output weights of a single-hidden-layer feedforward network (SLFN). ELMs have both better generalization performance and much faster learning speed than traditional algorithms. ELMs were originally developed for SLFNs [20] and then extended to generalized SLFNs whose hidden nodes need not be neuron-like [21]. To handle data with an imbalanced class distribution, paper [22] proposed a weighted ELM, which can be generalized to cost-sensitive learning by assigning different weights to individual examples. The dissimilar ELM (D-ELM) was developed by introducing misclassification costs into the classifier, and the cost-sensitive D-ELM (CS-D-ELM) was proposed to increase classification stability [23].
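For reference, a minimal ELM training routine in the spirit of [19], with optional per-sample weights in the spirit of the weighted ELM of [22], might look as follows. The function names, the sigmoid activation, and the weighting scheme are our assumptions, and `T` is assumed to be a one-hot target matrix.

```python
import numpy as np

def train_elm(X, T, n_hidden=100, sample_weights=None, seed=0):
    """Minimal ELM: random input weights and biases, sigmoid hidden layer,
    output weights solved analytically (weighted least squares if weights given)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # random hidden weights, never trained
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))            # hidden-layer output matrix
    if sample_weights is not None:                    # per-sample weighting, cf. weighted ELM [22]
        sw = np.sqrt(sample_weights)[:, None]
        beta = np.linalg.pinv(sw * H) @ (sw * T)
    else:
        beta = np.linalg.pinv(H) @ T                  # Moore-Penrose least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Raw output vector (one score per class) for each row of X."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```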

The literature contains many specific forms of uncertainty representation for a vector whose components are numbers between 0 and 1. The uncertainty in this work is chosen as fuzziness, which is essentially a variant of entropy, the typical representation of uncertainty for a probability distribution. It is worth pointing out that the developed learning model is largely insensitive to the choice of both the classifier training algorithm and the specific representation of uncertainty.
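One common form of such a fuzziness measure (the De Luca-Termini form; the paper's exact definition, given in [32], may differ in detail) for a membership vector $\mu = (\mu_1, \dots, \mu_n) \in [0,1]^n$ is

$$F(\mu) = -\frac{1}{n}\sum_{i=1}^{n}\bigl[\mu_i \log \mu_i + (1-\mu_i)\log(1-\mu_i)\bigr],$$

which vanishes when every $\mu_i \in \{0,1\}$ and is maximal when every $\mu_i = 0.5$, mirroring the behavior of entropy for a probability distribution.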

The rest of this paper is organized as follows. Section 2 gives a brief review of the uncertainty of a vector. Section 3 introduces the ELM training algorithm and then incorporates class costs into ELM training. Section 4 reports some observations on the relation between the amount of uncertainty and the classification cost, and then proposes our new training model for class cost-sensitive learning. Section 5 presents the experimental verification and provides some analysis of our model. Section 6 concludes this paper.


Uncertainty

Uncertainty usually refers to the situation in which a concept cannot be described clearly and exactly. There is no general mathematical definition of uncertainty, but specific definitions can be given under different settings. The following is a brief summary of several mathematically well-modeled forms of uncertainty (Table 1).

We now focus on a typical uncertainty called fuzziness for a fuzzy set [32]. Fuzziness is used to describe the unclear degree between two terms such as hot and

Extreme learning machine with cost-sensitive class

Neural networks are widely used in many fields because of their strong ability to approximate complex nonlinear mappings. The approximation ability of randomly weighted neural networks is confirmed in [20], [21], [29]. Extreme learning machine (ELM) [19] is an algorithm for training single-hidden-layer feedforward neural networks (SLFNs), which is reported in many references to have faster learning speed and better generalization performance than the traditional algorithms of training

Sample categorization based on uncertainty

This section reports an experimental observation regarding the training/testing cost among sample categories with different uncertainty. We use Algorithm 2 to train an ELM with a given class cost matrix and a training set for the classification problem. The trained ELM produces a vector output for any example (training or testing), and the vector can further be normalized as a fuzzy set. Then all examples (training or testing) are grouped into several categories based on the sorting of sample
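A sketch of this grouping step is given below; the per-sample min-max normalization and the equal-size split into categories are our assumptions, since the excerpt does not specify them.

```python
import numpy as np

def fuzziness(mu):
    """Fuzziness of a normalized output vector mu with components in [0, 1]."""
    eps = 1e-12
    mu = np.clip(mu, eps, 1 - eps)
    return -np.mean(mu * np.log(mu) + (1 - mu) * np.log(1 - mu))

def group_by_fuzziness(outputs, n_groups=3):
    """Normalize raw ELM outputs to [0, 1] per sample, then split sample indices
    into n_groups categories of increasing fuzziness."""
    o_min = outputs.min(axis=1, keepdims=True)
    o_max = outputs.max(axis=1, keepdims=True)
    mu = (outputs - o_min) / (o_max - o_min + 1e-12)    # one membership vector per sample
    fuzz = np.array([fuzziness(m) for m in mu])
    order = np.argsort(fuzz)                            # ascending fuzziness
    return np.array_split(order, n_groups)              # low -> high fuzziness categories
```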

Our approach and experimental demonstration

Section 4 indicates experimentally an interesting phenomenon: for a given class cost matrix, the testing cost of the low-fuzziness category of samples is generally less than that of the high-fuzziness category. Studying this phenomenon carefully, we consider three approaches to reducing the total cost of testing.

  • (1) Divide-and-conquer strategy: we can regularly use the trained classifier to predict the classes of samples in the low-fuzziness category and use an enhanced technique such

Concluding remarks

This paper empirically confirms that, for classification problems with cost-sensitive classes, samples with low output fuzziness are usually of high quality. The total cost of prediction is reduced if these high-quality samples, together with their predicted labels, are added to the original training set and a retraining on the enlarged training set is conducted. We highlight that this is an empirical observation. We do not yet have a mathematical equation to model this observation and then to

Acknowledgments

This study was supported by the Basic Research Project of Knowledge Innovation Program in Shenzhen (JCYJ20150324140036825) and the National Natural Science Foundation of China (71371063).

Hongyu Zhu is a postgraduate at the College of Computer Science and Software Engineering, Shenzhen University, under the supervision of Professor Xi-zhao Wang. She received her bachelor's degree from Jiangxi Normal University in 2011. Her research interests include machine learning, cost-sensitive models, fuzzy sets, and three-way decision.

References (32)

  • J. Mingers, An empirical comparison of pruning methods for decision tree induction, Mach. Learn. (1989)
  • J. Tanha et al., Semi-supervised self-training for decision tree classifiers, Int. J. Mach. Learn. Cybern. (2017)
  • H. Prade et al., Version space learning for possibilistic hypotheses
  • J. Cheng et al., Comparing Bayesian network classifiers, Uncertainty in Artificial Intelligence (2013)
  • E. Philippot et al., Bayesian networks for incomplete data analysis in form processing, Int. J. Mach. Learn. Cybern. (2015)
  • D.B. Fogel (1995)

    Xi-Zhao Wang (M'03 – SM'04 – F'12) received the Ph.D. degree in computer science from the Harbin Institute of Technology, Harbin, China, in 1998. He is currently a professor with the College of Computer Science and Software Engineering, Shenzhen University, Guangdong, China. From September 1998 to September 2001, he was a research fellow with the Department of Computing, Hong Kong Polytechnic University, Hong Kong. He has more than 180 publications, including four books, seven book chapters, and more than 100 journal papers in the IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Systems, Man, and Cybernetics, IEEE Transactions on Fuzzy Systems, Fuzzy Sets and Systems, Pattern Recognition, etc. His H-index is 20 (up to June 2014). His current research interests include learning from examples with fuzzy representation, fuzzy measures and integrals, neurofuzzy systems and genetic algorithms, feature extraction, multiclassifier fusion, and applications of machine learning. Dr. Wang has been the PI/Co-PI for 16 research projects supported partially by the National Natural Science Foundation of China and the Research Grant Committee of Hong Kong Government. He was the BoG Member of the IEEE Systems, Man, and Cybernetics Society (IEEE SMCS) in 2005, 2007–2009, and 2012–2014; the Chair of the IEEE SMC Technical Committee on Computational Intelligence, an associate editor of IEEE Transactions on Cybernetics; an associate editor of Pattern Recognition and Artificial Intelligence; a Member of the Editorial Board of Information Sciences; and an Executive Member of the Chinese Association of Artificial Intelligence. He was the recipient of the IEEE SMCS Outstanding Contribution Award in 2004 and the recipient of IEEE SMCS Best Associate Editor Award in 2006. He is the general Co-Chair of the 2002–2014 International Conferences on Machine Learning and Cybernetics, cosponsored by IEEE SMCS. He is a distinguished lecturer of the IEEE SMCS.
