Pattern Recognition, Volume 95, November 2019, Pages 72-82

Distinguishing two types of labels for multi-label feature selection

https://doi.org/10.1016/j.patcog.2019.06.004

Highlights

  • We categorize labels into two groups: independent labels and dependent labels.

  • A new feature relevance term that considers label redundancy is proposed.

  • We propose a novel multi-label feature selection method.

  • Our method outperforms seven other methods in terms of four metrics.

  • We conduct experiments on 12 benchmark data sets.

Abstract

Multi-label feature selection plays an important role in pattern recognition because it can improve multi-label classification performance. In traditional multi-label feature selection methods based on information theory, feature relevance is evaluated by the accumulated mutual information between a candidate feature and each label. However, to the best of our knowledge, traditional methods ignore the effect of label redundancy on the evaluation of feature relevance. To address this issue, we propose a new method named multi-label Feature Selection based on Label Redundancy (LRFS). First, we categorize labels into two groups: independent labels and dependent labels. Second, by analyzing the differences between the two groups, we propose a new feature relevance term: the conditional mutual information between a candidate feature and each label given the other labels. Finally, we combine the new feature relevance term with a feature redundancy term to design our feature selection method. To evaluate its classification performance, LRFS is compared to three information-theoretic multi-label feature selection methods on an artificial data set, and to five algorithm adaptation feature selection methods and two problem transformation feature selection methods on 12 real-world multi-label data sets. The experimental results demonstrate that LRFS outperforms the compared methods in terms of four evaluation metrics.

Introduction

In recent years, multi-label classification [1], [2], [3] has received increasing attention in modern applications such as gene function classification [4], text categorization [5] and the semantic annotation of images [6]. Multi-label data, in which one sample may be relevant to multiple labels simultaneously, have become widespread in real-world problems [7].

Multi-label data contain a large number of features, many of which are irrelevant or redundant [8], [9]. This high dimensionality increases computing costs and degrades classification performance. As a result, multi-label feature selection has received extensive attention as a way to address the curse of dimensionality [10], [11], [12]. Multi-label feature selection methods aim to select, from the original data set, a feature subset that can effectively improve the classification performance on multi-label data [13], [14], [15], [16].

Multi-label feature selection methods can usually be categorized into three groups: filter methods, wrapper methods and embedded methods [17], [18], [19]. Filter methods do not involve any learning algorithm when the feature subset is selected [20], [21]; wrapper methods evaluate the importance of each candidate feature subset according to the classification performance of a specific classifier [22], [23]; and embedded methods combine the learning algorithm with the process of feature selection [24], [25]. In this paper, we focus on filter methods because they are simple and efficient.

Information theory is a common evaluation measure for designing multi-label feature selection methods because it can capture both linear and nonlinear relationships among variables [26], [27]. Many multi-label feature selection methods based on information theory have been proposed to address the curse of dimensionality. Lee et al. [28] propose a method that employs mutual information and interaction information to obtain an effective feature subset for multi-label classification. They also propose a method based on a scalable criterion for large label sets, which uses mutual information and information entropy to measure feature relevance and feature redundancy [29]. However, to the best of our knowledge, previous methods ignore label redundancy when calculating feature relevance. In this work, we propose a new multi-label feature selection method that employs a new feature relevance term. The main contributions of this paper are as follows.

  • (1) We categorize labels into two groups: independent labels and dependent labels. We describe the details of the two groups of labels in Section 4.1.

  • (2) Considering the correlations between labels, we propose the Label Redundancy (LR) term, which employs conditional mutual information to measure feature relevance in the process of feature selection.

  • (3) Based on (1) and (2), a novel multi-label Feature Selection method based on Label Redundancy (LRFS) is proposed. In LRFS, feature relevance is measured by the LR term and feature redundancy is measured by the mutual information between a candidate feature and each already-selected feature (a schematic form of this criterion is sketched after this list).

  • (4) The classification performance of our method is validated on an artificial data set and 12 real-world multi-label data sets. The experimental results show that the proposed method achieves a promising improvement in classification performance.
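To make the contributions concrete: under a schematic reading of items (2) and (3), and with no claim about the paper's exact weighting or normalization, the greedy score of a candidate feature $f_k$ given the label set $L$ and the already-selected set $S$ could take the following form:

$$J(f_k) = \sum_{y_i \in L}\; \sum_{\substack{y_j \in L \\ j \neq i}} I(f_k;\, y_i \mid y_j) \;-\; \sum_{f_s \in S} I(f_k;\, f_s)$$

Here the first sum is the LR-based feature relevance and the second is the feature redundancy. At each step, the feature maximizing $J(f_k)$ is moved into $S$; conditioning on $y_j$ prevents a feature from being credited twice for label information that other labels already carry.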

The rest of this paper is organized as follows. In Section 2, we introduce basic concepts of information theory. In Section 3, we briefly review related work. In Section 4, we propose a new multi-label feature selection method. In Section 5, we present the experimental results used to verify the effectiveness of the proposed method. In Section 6, we conclude the paper and outline directions for future research.

Section snippets

Preliminaries

In this section, some fundamental concepts of information theory are presented. Information theory [30] is used to measure the correlations among random variables. Information entropy is a measure of the uncertainty of random variables [31]. Let $X=\{x_1,x_2,\ldots,x_n\}$, $Y=\{y_1,y_2,\ldots,y_m\}$ and $Z=\{z_1,z_2,\ldots,z_k\}$ be three discrete random variables. The entropy of the random variable $X$ is defined as

$$H(X) = -\sum_{i=1}^{n} p(x_i)\log p(x_i)$$

where $p(x_i)$ is the probability of $x_i$. The base of the logarithm is 2; therefore, $H(X) \geq 0$.
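The snippet ends here; for completeness, the two standard quantities the LR term builds on, mutual information and conditional mutual information, have the usual definitions (standard information theory, not quoted from the paper):

$$I(X;Y) = \sum_{i=1}^{n}\sum_{j=1}^{m} p(x_i,y_j)\log\frac{p(x_i,y_j)}{p(x_i)\,p(y_j)}$$

$$I(X;Y \mid Z) = \sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{l=1}^{k} p(x_i,y_j,z_l)\log\frac{p(z_l)\,p(x_i,y_j,z_l)}{p(x_i,z_l)\,p(y_j,z_l)}$$

Both quantities are nonnegative, and $I(X;Y \mid Z)$ measures how much information $X$ carries about $Y$ once $Z$ is already known.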

Related work

A number of multi-label classification algorithms have been developed, which can be categorized into two groups [32], [33]: problem transformation and algorithm adaptation. Problem transformation converts the label sets into a single-label problem and then solves the resulting traditional single-label classification problem; examples include the Label Powerset (LP) [7] and the Pruned Problem Transformation (PPT) [34]. The LP transforms multi-label data into single-label data by assigning each distinct combination of labels to a unique class.
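As a minimal, self-contained sketch of the LP transformation just described (an illustration, not the authors' code), each distinct row of the binary label matrix is mapped to one class of a single-label problem:

```python
import numpy as np

def label_powerset_transform(Y):
    """Label Powerset (LP): map each distinct row of a binary label matrix
    Y (n_samples x n_labels) to a single integer class."""
    # Every unique label combination becomes one class of a single-label problem.
    _, classes = np.unique(Y, axis=0, return_inverse=True)
    return classes

# Example: 4 samples, 3 labels, 3 distinct label sets -> 3 classes.
Y = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1],
              [1, 1, 0]])
print(label_powerset_transform(Y))  # [1 0 1 2]; class ids are arbitrary
```

PPT follows the same idea but prunes label combinations that occur too rarely to learn from, which keeps the number of resulting classes manageable.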

Proposed multi-label feature selection method

In this section, we propose a new feature selection method that considers the correlations between labels when calculating feature relevance. First, we present and analyze feature relevance from two perspectives: when the labels are independent of each other and when the labels are dependent on each other. Second, we define a new feature relevance term named Label Redundancy (LR). Finally, a multi-label feature selection method based on Label Redundancy (LRFS) is proposed and its pseudo code is presented.
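The snippet is truncated before the pseudo code. The sketch below is a rough, discrete-variable reconstruction of a greedy forward search consistent with the description above: relevance aggregates I(f; y_i | y_j) over ordered label pairs and redundancy aggregates I(f; s) over already-selected features. Function names are ours, and the published criterion's exact weighting may differ:

```python
import numpy as np

def joint_entropy(*arrays):
    """Shannon entropy (base 2) of the joint distribution of discrete 1-D arrays."""
    joint = np.stack(arrays, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mi(x, y):
    """Mutual information I(X; Y) via the entropy identity."""
    return joint_entropy(x) + joint_entropy(y) - joint_entropy(x, y)

def cmi(x, y, z):
    """Conditional mutual information I(X; Y | Z)."""
    return (joint_entropy(x, z) + joint_entropy(y, z)
            - joint_entropy(x, y, z) - joint_entropy(z))

def lrfs_sketch(X, Y, k):
    """Greedily pick k features from discrete X (n_samples x n_features)
    against binary labels Y (n_samples x n_labels)."""
    n_features, n_labels = X.shape[1], Y.shape[1]
    # Label-redundancy-aware relevance: sum of I(f; y_i | y_j) over i != j.
    relevance = np.array([
        sum(cmi(X[:, f], Y[:, i], Y[:, j])
            for i in range(n_labels) for j in range(n_labels) if i != j)
        for f in range(n_features)])
    selected, remaining = [], list(range(n_features))
    for _ in range(k):
        # Score = relevance minus redundancy with already-selected features.
        best = max(remaining, key=lambda f: relevance[f]
                   - sum(mi(X[:, f], X[:, s]) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because the relevance term does not depend on the selected set, it is computed once up front; only the redundancy term is re-evaluated at each greedy step.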

Experimental results and analysis

In this section, we verify the effectiveness of the proposed LRFS method in multi-label classification and present the experimental results. First, we introduce the four multi-label evaluation criteria used in the experiments. Second, LRFS is compared on an artificial data set to three information-theoretic multi-label feature selection methods that do not consider label redundancy (D2F [44], PMU [28] and SCLS [29]). Finally, our method is compared to seven representative methods, five algorithm adaptation methods and two problem transformation methods, on 12 real-world multi-label data sets.
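The snippet does not name the four criteria. As one example of a widely used multi-label metric that such comparisons typically report (an illustration only; the paper's metric set may differ), Hamming loss counts the fraction of misclassified sample-label pairs:

```python
import numpy as np

def hamming_loss(Y_true, Y_pred):
    """Fraction of sample-label pairs predicted incorrectly.
    Y_true, Y_pred: binary matrices of shape (n_samples, n_labels)."""
    return float(np.mean(Y_true != Y_pred))
```

Lower values are better; a perfect multi-label classifier attains a Hamming loss of 0.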

Conclusions and future work

To the best of our knowledge, previous multi-label feature selection methods based on information theory ignore the effect of label redundancy on the evaluation of feature relevance. To address this issue, we first divide the labels into two groups: independent labels and dependent labels. Second, we propose a new feature relevance term named Label Redundancy (LR). LR employs conditional mutual information to evaluate the importance of candidate features. Finally, a novel multi-label feature selection method based on Label Redundancy (LRFS) is proposed.

Acknowledgements

This work is supported by the Postdoctoral Innovative Talents Support Program under Grant No. BX20190137; the National Natural Science Foundation of China [grant numbers 61772226, 61373051, 61502343]; the Science and Technology Development Program of Jilin Province [grant number 20140204004GX]; the Science Research Funds for the Guangxi Universities [grant number KY2015ZD122]; the Science Research Funds for the Wuzhou University [grant number 2014A002]; and the Project of Science and Technology Innovation Platform of


References (47)

  • W. Gao et al., Feature selection by integrating two groups of feature evaluation criteria, Expert Syst. Appl. (2018)

  • W. Gao et al., Feature selection considering the composition of feature relevancy, Pattern Recognit. Lett. (2018)

  • L. Hu et al., Feature selection considering two types of feature relevancy and feature interdependency, Expert Syst. Appl. (2018)

  • Y. Lin et al., Multi-label feature selection based on max-dependency and min-redundancy, Neurocomputing (2015)

  • J. Lee et al., Fast multi-label feature selection based on information-theoretic feature ranking, Pattern Recognit. (2015)

  • J. Lee et al., Feature selection for multi-label classification using multivariate mutual information, Pattern Recognit. Lett. (2013)

  • J. Lee et al., SCLS: multi-label feature selection based on scalable criterion for large label set, Pattern Recognit. (2017)

  • M.-L. Zhang et al., Feature selection for multi-label naive Bayes classification, Inf. Sci. (2009)

  • Y. Zhang et al., Multilabel dimensionality reduction via dependence maximization, ACM Trans. Knowl. Discov. Data (TKDD) (2010)

  • F. Li et al., Granular multi-label feature selection based on mutual information, Pattern Recognit. (2017)

  • J. Lee et al., Mutual information-based multi-label feature selection using interaction information, Expert Syst. Appl. (2015)

  • M.-L. Zhang et al., ML-KNN: a lazy learning approach to multi-label learning, Pattern Recognit. (2007)

  • S.-J. Huang et al., Multi-label learning by exploiting label correlations locally, in: Twenty-Sixth AAAI Conference on Artificial Intelligence (2012)

Ping Zhang received her B.E. degree from Hebei GEO University in 2015. She is now working toward the Ph.D. degree in the College of Computer Science, Jilin University. Her research interests include feature selection and information theory.

Guixia Liu received her M.S. and Ph.D. degrees in Computer Science from Jilin University in 1996 and 2007. She is currently a professor and doctoral supervisor at the College of Computer Science and Technology, Jilin University, China. Her research area is machine learning.

Wanfu Gao received his B.E. and Ph.D. degrees from the College of Computer Science, Jilin University, in 2013 and 2019. He is doing post-doctoral research in the College of Chemistry at Jilin University. His research interests include feature selection and information theory.
