Pattern Recognition, Volume 95, November 2019, Pages 72-82

Distinguishing two types of labels for multi-label feature selection

https://doi.org/10.1016/j.patcog.2019.06.004

Highlights

  • We categorize labels into two groups: independent labels and dependent labels.

  • A new feature relevance term that considers label redundancy is proposed.

  • We propose a novel multi-label feature selection method.

  • Our method outperforms seven other methods in terms of four metrics.

  • We conduct experiments on 12 benchmark data sets.

Abstract

Multi-label feature selection plays an important role in pattern recognition because it can improve multi-label classification performance. In traditional multi-label feature selection methods based on information theory, feature relevance is evaluated by the accumulated mutual information between a candidate feature and each label. However, to the best of our knowledge, traditional methods ignore the effect of label redundancy on the evaluation of feature relevance. To address this issue, we propose a new method named multi-label Feature Selection based on Label Redundancy (LRFS). First, we categorize labels into two groups: independent labels and dependent labels. Second, by analyzing the differences between the two groups, we propose a new feature relevance term: the conditional mutual information between a candidate feature and each label given the other labels. Finally, we combine the new feature relevance term with a feature redundancy term to design our feature selection method. To evaluate its classification performance, LRFS is compared to three information-theoretic multi-label feature selection methods on an artificial data set, and to five algorithm adaptation feature selection methods and two problem transformation feature selection methods on 12 real-world multi-label data sets. The experimental results demonstrate that LRFS outperforms the compared methods in terms of four evaluation metrics.

Introduction

In recent years, multi-label classification [1], [2], [3] has received increasing attention in modern applications such as gene function classification [4], text categorization [5] and the semantic annotation of images [6]. Multi-label data, in which one sample may be relevant to multiple labels simultaneously, have become widespread in real-world problems [7].

Multi-label data contain a large number of features, many of which are irrelevant or redundant [8], [9]. This high dimensionality increases computing costs and degrades classification performance. As a result, multi-label feature selection has received extensive attention as a way to address the curse of dimensionality [10], [11], [12]. Multi-label feature selection methods aim to select, from the original data set, a feature subset that can effectively improve the classification performance on multi-label data [13], [14], [15], [16].

Multi-label feature selection methods can usually be categorized into three groups: filter methods, wrapper methods and embedded methods [17], [18], [19]. Filter methods do not involve any learning algorithm when the feature subset is selected [20], [21]; wrapper methods evaluate the importance of each candidate feature subset according to the classification performance of a specific classifier [22], [23]; and embedded methods combine the learning algorithm with the process of feature selection [24], [25]. In this paper, we focus on filter methods because they are simple and efficient.

Information theory is a common evaluation measure for designing multi-label feature selection methods because it can capture both linear and nonlinear relationships among variables [26], [27]. Many multi-label feature selection methods based on information theory have been proposed to address the curse of dimensionality. Lee et al. [28] propose a method that employs mutual information and interaction information to obtain an effective feature subset for multi-label classification. They also propose a method based on a scalable criterion for large label sets, which uses mutual information and information entropy to measure feature relevance and feature redundancy [29]. However, to the best of our knowledge, previous methods ignore label redundancy when calculating feature relevance. In this work, we propose a new multi-label feature selection method that employs a new feature relevance term. The main contributions of this paper are as follows.

  • (1) We categorize labels into two groups: independent labels and dependent labels. We describe the details of the two groups of labels in Section 4.1.

  • (2) Considering the correlations between labels, we propose the Label Redundancy (LR) term, which employs conditional mutual information to measure feature relevance in the process of feature selection.

  • (3) Based on (1) and (2), a novel multi-label Feature Selection method based on Label Redundancy (LRFS) is proposed. In LRFS, feature relevance is measured by the LR term and feature redundancy is measured by the mutual information between a candidate feature and each already-selected feature (a schematic form of this criterion is sketched after this list).

  • (4) The classification performance of our method is validated on an artificial data set and 12 real-world multi-label data sets. The experimental results show that the proposed method achieves a promising improvement in classification performance.
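To make the contributions concrete: under a schematic reading of items (2) and (3), and with no claim about the paper's exact weighting or normalization, the greedy score of a candidate feature $f_k$ given the label set $L$ and the already-selected set $S$ could take the following form:

$$J(f_k) = \sum_{y_i \in L}\; \sum_{\substack{y_j \in L \\ j \neq i}} I(f_k;\, y_i \mid y_j) \;-\; \sum_{f_s \in S} I(f_k;\, f_s)$$

Here the first sum is the LR-based feature relevance and the second is the feature redundancy. At each step, the feature maximizing $J(f_k)$ is moved into $S$; conditioning on $y_j$ prevents a feature from being credited twice for label information that other labels already carry.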

The rest of this paper is organized as follows. In Section 2, we introduce basic concepts of information theory. In Section 3, we briefly review related work. In Section 4, we propose a new multi-label feature selection method. In Section 5, we present the experimental results used to verify the effectiveness of the proposed method. In Section 6, we conclude the paper and outline directions for future research.

Section snippets

Preliminaries

In this section, some fundamental concepts of information theory are presented. Information theory [30] is used to measure the correlations among random variables. Information entropy is a measure of the uncertainty of random variables [31]. Let $X=\{x_1,x_2,\ldots,x_n\}$, $Y=\{y_1,y_2,\ldots,y_m\}$ and $Z=\{z_1,z_2,\ldots,z_k\}$ be three discrete random variables. The entropy of the random variable $X$ is defined as

$$H(X) = -\sum_{i=1}^{n} p(x_i)\log p(x_i)$$

where $p(x_i)$ is the probability of $x_i$. The base of the logarithm is 2; therefore, $H(X) \geq 0$.
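The snippet ends here; for completeness, the two standard quantities the LR term builds on, mutual information and conditional mutual information, have the usual definitions (standard information theory, not quoted from the paper):

$$I(X;Y) = \sum_{i=1}^{n}\sum_{j=1}^{m} p(x_i,y_j)\log\frac{p(x_i,y_j)}{p(x_i)\,p(y_j)}$$

$$I(X;Y \mid Z) = \sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{l=1}^{k} p(x_i,y_j,z_l)\log\frac{p(z_l)\,p(x_i,y_j,z_l)}{p(x_i,z_l)\,p(y_j,z_l)}$$

Both quantities are nonnegative, and $I(X;Y \mid Z)$ measures how much information $X$ carries about $Y$ once $Z$ is already known.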

Related work

A number of multi-label classification algorithms have been developed, which can be categorized into two groups [32], [33]: problem transformation and algorithm adaptation. Problem transformation converts the label sets into a single-label problem and then solves the resulting traditional single-label classification problem; examples include the Label Powerset (LP) [7] and the Pruned Problem Transformation (PPT) [34]. The LP transforms multi-label data into single-label data by assigning each distinct combination of labels to a unique class.
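As a minimal, self-contained sketch of the LP transformation just described (an illustration, not the authors' code), each distinct row of the binary label matrix is mapped to one class of a single-label problem:

```python
import numpy as np

def label_powerset_transform(Y):
    """Label Powerset (LP): map each distinct row of a binary label matrix
    Y (n_samples x n_labels) to a single integer class."""
    # Every unique label combination becomes one class of a single-label problem.
    _, classes = np.unique(Y, axis=0, return_inverse=True)
    return classes

# Example: 4 samples, 3 labels, 3 distinct label sets -> 3 classes.
Y = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1],
              [1, 1, 0]])
print(label_powerset_transform(Y))  # [1 0 1 2]; class ids are arbitrary
```

PPT follows the same idea but prunes label combinations that occur too rarely to learn from, which keeps the number of resulting classes manageable.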

Proposed multi-label feature selection method

In this section, we propose a new feature selection method that considers the correlations between labels when calculating feature relevance. First, we present and analyze feature relevance from two perspectives: when the labels are independent of each other and when the labels are dependent on each other. Second, we define a new feature relevance term named Label Redundancy (LR). Finally, a multi-label feature selection method based on Label Redundancy (LRFS) is proposed and its pseudo code is presented.
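The snippet is truncated before the pseudo code. The sketch below is a rough, discrete-variable reconstruction of a greedy forward search consistent with the description above: relevance aggregates I(f; y_i | y_j) over ordered label pairs and redundancy aggregates I(f; s) over already-selected features. Function names are ours, and the published criterion's exact weighting may differ:

```python
import numpy as np

def joint_entropy(*arrays):
    """Shannon entropy (base 2) of the joint distribution of discrete 1-D arrays."""
    joint = np.stack(arrays, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mi(x, y):
    """Mutual information I(X; Y) via the entropy identity."""
    return joint_entropy(x) + joint_entropy(y) - joint_entropy(x, y)

def cmi(x, y, z):
    """Conditional mutual information I(X; Y | Z)."""
    return (joint_entropy(x, z) + joint_entropy(y, z)
            - joint_entropy(x, y, z) - joint_entropy(z))

def lrfs_sketch(X, Y, k):
    """Greedily pick k features from discrete X (n_samples x n_features)
    against binary labels Y (n_samples x n_labels)."""
    n_features, n_labels = X.shape[1], Y.shape[1]
    # Label-redundancy-aware relevance: sum of I(f; y_i | y_j) over i != j.
    relevance = np.array([
        sum(cmi(X[:, f], Y[:, i], Y[:, j])
            for i in range(n_labels) for j in range(n_labels) if i != j)
        for f in range(n_features)])
    selected, remaining = [], list(range(n_features))
    for _ in range(k):
        # Score = relevance minus redundancy with already-selected features.
        best = max(remaining, key=lambda f: relevance[f]
                   - sum(mi(X[:, f], X[:, s]) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because the relevance term does not depend on the selected set, it is computed once up front; only the redundancy term is re-evaluated at each greedy step.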

Experimental results and analysis

In this section, we verify the effectiveness of the proposed LRFS method in multi-label classification and present the experimental results. First, we introduce the four multi-label evaluation criteria used in the experiments. Second, LRFS is compared on an artificial data set to three information-theoretic multi-label feature selection methods that do not consider label redundancy (D2F [44], PMU [28] and SCLS [29]). Finally, our method is compared to seven representative methods, five algorithm adaptation methods and two problem transformation methods, on 12 real-world multi-label data sets.
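The snippet does not name the four criteria. As one example of a widely used multi-label metric that such comparisons typically report (an illustration only; the paper's metric set may differ), Hamming loss counts the fraction of misclassified sample-label pairs:

```python
import numpy as np

def hamming_loss(Y_true, Y_pred):
    """Fraction of sample-label pairs predicted incorrectly.
    Y_true, Y_pred: binary matrices of shape (n_samples, n_labels)."""
    return float(np.mean(Y_true != Y_pred))
```

Lower values are better; a perfect multi-label classifier attains a Hamming loss of 0.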

Conclusions and future work

To the best of our knowledge, previous multi-label feature selection methods based on information theory ignore the effect of label redundancy on the evaluation of feature relevance. To address this issue, we first divide the labels into two groups: independent labels and dependent labels. Second, we propose a new feature relevance term named Label Redundancy (LR). LR employs conditional mutual information to evaluate the importance of candidate features. Finally, a novel multi-label feature selection method based on Label Redundancy (LRFS) is proposed.

Acknowledgements

This work is supported by the Postdoctoral Innovative Talents Support Program under Grant No. BX20190137; the National Natural Science Foundation of China [grant numbers 61772226, 61373051, 61502343]; the Science and Technology Development Program of Jilin Province [grant number 20140204004GX]; the Science Research Funds for the Guangxi Universities [grant number KY2015ZD122]; the Science Research Funds for the Wuzhou University [grant number 2014A002]; and the Project of Science and Technology Innovation Platform of


References (47)

  • W. Gao et al., Feature selection by integrating two groups of feature evaluation criteria, Expert Syst. Appl. (2018)

  • W. Gao et al., Feature selection considering the composition of feature relevancy, Pattern Recognit. Lett. (2018)

  • L. Hu et al., Feature selection considering two types of feature relevancy and feature interdependency, Expert Syst. Appl. (2018)

  • Y. Lin et al., Multi-label feature selection based on max-dependency and min-redundancy, Neurocomputing (2015)

  • J. Lee et al., Fast multi-label feature selection based on information-theoretic feature ranking, Pattern Recognit. (2015)

  • J. Lee et al., Feature selection for multi-label classification using multivariate mutual information, Pattern Recognit. Lett. (2013)

  • J. Lee et al., SCLS: multi-label feature selection based on scalable criterion for large label set, Pattern Recognit. (2017)

  • M.-L. Zhang et al., Feature selection for multi-label naive Bayes classification, Inf. Sci. (2009)

  • Y. Zhang et al., Multilabel dimensionality reduction via dependence maximization, ACM Trans. Knowl. Discov. Data (TKDD) (2010)

  • F. Li et al., Granular multi-label feature selection based on mutual information, Pattern Recognit. (2017)

  • J. Lee et al., Mutual information-based multi-label feature selection using interaction information, Expert Syst. Appl. (2015)

  • M.-L. Zhang et al., ML-KNN: a lazy learning approach to multi-label learning, Pattern Recognit. (2007)

  • S.-J. Huang et al., Multi-label learning by exploiting label correlations locally, in: Twenty-Sixth AAAI Conference on Artificial Intelligence (2012)

Ping Zhang received her B.E. degree from Hebei GEO University in 2015. She is now working toward the Ph.D. degree in the College of Computer Science, Jilin University. Her research interests include feature selection and information theory.

Guixia Liu received her M.S. and Ph.D. degrees in Computer Science from Jilin University in 1996 and 2007. She is currently a professor and doctoral supervisor at the College of Computer Science and Technology, Jilin University, China. Her research area is machine learning.

Wanfu Gao received his B.E. and Ph.D. degrees from the College of Computer Science, Jilin University, in 2013 and 2019. He is doing post-doctoral research in the College of Chemistry at Jilin University. His research interests include feature selection and information theory.
