Semi-supervised deep embedded clustering

doi:10.1016/j.neucom.2018.10.016

Neurocomputing

Volume 325, 24 January 2019, Pages 121-130

https://doi.org/10.1016/j.neucom.2018.10.016 Get rights and content

Abstract

Clustering is an important topic in machine learning and data mining. Recently, deep clustering, which learns feature representations for clustering tasks using deep neural networks, has attracted increasing attention for various clustering applications. Deep embedded clustering (DEC) is one of the state-of-the-art deep clustering methods. However, DEC does not make use of prior knowledge to guide the learning process. In this paper, we propose a new scheme of semi-supervised deep embedded clustering (SDEC) to overcome this limitation. Concretely, SDEC learns feature representations that favor the clustering tasks and performs clustering assignments simultaneously. In contrast to DEC, SDEC incorporates pairwise constraints in the feature learning process such that data samples belonging to the same cluster are close to each other and data samples belonging to different clusters are far away from each other in the learned feature space. Extensive experiments on real benchmark data sets validate the effectiveness and robustness of the proposed method.

Introduction

Clustering is one of very extensively studied topics in artificial intelligence and enjoys a wide range of applications, ranging from document analysis [1], [2], regional science [3], image retrieval [4], [5], [6], annotation [7], segmentation [8], to network analysis [9], [10], [11]. In the past few decades, many clustering algorithms have been proposed, including k-means [12], hierarchical clustering [13], DBSCAN [14], Gaussian mixture model [15], non-negative matrix factorization based clustering methods [16], [17], [18], [19], mean shift clustering [20], [21], [22], consensus clustering [23], [24], [25], [26], graph-based clustering [27], [28], and so on. Despite being studied extensively, the performance of traditional clustering methods generally deteriorates with high dimensional data due to unreliable similarity metrics, a phenomenon known as the curse of dimensionality.

To mitigate the curse of dimensionality, a common way is to transform data from a high dimensional feature space to a lower one by applying dimension reduction techniques like principle component analysis (PCA) or feature selection methods [29], [30]. Then, clustering is performed in the lower dimensional feature space. However, this scheme ignores the interconnection between features learning and clustering. To address this issue, the work [31] proposes to perform clustering and feature learning simultaneously by integrating k-means and linear discriminant analysis (LDA) into a joint framework. Nevertheless, the representation ability of features learned by these shallow models is limited [32].

In recent years, deep neural networks (DNN) that own better representation ability have been broadly applied in many machine learning tasks [33], [34], [35], [36], [37]. Lately, some work has been done to successfully apply deep neural networks in clustering tasks [38], [39], [40], [41], [42]. The resulting model is called deep clustering. Peng et al. [38] and Tian et al. [39] divide the deep clustering into two phases, i.e., feature transformation using DNN and clustering. In contrast, feature mapping and clustering are jointly learned in [40]. Xie et al. [40] propose deep embedded clustering (DEC) to learn a mapping from the high original feature space to a lower-dimensional one in which an effective objective is optimized. Yang et al. [41] and Chang et al. [42] make use of deep convolutional neural network (CNN) for image data clustering.

Traditional clustering methods refer to unsupervised settings. But, in many real machine learning and computer vision tasks, we know some prior knowledge such as pairwise constraints a-prior. There are typically two kinds of pairwise constraints: must-link constraints and cannot-link constraints. Must-link constraints specify that two instances are known to be in the same cluster in advance, while cannot-link constraints indicates the corresponding two instances belong to different clusters. Semi-supervised learning can use these constraints to improve the learning ability and has produced a huge impact over various machine learning applications [43]. Lately, a number of semi-supervised clustering (SSC) methods which take advantage of pairwise constraints have been developed [44], [45], [46], [47]. Obviously, despite its success in clustering, DEC is not able to make use of such prior information to guide the clustering process and to further enhance the clustering performance.

To address this issue, we propose semi-supervised deep embedded clustering (SDEC) that incorporates semi-supervised information in DEC to further improve its effectiveness. By integrating pairwise constraints, SDEC considerably improves the quality of clustering results over DEC. Specifically, SDEC makes use of pairwise constraints in the feature learning process such that data samples from the same cluster are enforced to be close to each other and data samples from different clusters are enforced to be far away from each other in the learned feature space where the final cluster assignment is conducted.

The contributions of this paper are summarized below:

•
We propose a new semi-supervised clustering scheme SDEC which simultaneously learns feature transformation and cluster assignment jointly by integrating with pairwise constraints. A joint objective considering both the unlabeled data and prior information is developed.
•
By leveraging the prior knowledge of pairwise constraints, SDEC significantly improves the clustering performance of the state-of-the-art DEC. The proposed method is also robust to the choice of parameters.
•
SDEC can address the curse of dimensionality and is effective in clustering high-dimensional data. Experimental results on real image and document data sets demonstrate its effectiveness and robustness.

The rest of this paper is organized as follows: Section 2 reviews related work and Section 3 introduces the proposed method. Sections 4 and 5 present the experimental settings and the empirical results, respectively. Conclusion and future work are provided in Section 6.

Section snippets

Deep clustering

Deep clustering is a new category of clustering that has arisen in recent years. Inspired by the similarity between eigen-decomposition in spectral methods and auto-encoder [48] in learning lower-dimensional representation, Tian et al. [39] were the first to introduce deep neural network in the field of clustering, which simply combines a nonlinear embedding of the original graph and k-means algorithm in the embedding space. Chen [49] learns representation using a deep belief network and then

Semi-supervised deep embedded clustering

This section elucidates the proposed semi-supervised deep embedded clustering (SDEC) with pairwise constraints. Consider a data set X of n unlabeled samples ${x_{i} \in R^{d}}_{i = 1}^{n}$ where d is the dimension. The set of initial must-link constraints is denoted by $M L = {(x_{i}, x_{j}) : x_{i} and x_{j} belong to the same cluster}$ and the set of cannot-link constraints is $C L = {(x_{i}, x_{j}) : x_{i} and x_{j} belong to different clusters},$ 1 ≤ i, j ≤ N. The number of cluster K is chosen according to prior knowledge, each cluster is represented by a center

Experimental setup

In this section, we first introduce the data sets used in our experiments. Then we describe the implementation in detail, including experiment setup of our algorithm, comparing methods and evaluation metrics.

Results on real data

When applying semi-supervised methods on each data set, the number of total pairwise constraints is set to n (the number of data points). The parameter λ of SDEC is set to $10^{- 5}$ . Tables 2–4 show the clustering results measured by ACC, NMI, and ARI, respectively. In each row, the best and comparable results are highlighted in boldface. To save space, the standard deviations (std) are not reported. In fact, the std values of SDEC are pretty small (i.e., SDEC obtains std values of 0.05%, 0.24%,

Conclusion and future work

In this paper, we propose a semi-supervised deep embedded clustering (SDEC) model. SDEC incorporates pairwise constraints to guide the process of feature learning, ensuring that must-link examples are close and cannot-link examples are distinct in the learned feature space. Both KL divergence loss and semi-supervised loss are jointly optimized in the semi-supervised deep clustering framework to gain the deep representation for clustering. Extensive experiments on real image and document data

Acknowledgments

This paper was in part supported by Grants from the Natural Science Foundation of China (Nos. 61806043, 61572111, and 61832001), two Projects funded by China Postdoctoral Science Foundation (Nos. 2016M602674 and 2017M623007), a 985 Project of UESTC (No. A1098531023601041), three Fundamental Research Funds for the Central Universities of China (Nos. ZYGX2016J078, ZYGX2016Z003, and ZYGX2016J034). This research is supported by the National Research Foundation Singapore under its AI Singapore

Yazhou Ren is a Lecturer in the School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China. He received his B.Sc. degree in Information and Computation Science and Ph.D. degree in Computer Science from South China University of Technology, Guangzhou, China in 2009 and 2014, respectively. He visited the Data Mining Laboratory at George Mason University, USA, for two years from 2012 to 2014. His current research interests include

References (71)

H.K. Kim et al.
Bag-of-concepts: comprehending document representation through clustering words in distributed representation
Neurocomputing
(2017)
S. Huang et al.
Adaptive local structure learning for document co-clustering
Knowl. Based Syst.
(2018)
S. Huang et al.
Self-weighted multi-view clustering with soft capped norm
Knowl. Based Syst.
(2018)
S. Huang et al.
Robust graph regularized nonnegative matrix factorization for clustering
Data Min. Knowl. Discov.
(2018)
Z. Kang et al.
Low-rank kernel learning for graph-based clustering
Knowl. Based Syst.
(2018)
F.D.l. Torre et al.
Discriminative cluster analysis
Proceedings of the International Conference on Machine Learning
(2006)
Y. Ren
Big data clustering and its applications in regional science
Y. Chen et al.
CLUE: cluster-based retrieval of images by unsupervised learning
IEEE Trans. Image Process.
(2005)
S.G. Jacob Goldberger et al.
Unsupervised image-set clustering using an information theoretic framework
IEEE Trans. Image Process.
(2006)
P. Xie et al.
Integrating image clustering and codebook learning
Proceedings of the AAAI Conference on Artificial Intelligence
(2015)

J. Li et al.

Real-time computerized annotation of pictures

IEEE Trans. Pattern Anal. Mach. Intell.

(2008)

J. Shi et al.

Normalized cuts and image segmentation

IEEE Trans. Pattern Anal. Mach. Intell.

(2000)

Z. Xu et al.

Bayesian nonparametric models for multiway data analysis

IEEE Trans. Pattern Anal. Mach. Intell.

(2015)

S. Zhe et al.

Scalable nonparametric multiway data analysis

Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, AISTATS

(2015)

Z. Xu et al.

Variational random function model for network modeling

IEEE Trans. Neural Netw. Learn. Syst.

(2018)

J. MacQueen

Some methods for classification and analysis of multivariate observations

Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability

(1967)

A.K. Jain et al.

Data clustering: a review

ACM Comput. Surv.

(1999)

M. Ester et al.

A density-based algorithm for discovering clusters in large spatial databases with noise

Proceedings of the Second International Conference on Knowledge Discovery and Data Mining

(1996)

Neural Comput.

(2006)

Cited by (0)

Kangrong Hu received his Bachelor degree in Information Management and Information System from University of Electronic Science and technology of China in 2018. His current research interests include semi-supervised learning and clustering.

Xinyi Dai received her Bachelor degree in Information security from University of Electronic Science and Technology of China in 2018. Now she is pursuing her Ph.D Degree in Shanghai Jiao Tong University. Her research interests focus on machine learning, data mining, and recommender system.

Lili Pan received her B.Eng. degree in Electronic Engineering, as well as her M.Eng. and Ph.D. degrees in Information Engineering from University of Electronic Science and Technology of China (UESTC), China, in 2004, 2007, and 2012, respectively. From 2009 to 2011, she visited the Robotics Institute of Carnegie Mellon University, USA. She joined the Department of Information Engineering, UESTC, as a lecturer in 2012. She is currently an associate professor at UESTC.

Steven C.H. Hoi is an associate of the School of Information Systems, Singapore Management University, Singapore. Prior to joining SMU, he was Associate Professor with Nanyang Technological University, Singapore. He received his Bachelor degree from Tsinghua University, PR China, in 2002, and his Ph.D. degree in computer science and engineering from The Chinese University of Hong Kong, in 2006. His research interests are machine learning and data mining and their applications to multimedia information retrieval (image and video retrieval), social media and web mining, and computational finance, etc, and he has published over 150 refereed papers in top conferences and journals in these related areas. He has served as Editor-in-Chief for Neurocomputing Journal, general co-chair for ACM SIGMM Workshops on Social Media (WSM09, WSM10, WSM11), program co-chair for the fourth Asian Conference on Machine Learning (ACML12), book editor for Social Media Modeling and Computing, guest editor for ACM Transactions on Intelligent Systems and Technology (ACM TIST), technical PC member for many international conferences, and external reviewer for many top journals and worldwide funding agencies, including NSF in US and RGC in Hong Kong.

Zenglin Xu received the Ph.D. degree in computer science and engineering from the Chinese University of Hong Kong. He is currently a full professor in University of Electronic Science and Technology of China. He has been working at Michigan State University, Cluster of Excellence at Saarland University and Max Planck Institute for Informatics, and later Purdue University. Dr. Xu’s research interests include machine learning and its applications in information retrieval. He has been elected in the 2013’s China Youth 1000-talent Program. He is the recipient of the outstanding student paper honorable mention of AAAI 2015, the best student paper runner up of ACML 2016, and the 2016 young researcher award from APNNS.

View full text

Semi-supervised deep embedded clustering

Abstract

Introduction

Section snippets

Deep clustering

Semi-supervised deep embedded clustering

Experimental setup

Results on real data

Conclusion and future work

Acknowledgments

Neurocomputing

Knowl. Based Syst.

Knowl. Based Syst.

Data Min. Knowl. Discov.

Knowl. Based Syst.

Big data clustering and its applications in regional science

CLUE: cluster-based retrieval of images by unsupervised learning

IEEE Trans. Image Process.

Unsupervised image-set clustering using an information theoretic framework

IEEE Trans. Image Process.

Integrating image clustering and codebook learning

Proceedings of the AAAI Conference on Artificial Intelligence

Real-time computerized annotation of pictures

IEEE Trans. Pattern Anal. Mach. Intell.

Normalized cuts and image segmentation

IEEE Trans. Pattern Anal. Mach. Intell.

Bayesian nonparametric models for multiway data analysis

IEEE Trans. Pattern Anal. Mach. Intell.

Scalable nonparametric multiway data analysis

Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, AISTATS

Variational random function model for network modeling

IEEE Trans. Neural Netw. Learn. Syst.

Some methods for classification and analysis of multivariate observations

Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability

Data clustering: a review

ACM Comput. Surv.

A density-based algorithm for discovering clusters in large spatial databases with noise

Proceedings of the Second International Conference on Knowledge Discovery and Data Mining

Robust multi-view data clustering with multi-view capped-norm k-means

Neurocomputing

Mean shift: a robust approach toward feature space analysis

IEEE Trans. Pattern Anal. Mach. Intell.

A weighted adaptive mean shift clustering algorithm

Proceedings of the SIAM International Conference on Data Mining

Boosted mean shift clustering

Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases

Cluster ensembles - a knowledge reuse framework for combining multiple partitions

JMLR

Weighted-object ensemble clustering

Proceedings of the IEEE Thirteenth International Conference on Data Mining

Weighted-object ensemble clustering: methods and analysis

Knowl. Inf. Syst.

Distribution-based cluster structure selection

IEEE Trans. Cybern.

Kernel-driven similarity learning

Neurocomputing

Laplacian score for feature selection

Proceedings of the Advances in Neural Information Processing Systems

Local and global structure preserving based feature selection

Neurocomputing

Improved deep embedded clustering with local structure preservation

Proceedings of the International Joint Conference on Artificial Intelligence

Learning deep architectures for AI

Found. Trendsë Mach. Learn.

Representation learning: a review and new perspectives

IEEE Trans. Pattern Anal. Mach. Intell.

A fast learning algorithm for deep belief nets

Neural Comput.