Pattern Recognition

Volume 43, Issue 3, March 2010, Pages 720-730

Multiple view semi-supervised dimensionality reduction

https://doi.org/10.1016/j.patcog.2009.07.015

Abstract

Multiple view data, together with domain knowledge in the form of pairwise constraints, arise in various data mining applications. How to learn a hidden consensus pattern in a low dimensional space is a challenging problem. In this paper, we propose a new method for multiple view semi-supervised dimensionality reduction. The pairwise constraints are used to derive an embedding in each view, and simultaneously a linear transformation is introduced to make the embeddings from different pattern spaces comparable, so that the consensus pattern can be learned from the multiple embeddings of the multiple representations. We derive an iterative algorithm to solve this problem, and we also provide theoretical analyses and an out-of-sample extension. Promising experiments on various data sets, together with some important discussions, demonstrate the effectiveness of the proposed algorithm.

Introduction

With the rapid increase of high dimensional data, such as digital images, financial time series and web texts, dimensionality reduction has become a fundamental tool for many data mining tasks. Different from traditional applications, data in some real cases have special characteristics, and reducing their dimensionality is a big challenge. For example, in some applications an instance may have multiple views, i.e., multiple representations from different feature spaces or graph spaces [1]. Simultaneously, we may be given some domain knowledge in the form of pairwise constraints. The number of such constraints is usually limited, since examples are readily available but establishing a link between two points is fairly expensive [2]. One typical application is to predict a person's political view (left, right) from his/her online blogs. The dimensionality of a blog page is often very high and each page has disparate descriptions: textual content, in-bound and out-bound links, etc. Moreover, the fact that person B quotes person A and uses expletives near the quotation is a strong indication that B disagrees with A [3]. We may create a dissimilarity pair to reflect our knowledge that A and B probably have different political views. Conversely, if the quotation is accompanied by praise, we may infer that they share the same political view.

Together with such domain knowledge, multiple view high dimensional data raise a natural yet non-standard problem: how to use the multiple representations and the given domain knowledge to learn a consensus pattern that is more accurate than a pattern based on any single view. This is the main task of multiple view semi-supervised dimensionality reduction, and the consensus pattern should be shared by the multiple representations as much as possible. Since the structure of each view is disparate, it is commonly unwise to merge multiple representations into one view by simply concatenating all features: this would treat the representation of each view in the same way and ignore their diversities. Previous work has also shown the advantages of multiple view learning [1].

Traditional well-known dimensionality reduction methods, such as principal component analysis (PCA) [4] and locally linear embedding (LLE) [5], are not suitable for this kind of data, since (1) they are unsupervised and cannot incorporate the domain knowledge; if employed, they would suffer from low discriminant power due to their unsupervised nature; and (2) it is unsuitable to apply them directly to data formed by concatenating the representations of all views, since each view may have very different formulations or statistical properties. For example, genes can be represented in the expression vector space (corresponding to genetic activity) and also in the term vector space (corresponding to text information) [6].

As far as we know, there is little work dedicated to this topic in the literature. Foster et al. employed the canonical correlation analysis (CCA) approach [7] to derive the low dimensional embeddings of two-view data and computed a regression function based on these embeddings [8]. They focus on a different form of domain knowledge (exact labels) and a different problem (regression). There is also some research on multiple view unsupervised learning and on semi-supervised dimensionality reduction (SSDR). We review it here since it is closely related to this topic.

There are two directions for seeking solutions to multiple view unsupervised learning. One is to design centralized algorithms that use the multiple representations simultaneously to mine hidden patterns from the data. The main challenge for these approaches is the diversity of the representations, since it is difficult to design a single algorithm to accomplish several different tasks that are usually handled by separate algorithms. Existing efforts usually restrict themselves to special cases. For example, Bickel and Scheffer assumed that the predefined features are independent and can be handled by the same algorithms [9]. Zhou et al. focused on data that can be well modeled by a mixture of Markov chains [10]. The other direction is distributed: hidden patterns are learned individually from each representation, and the consensus hidden pattern is then computed from those multiple patterns. Under this framework, the choice of the most appropriate per-view method is left to domain experts, and the main challenge is to compare patterns from different representations, since these patterns live in different spaces. For multiple view dimensionality reduction, Long et al. addressed this problem by proposing a unified framework [11]. It assumes that there exists a linear mapping between the consensus embedding and the individual pattern of each single view; the individual patterns are learned first and the linear mappings are then constructed. Nevertheless, it does not take domain knowledge into consideration. In summary, we face a first problem: traditional multiple view methods cannot be directly used to solve the multiple view semi-supervised dimensionality reduction problem.

For semi-supervised dimensionality reduction, considering the types of prior knowledge, we can classify SSDR approaches into three categories. (1) The first kind adopts pre-defined low dimensional representations of several points. A typical method is that of Yang et al. [12], who first reformulated several typical nonlinear dimensionality reduction techniques into a common problem and then constructed a linear relationship between the pre-defined embeddings and the unknown embeddings by minimizing that common objective. (2) Methods of the second type employ domain knowledge in the form of pairwise constraints. Zhang et al. [13] first specified whether a pair of instances belongs to the same class (must-link constraint) or to different classes (cannot-link constraint) and then computed linear transformations by maximizing the distances between cannot-link pairs while minimizing the distances between must-link pairs. (3) The third type uses label information directly, employing labeled and unlabeled data points to construct a weight graph. A typical method is semi-supervised discriminant analysis (SDA) [14], an extension of linear discriminant analysis (LDA) [4]: it uses the labeled points to maximize the separability between classes and the unlabeled points to estimate the intrinsic geometric structure of the data. However, these approaches do not account for the disparate structures of multiple view high dimensional data; they treat all views of a point in the same way. Traditional semi-supervised multiple view learning approaches have shown that handling multiple representations separately, instead of concatenating all features into one view, can improve classification performance [1], [15]. In summary, we face a second problem: these semi-supervised dimensionality reduction methods are not suitable for multiple view data, since they all focus on data points with a single view.
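
To make the second category concrete, the following is a minimal sketch of an SSDR-style projection in the spirit of Zhang et al. [13]: a baseline term preserves the global scatter, cannot-link pairs are pushed apart, must-link pairs are pulled together, and the projection is obtained from an eigen-decomposition. The function name, the weights alpha and beta, and the normalizations are illustrative assumptions, not the exact formulation from [13].

```python
import numpy as np

def ssdr_projection(X, must_link, cannot_link, d, alpha=1.0, beta=10.0):
    """Illustrative SSDR-style projection (after Zhang et al. [13]).

    X           : (D, n) data matrix, one column per point
    must_link   : list of (i, j) index pairs that should stay close
    cannot_link : list of (i, j) index pairs that should be far apart
    d           : target dimensionality
    alpha, beta : constraint weights (these default values are assumptions)
    """
    n = X.shape[1]
    S = np.full((n, n), 1.0 / n**2)               # baseline: preserve global scatter
    for i, j in cannot_link:                      # push cannot-link pairs apart
        S[i, j] += alpha / max(len(cannot_link), 1)
        S[j, i] += alpha / max(len(cannot_link), 1)
    for i, j in must_link:                        # pull must-link pairs together
        S[i, j] -= beta / max(len(must_link), 1)
        S[j, i] -= beta / max(len(must_link), 1)
    L = np.diag(S.sum(axis=1)) - S                # Laplacian of the weight matrix
    evals, evecs = np.linalg.eigh(X @ L @ X.T)    # objective: maximize w^T X L X^T w
    W = evecs[:, np.argsort(evals)[::-1][:d]]     # top-d eigenvectors
    return W                                      # low dimensional points: W.T @ X
```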

In this paper, we propose a new approach named multiple view semi-supervised dimensionality reduction (MVSSDR) to solve the above-mentioned problems. We focus on domain knowledge in the form of pairwise constraints since it arises naturally in many applications. The embedding for each view is computed and, simultaneously, the transformation from the consensus pattern to the representation of each view is constructed. The consensus pattern can be effectively computed by alternating optimization [16]. We prove theoretically that the objective function is non-increasing and that the SSDR approach in [13] can be regarded as a special case of our method. We also provide an easy method for extending MVSSDR to out-of-sample data points. Experiments on different real data sets show the effectiveness of our method. Finally, we provide some basic discussions about multiple view dimensionality reduction.

It is worthwhile to highlight several aspects of the proposed approach here:

  • To the best of our knowledge, our approach is the first one to address this kind of multiple view semi-supervised dimensionality reduction problem.

  • Compared with traditional multiple view learning approaches, MVSSDR performs better since it can effectively use the domain knowledge. Compared with traditional dimensionality reduction methods (e.g. PCA, LDA and SSDR), which can only be applied by concatenating the representations of all views, MVSSDR also performs better since it accounts for the disparate structures and different statistical properties of the views.

  • The proposed method can be effectively solved by alternating optimization. We prove that the objective function is non-increasing under the updating rules, and hence the convergence of our algorithm is guaranteed. We also provide some discussions about how to use MVSSDR effectively.

  • MVSSDR contains other methods, such as SSDR, as special cases. It is also easy to extend MVSSDR to out-of-sample data points.

The remainder of this paper is organized as follows. In Section 2, we briefly review the SSDR approach in [13]. Section 3 presents the MVSSDR algorithm in detail. The analysis and extensions of MVSSDR are given in Section 4. Section 5 presents experimental results on real-world data sets and some important discussions. Conclusions and future work are given in Section 6.

Section snippets

Notations and related works

In this section, we will briefly review the SSDR method. Let us introduce some notations first. A set of $n$ data points in $\mathbb{R}^D$ is represented by $X=\{x_1, x_2, \ldots, x_n\}$. For each column vector $x_k \in X$, there are $l$ views, i.e., $x_k=[x_k^{(1)T}, x_k^{(2)T}, \ldots, x_k^{(l)T}]^T$ with column vectors $x_k^{(i)} \in \mathbb{R}^{D_i}$, $\sum_{i=1}^{l} D_i = D$. For each view $v$, there are some pairwise must-link constraints $M_v$ and cannot-link constraints $C_v$, i.e., instances involved in $M_v$ should be close while instances involved in $C_v$ should be far away in view $v$. Here $M_v$ is …
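
As a small illustration of this notation, the sketch below builds a two-view data set and per-view constraint sets. All dimensions, the sample size and the index pairs are made-up values chosen for illustration only.

```python
import numpy as np

# Illustration of the notation above with l = 2 views; the dimensions,
# sample size and constraint pairs are hypothetical, not the paper's.
D1, D2, n = 100, 50, 200
X = np.random.randn(D1 + D2, n)    # columns x_k = [x_k^(1); x_k^(2)]

view_dims = [D1, D2]               # D_1 + ... + D_l = D
offsets = np.concatenate(([0], np.cumsum(view_dims)))
views = [X[offsets[v]:offsets[v + 1], :] for v in range(len(view_dims))]

# Per-view constraints as index pairs into the n points:
# M[v] = must-link pairs, C[v] = cannot-link pairs for view v.
M = [{(0, 1), (2, 3)}, {(0, 1)}]
C = [{(0, 4)}, {(2, 4)}]
```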

The algorithm

In this section, we will formally present our MVSSDR algorithm, which aims to discover the consensus low dimensional representations of multiple view data points.

Analysis and extensions

First, we prove that the optimal $P_v$, which minimizes the objective function in Eq. (8), can be determined by Eq. (10). Second, the convergence behavior of our algorithm is shown. Then, the relationship between our method and SSDR, and the relationship between MVSSDR and the method proposed in [11], are analyzed. Finally, we give a simple way to extend MVSSDR to out-of-sample data.
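
Since only Eqs. (8) and (10) are named in this snippet, the following is a rough sketch of what such an alternating scheme can look like: with the per-view embeddings fixed, each mapping $P_v$ is obtained by least squares (a stand-in for the closed-form Eq. (10)), and the consensus embedding is then re-estimated from the back-projections of the views. The actual update rules are those given in the full text; nothing below should be read as the paper's exact equations.

```python
import numpy as np

def consensus_pattern(embeddings, d, n_iter=50, seed=0):
    """Rough sketch of an alternating scheme (NOT the paper's Eqs. (8)-(10)).

    embeddings : list of (n, d_v) per-view embeddings Y_v
    d          : dimensionality of the consensus embedding Y
    Alternates between (a) a least-squares map P_v with Y @ P_v ~ Y_v,
    standing in for the closed-form Eq. (10), and (b) re-estimating Y
    as the average back-projection of the views.
    """
    rng = np.random.default_rng(seed)
    n = embeddings[0].shape[0]
    Y = rng.standard_normal((n, d))                    # consensus initialization
    for _ in range(n_iter):
        Ps = [np.linalg.lstsq(Y, Yv, rcond=None)[0]    # P_v has shape (d, d_v)
              for Yv in embeddings]
        Y = np.mean([Yv @ np.linalg.pinv(Pv)           # back-project each view
                     for Yv, Pv in zip(embeddings, Ps)], axis=0)
    return Y
```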

Experiments and discussions

In this section, several experiments are performed for illustration. Four commonly used data sets are involved: the WebKB data set [17], the Internet advertisements data set, the 20-Newsgroups data set and the Sonar data set. After reducing the dimensionality, we employ K…
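
The snippet above breaks off at "K". Assuming the evaluation clusters the reduced representation with K-means, a common protocol for this kind of study, a sketch of such a pipeline with Hungarian-matched clustering accuracy might look as follows; the helper name and the scoring choice are our assumptions, not the paper's stated setup.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans

def clustering_accuracy(y_true, y_pred):
    """Match clusters to classes with the Hungarian method, then score.
    Assumes labels are integers in 0..k-1 (a common evaluation protocol)."""
    k = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((k, k), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1                          # agreement counts
    rows, cols = linear_sum_assignment(-cost)    # maximize total agreement
    mapping = dict(zip(cols, rows))              # cluster index -> class index
    return np.mean([mapping[p] == t for t, p in zip(y_true, y_pred)])

# Hypothetical usage: Y is a consensus embedding, y holds integer class labels.
# y_pred = KMeans(n_clusters=len(np.unique(y)), n_init=10).fit_predict(Y)
# print(clustering_accuracy(np.asarray(y), y_pred))
```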

Conclusions and future works

In this paper, a novel semi-supervised dimensionality reduction approach, MVSSDR, was proposed. To the best of our knowledge, it is the first approach to address this problem. It aims to derive the consensus pattern for multiple view data with domain knowledge. In each view, we derive an embedding that preserves the intrinsic structure of the data as well as the must-link and cannot-link constraints. We also provide an iterative algorithm to solve this problem. The linear transformation …

Acknowledgments

The authors thank the two anonymous reviewers for their constructive suggestions. Thanks to Daoqiang Zhang for providing data. This work was supported by the National Basic Research Program of China under Grant nos. 2005CB321800 and 2009CB320602, the National Natural Science Foundation of China under Grant no. 60673090, and the Hunan Provincial Innovation Foundation for Postgraduate.

References (20)

  • S. Rüping, T. Scheffer, Learning with multiple views, in: ICML Workshop on Learning with Multiple Views, ...
  • X. Zhu, Semi-supervised learning literature survey, Technical Report 1530, Department of Computer Sciences, University ...
  • T. Mullen, R. Malouf, A preliminary investigation into sentiment analysis for informal political discourse, in: ...
  • A.M. Martinez et al., PCA versus LDA, IEEE Transactions on Pattern Analysis and Machine Intelligence (2001)
  • L. Saul et al., Think globally, fit locally: unsupervised learning of low dimensional manifolds, Journal of Machine Learning Research (2003)
  • P. Glenisson et al., Metaclustering of gene expression data and literature-based information, SIGKDD Explorations Newsletter (2003)
  • H. Hotelling, The most predictable criterion, Journal of Educational Psychology 26 (1935) ...
  • D. Foster, S. Kakade, T. Zhang, Multi-view dimensionality reduction via canonical correlation analysis, TTI-C Technical ...
  • S. Bickel, T. Scheffer, Multi-view clustering, in: ICDM 04, 2004, p. ...
  • D. Zhou, C. Burges, Spectral clustering and transductive learning with multiple views, in: ICML07, 2007, pp. ...

About the Author—CHENPING HOU received his BS degree in applied mathematics from the National University of Defense Technology, Changsha, China, in 2004. He is now a PhD candidate in the Department of Mathematics and System Science. He has been a Visiting Student at Tsinghua University since May 2008. His current research interests include dimensionality reduction, manifold learning, and semi-supervised learning.

About the Author—CHANGSHUI ZHANG received his BS degree in Mathematics from Peking University, China, in 1986, and his PhD degree from the Department of Automation, Tsinghua University, in 1992. He is currently a Professor in the Department of Automation, Tsinghua University, and an Associate Editor of the journal Pattern Recognition. His interests include artificial intelligence, image processing, pattern recognition, machine learning, evolutionary computation and complex system analysis.

About the Author—YI WU is a Professor in the Department of Mathematics and System Science at the National University of Defense Technology in Changsha, China. He earned his Bachelor's and Master's degrees in Applied Mathematics at the National University of Defense Technology in 1981 and 1988, respectively. He worked as a Visiting Researcher at New York State University in 1999. His research interests include applied mathematics, statistics, and data processing.

About the Author—FEIPING NIE received his BS degree from the Department of Computer Science, North China University of Water Conservancy and Electric Power, China, in 2000, and MS degree from the Department of Computer Science, Lanzhou University, China, in 2003. He is currently a PhD candidate in the Department of Automation, Tsinghua University. His research interests focus on machine learning and its applications.
