DOI: 10.1145/2783258.2783309

Dimensionality Reduction Via Graph Structure Learning

Published: 10 August 2015

ABSTRACT

We present a new dimensionality reduction setting for a large family of real-world problems. Unlike traditional methods, the new setting aims to explicitly represent and learn an intrinsic structure from data in a high-dimensional space, which can greatly facilitate data visualization and scientific discovery in downstream analysis. We propose a new dimensionality-reduction framework that learns a mapping function projecting data points from the original high-dimensional space to latent points in a low-dimensional space, which are then used directly to construct a graph. Local geometric information of the projected data is naturally captured by the constructed graph. As a showcase, we develop a new method to obtain a discriminative and compact feature representation for clustering problems. In contrast to the assumptions used in traditional clustering methods, we assume that cluster centers should be close to each other if they are connected in the learned graph, and distant from each other otherwise. Extensive experiments demonstrate that the proposed method obtains discriminative feature representations yielding superior clustering performance, and correctly recovers the intrinsic structures of various real-world datasets, including curves, hierarchies, and a cancer progression path.
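As a rough illustration of the setting described above, the sketch below projects high-dimensional data to a low-dimensional latent space and then builds a graph over the latent points so that local geometry is encoded in the graph's edges. The projection (plain PCA) and the graph construction (a minimum spanning tree over pairwise latent distances) are placeholder assumptions chosen for brevity; they are not the mapping function or graph-learning objective proposed in the paper.

```python
# Illustrative sketch only: PCA projection + minimum spanning tree are
# stand-ins for the paper's learned mapping and learned graph structure.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def project_and_build_graph(X, latent_dim=2):
    """Return low-dimensional latent points and a sparse graph over them.

    X : (n_samples, n_features) array of high-dimensional data.
    """
    # Placeholder mapping: linear projection onto the top principal components.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:latent_dim].T            # latent points in the low-dim space

    # Placeholder graph: minimum spanning tree over pairwise latent distances,
    # so points that are close in the latent space tend to be connected.
    D = squareform(pdist(Z))
    G = minimum_spanning_tree(D)          # sparse (n, n) edge-weight matrix
    return Z, G


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 50))        # toy high-dimensional data
    Z, G = project_and_build_graph(X)
    print(Z.shape, G.nnz)                 # (100, 2) and n-1 tree edges
```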


• Published in

  KDD '15: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
  August 2015
  2378 pages
  ISBN: 9781450336642
  DOI: 10.1145/2783258

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 10 August 2015


        Qualifiers

        • research-article

Acceptance Rates

KDD '15 paper acceptance rate: 160 of 819 submissions, 20%. Overall acceptance rate: 1,133 of 8,635 submissions, 13%.
