skip to main content
research-article

PathSelClus: Integrating Meta-Path Selection with User-Guided Object Clustering in Heterogeneous Information Networks

Authors Info & Claims
Published:01 September 2013Publication History
Skip Abstract Section

Abstract

Real-world, multiple-typed objects are often interconnected, forming heterogeneous information networks. A major challenge for link-based clustering in such networks is their potential to generate many different results, carrying rather diverse semantic meanings. In order to generate desired clustering, we propose to use meta-path, a path that connects object types via a sequence of relations, to control clustering with distinct semantics. Nevertheless, it is easier for a user to provide a few examples (seeds) than a weighted combination of sophisticated meta-paths to specify her clustering preference. Thus, we propose to integrate meta-path selection with user-guided clustering to cluster objects in networks, where a user first provides a small set of object seeds for each cluster as guidance. Then the system learns the weight for each meta-path that is consistent with the clustering result implied by the guidance, and generates clusters under the learned weights of meta-paths. A probabilistic approach is proposed to solve the problem, and an effective and efficient iterative algorithm, PathSelClus, is proposed to learn the model, where the clustering quality and the meta-path weights mutually enhance each other. Our experiments with several clustering tasks in two real networks and one synthetic network demonstrate the power of the algorithm in comparison with the baselines.

References

  1. Airoldi, E. M., Blei, D. M., Fienberg, S. E., and Xing, E. P. 2008. Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9, 1981--2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Banerjee, A., Merugu, S., Dhillon, I. S., and Ghosh, J. 2005. Clustering with Bregman divergences. J. Mach. Learn. Res. 6, 1705--1749. Google ScholarGoogle ScholarCross RefCross Ref
  3. Bar-Hillel, A., Hertz, T., Shental, N., and Weinshall, D. 2005. Learning a Mahalanobis metric from equivalence constraints. J. Mach. Learn. Res. 6, 937--965. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Basu, S., Banerjee, A., and Mooney, R. 2002. Semi-supervised clustering by seeding. In Proceedings of the 19th International Conference on Machine Learning (ICML). Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Basu, S., Bilenko, M., and Mooney, R. J. 2004. A probabilistic framework for semi-supervised clustering. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 59--68. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Bilenko, M., Basu, S., and Mooney, R. J. 2004. Integrating constraints and metric learning in semi-supervised clustering. In Proceedings of the 21st International Conference on Machine Learning (ICML). Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Cohn, D. and Chang, H. 2000. Learning to probabilistically identify authoritative documents. In Proceedings of the 17th International Conference on Machine Learning (ICML). 167--174. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Dhillon, I. S., Mallela, S., and Kumar, R. 2003. A divisive information-theoretic feature clustering algorithm for text classification. J. Mach. Learn. Res. 3, 1265--1287. Google ScholarGoogle ScholarCross RefCross Ref
  9. Guyon, I. and Elisseeff, A. 2003. An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157--1182. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Hofmann, T. 1999. Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 50--57. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Kulis, B., Basu, S., Dhillon, I., and Mooney, R. 2005. Semi-supervised graph clustering: A kernel approach. In Proceedings of the 22nd International Conference on Machine Learning (ICML). 457--464. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Lee, D. D. and Seung, H. S. 2000. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems 13 (NIPS), 556--562.Google ScholarGoogle Scholar
  13. Long, B., Zhang, Z. M., Wu, X., and Yu, P. S. 2006. Spectral clustering for multi-type relational data. In Proceedings of the 23rd International Conference on Machine Learning (ICML). 585--592. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Long, B., Zhang, Z. M., and Yu, P. S. 2007. A probabilistic framework for relational clustering. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 470--479. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Luxburg, U. 2007. A tutorial on spectral clustering. Statist. Comput. 17, 395--416. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Newman, M. E. J. 2004. Fast algorithm for detecting community structure in networks. Phys. Rev. E 69, 6.Google ScholarGoogle Scholar
  17. Newman, M. E. J. and Girvan, M. 2004. Finding and evaluating community structure in networks. Phys. Rev. E 69, 2.Google ScholarGoogle ScholarCross RefCross Ref
  18. Punera, K. and Ghosh, J. 2008. Consensus-based ensembles of soft clusterings. Appl. Artif. Intell. 22, 780--810. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Shi, J. and Malik, J. 1997. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22, 888--905. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Strehl, A., Ghosh, J., and Cardie, C. 2002. Cluster ensembles---A knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583--617. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Sun, Y., Han, J., Zhao, P., Yi, N. Z., Cheng, H., and Wu, T. 2009a. Rankclus: Integrating clustering with ranking for heterogeneous information network analysis. In Proceedings of the 12th International Conference on Extending Database Technology (EDBT). 565--576. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Sun, Y., Yu, Y., and Han, J. 2009b. Ranking-based clustering of heterogeneous information networks with star network schema. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 797--806. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Sun, Y., Han, J., Yan, X., Yu, P. S., and Wu, T. 2011. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. In Proceedings of International Conference on Very Large Data Bases (VLDB). 992--1003.Google ScholarGoogle Scholar
  24. Sun, Y., Norick, B., Han, J., Yan, X., Yu, P. S., and Yu, X. 2012. Integrating meta-path selection with user guided object clustering in heterogeneous information networks. In Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 1348--1356. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Wang, F., Li, T., Wang, X., Zhu, S., and Ding, C. 2011. Community discovery using nonnegative matrix factorization. Data Min. Knowl. Discov. 22, 3, 493--521. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Wang, N., Parthasarathy, S., Tan, K.-L., and Tung, A. K. H. 2008. CSV: Visualizing and mining cohesive subgraphs. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 445--458. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Xu, X., Yuruk, N., Feng, Z., and Schweiger, T. A. J. 2007. Scan: A structural clustering algorithm for networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 824--833. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Xu, Z., King, I., Lyu, M. R.-T., and Jin, R. 2010. Discriminative semi-supervised feature selection via manifold regularization. Trans. Neur. Netw. 21, 1033--1047. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Yin, X., Han, J., and Yu, P. S. 2007. Crossclus: User-guided multirelational clustering. Data Min. Knowl. Discov. 15, 3, 321--348. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Zhao, Z. and Liu, H. 2007. Semi-supervised feature selection via spectral analysis. In Proceedings of the 7th SIAM International Conference on Data Mining (SDM).Google ScholarGoogle Scholar
  31. Zhu, X. and Ghahramani, Z. 2002. Learning from labeled and unlabeled data with label propagation. Tech. rep. CMU-CALD-02-107, Carnegie Mellon University.Google ScholarGoogle Scholar
  32. Zhu, X., Ghahramani, Z., and Lafferty, J. D. 2003. Semi-Supervised learning using Gaussian fields and harmonic functions. In Proceedings of the 20th International Conference (ICML). 912--919.Google ScholarGoogle Scholar

Index Terms

  1. PathSelClus: Integrating Meta-Path Selection with User-Guided Object Clustering in Heterogeneous Information Networks

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Knowledge Discovery from Data
      ACM Transactions on Knowledge Discovery from Data  Volume 7, Issue 3
      Special Issue on ACM SIGKDD 2012
      September 2013
      156 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/2513092
      Issue’s Table of Contents

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 September 2013
      • Accepted: 1 December 2012
      • Received: 1 September 2012
      Published in tkdd Volume 7, Issue 3

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader