skip to main content
10.1145/1516360.1516426acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article
Free Access

RankClus: integrating clustering with ranking for heterogeneous information network analysis

Published:24 March 2009Publication History

ABSTRACT

As information networks become ubiquitous, extracting knowledge from information networks has become an important task. Both ranking and clustering can provide overall views on information network data, and each has been a hot topic by itself. However, ranking objects globally without considering which clusters they belong to often leads to dumb results, e.g., ranking database and computer architecture conferences together may not make much sense. Similarly, clustering a huge number of objects (e.g., thousands of authors) in one huge cluster without distinction is dull as well.

In this paper, we address the problem of generating clusters for a specified type of objects, as well as ranking information for all types of objects based on these clusters in a multi-typed (i.e., heterogeneous) information network. A novel clustering framework called RankClus is proposed that directly generates clusters integrated with ranking. Based on initial K clusters, ranking is applied separately, which serves as a good measure for each cluster. Then, we use a mixture model to decompose each object into a K-dimensional vector, where each dimension is a component coefficient with respect to a cluster, which is measured by rank distribution. Objects then are reassigned to the nearest cluster under the new measure space to improve clustering. As a result, quality of clustering and ranking are mutually enhanced, which means that the clusters are getting more accurate and the ranking is getting more meaningful. Such a progressive refinement process iterates until little change can be made. Our experiment results show that RankClus can generate more accurate clusters and in a more efficient way than the state-of-the-art link-based clustering methods. Moreover, the clustering results with ranks can provide more informative views of data compared with traditional clustering.

References

  1. J. Bilmes. A gentle tutorial on the em algorithm and its application to parameter estimation for gaussian mixture and hidden markov models, 1997.Google ScholarGoogle Scholar
  2. S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst., 30(1--7):107--117, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. D. R. Cutting, D. R. Karger, J. O. Pedersen, and J. W. Tukey. Scatter/gather: a cluster-based approach to browsing large document collections. pages 318--329, 1992.Google ScholarGoogle Scholar
  4. DBLP. The dblp computer science bibliography. http://www.informatik.uni-trier.de/ley/db/.Google ScholarGoogle Scholar
  5. J. E. Gentle and W. HSrdle. Handbook of Computational Statistics: Concepts and Methods, chapter 7 Evaluation of Eigenvalues, pages 245--247. Springer, 1 edition, 2004.Google ScholarGoogle Scholar
  6. C. L. Giles. The future of citeseer. In 10th European Conference on PKDD (PKDD'06), page 2, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Z. Gyöngyi, H. Garcia-Molina, and J. Pedersen. Combating web spam with trustrank. In Proceedings of the Thirtieth international conference on Very large data bases (VLDB'04), pages 576--587. VLDB Endowment, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. E. Hirsch. An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences, 102:16569, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  9. G. Jeh and J. Widom. SimRank: a measure of structural-context similarity. In Proceedings of the eighth ACM SIGKDD conference (KDD'02), pages 538--543. ACM, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. W. Jiang, J. Vaidya, Z. Balaporia, C. Clifton, and B. Banich. Knowledge discovery from transportation network data. In Proceedings of the 21st ICDE Conference (ICDE'05), pages 1061--1072, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J. M. Kleinberg. Authoritative sources in a hyperlinked environment. J. ACM, 46(5):604--632, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. U. Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395--416, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Z. Nie, Y. Zhang, J.-R. Wen, and W.-Y. Ma. Object-level ranking: Bringing order to web objects. In Proceedings of the fourteenth International World Wide Web Conference (WWW'05), pages 567--574. ACM, May 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. S. Roy, T. Lane, and M. Werner-Washburne. Integrative construction and analysis of condition-specific biological networks. In Proceedings of AAAI'07, pages 1898--1899, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888--905, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Sidiropoulos, D. Katsaros, and Y. Manolopoulos. Generalized h-index for disclosing latent facts in citation networks. CoRR, abs/cs/0607066, 2006.Google ScholarGoogle Scholar
  17. X. Yin, J. Han, and P. S. Yu. Linkclus: Efficient clustering via heterogeneous semantic links. In Proceedings of the 32nd VLDB conference (VLDB'06), pages 427--438, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. O. Zamir and O. Etzioni. Grouper: A dynamic clustering interface to web search results. pages 1361--1374, 1999.Google ScholarGoogle Scholar

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Other conferences
    EDBT '09: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
    March 2009
    1180 pages
    ISBN:9781605584225
    DOI:10.1145/1516360

    Copyright © 2009 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 24 March 2009

    Permissions

    Request permissions about this article.

    Request Permissions

    Check for updates

    Qualifiers

    • research-article

    Acceptance Rates

    Overall Acceptance Rate7of10submissions,70%

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader