skip to main content
10.1145/1281192.1281248acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
Article

Joint cluster analysis of attribute and relationship data withouta-priori specification of the number of clusters

Published:12 August 2007Publication History

ABSTRACT

In many applications, attribute and relationship data areavailable, carrying complementary information about real world entities. In such cases, a joint analysis of both types of data can yield more accurate results than classical clustering algorithms that either use only attribute data or only relationship (graph) data. The Connected k-Center (CkC) has been proposed as the first joint cluster analysis model to discover k clusters which are cohesive on both attribute and relationship data. However, it is well-known that prior knowledge on the number of clusters is often unavailable in applications such as community dentification and hotspot analysis. In this paper, we introduce and formalize the problem of discovering an a-priori unspecified number of clusters in the context of joint cluster analysis of attribute and relationship data, called Connected X Clusters (CXC) problem. True clusters are assumed to be compact and distinctive from their neighboring clusters in terms of attribute data and internally connected in terms of relationship data. Different from classical attribute-based clustering methods, the neighborhood of clusters is not defined in terms of attribute data but in terms of relationship data. To efficiently solve the CXC problem, we present JointClust, an algorithm which adopts a dynamic two-phase approach. In the first phase, we find so called cluster atoms. We provide a probability analysis for thisphase, which gives us a probabilistic guarantee, that each true cluster is represented by at least one of the initial cluster atoms. In the second phase, these cluster atoms are merged in a bottom-up manner resulting in a dendrogram. The final clustering is determined by our objective function. Our experimental evaluation on several real datasets demonstrates that JointClust indeed discovers meaningful and accurate clusterings without requiring the user to specify the number of clusters.

References

  1. M. Ankerst, M. Breunig, H.-P. Kriegel, and J. Sander. Optics: ordering points to identify the clustering structure. In SIGMOD, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. C. M. Bishop. Neural networks for pattern recognition. Oxford: Clarendon Press, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. P. S. Bradley and U. M. Fayyad. Refining initial points for k-means clustering. In ICML, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. U. Brandes, M. Gaertler, and D. Wagner. Experiments on graph clustering algorithms. In Algorithms -- ESA 2003, 11th Annual European Symposium, Budapest,Hungary, 2003.Google ScholarGoogle Scholar
  5. C. Carson, M. Thomas, S. Belongie, J. M. Hellerstein, and J. Malik. Blobworld: A system for region--based image indexing and retrieval. In VIS. Springer, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. C. Ding, X. He, H. Zha, M. Gu, and H. Simon. A min-max cut algorithm for graph partitioning and data clustering. ICDM, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. R. Duda and P. Hart. Pattern Classification and Scene Analysis. John Wiley and Sons, 1973.Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. Ester, R. Ge, B. J. Gao, Z. Hu, and B. Ben-Moshe. Joint cluster analysis of attribute data and relationship data: the connected k-center problem. In SDM, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  9. M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, 1996.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. W. Feller. An Introduction to Probability Theory and Its Applications, volume 1. New York: Wiley, 1968.Google ScholarGoogle Scholar
  11. P. Fjallstrom. Algorithms for graph partitioning: A survey. Linkoping Electronic Articles in Computer and Information Science, 1998.Google ScholarGoogle Scholar
  12. M. R. Garey, D. S. Johnson, and L. Stockmeyer. Some simplified np--complete problems. In Sixth annual ACM symposium on Theory of computing, 1974. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. S. Guha, R. Rastogi, and K. Shim. Cure: an efficient clustering algorithm for large databases. In SIGMOD, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. S. Guha, R. Rastogi, and K. Shim. Rock: A robust clustering algorithm for categorical attributes. In Information Systems, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. L. Hagen and A. Kahng. New spectral methods for ratio cut partitioning and clustering. IEEE Trans. on Computed Aided Design, 1992.Google ScholarGoogle Scholar
  16. R. A. Hanneman and M. Riddle. Introduction to social network methods. http://faculty.ucr.edu/$\sim$hanneman/, 2005.Google ScholarGoogle Scholar
  17. A. Jain and R. Dubes. Algorithms for clustering data. Prentice Hall, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. G. Karypis, E.-H. Han, and V. Kumar. Chameleon: Hierarchical clustering using dynamic modeling. IEEE Computer, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. L. Kaufman and P. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. New York: Wiley, 1990.Google ScholarGoogle Scholar
  20. T. M. Mitchell. Machine Learning. New York; London: McGraw-Hill, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. M. Newman. The structure of scientific collaboration networks. In Proc. Natl. Acad. Sci. 98, 2001.Google ScholarGoogle Scholar
  22. M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  23. A. Okabe, K.-I. Okunuki, and S. Shino. The sanet toolbox: New methods for network spatial analysis. In Transactions in GIS. Blackwell Publishing, July 2006.Google ScholarGoogle Scholar
  24. D. Pelleg and A. Moore. X-means: Extending k-means with efficient estimation of the number of clusters. In ICML, San Francisco, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. G. Salton and M. J. McGill. Introduction to modern information retrieval. McGraw-Hill, 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. J. Scott. Social Network Analysis: A handbook. Sage Publications, London, 2000.Google ScholarGoogle Scholar
  27. J. Shi and J. Malik. Normalized cuts and image segmentation. In IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. S. Wasserman and K. Faust. Social Network Analysis. Cambridge University Press, 1994.Google ScholarGoogle ScholarCross RefCross Ref
  29. Y. Wei and C. Cheng. Towards efficient hierarchical designs by ratio cut partitioning. In IEEE Conf. on Computer-Aided Design, 1989.Google ScholarGoogle Scholar
  30. S. White and P. Smyth. A spectral clustering approach to finding comm. in graphs. In SDM, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  31. T. Zhang, R. Ramakrishnan, and M. Livny. Birch: An efficient data clustering method for very large databases. In SIGMOD, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Joint cluster analysis of attribute and relationship data withouta-priori specification of the number of clusters

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2007
      1080 pages
      ISBN:9781595936097
      DOI:10.1145/1281192

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 12 August 2007

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      KDD '07 Paper Acceptance Rate111of573submissions,19%Overall Acceptance Rate1,133of8,635submissions,13%

      Upcoming Conference

      KDD '24

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader