skip to main content
research-article

Engineering graph clustering: Models and experimental evaluation

Published:12 June 2008Publication History
Skip Abstract Section

Abstract

A promising approach to graph clustering is based on the intuitive notion of intracluster density versus intercluster sparsity. As for the weighted case, clusters should accumulate lots of weight, in contrast to their connection to the remaining graph, which should be light. While both formalizations and algorithms focusing on particular aspects of this rather vague concept have been proposed, no conclusive argument on their appropriateness has been given. In order to deepen the understanding of particular concepts, including both quality assessment as well as designing new algorithms, we conducted an experimental evaluation of graph-clustering approaches. By combining proved techniques from graph partitioning and geometric clustering, we also introduce a new approach that compares favorably.

References

  1. AUSIELLO, G., CRESCENZI, P., GAMBOSI, G., KANN, V., AND MARCHETTI-SPACCAMELA, A. 2002. Complexity and Approximation--Combinatorial Optimization Problems and Their Approximability Properties , 2nd ed. Springer-Verlagn New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. BRANDES, U., GAERTLER, M., AND WAGNER, D. 2003. Experiments on graph clustering algorithms. In Proceedings of the 11th Annual European Symposium on Algorithms (ESA'03). Lecture Notes in Computer Science, vol. 2832. 568-579.Google ScholarGoogle ScholarCross RefCross Ref
  3. CHUNG, F. R. K. AND YAU, S.-T. 1994. A near optimal algorithm for edge separators. In Proceeding of the 26th Annual ACM Symposium on Theory of Computing. ACM Press, New York. 1-8. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. CHUNG, F. R. K. AND YAU, S.-T. 1997. Eigenvalues, flows and separators of graphs. In Proceeding of the 29th Annual ACM Symposium on Theory of Computing. ACM Press, New York. 1-8. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. CLAUSET, A., NEWMAN, M. E. J., AND MOORE, C. 2004. Finding community structure in very large networks. Physical Review E 70, 066111.Google ScholarGoogle ScholarCross RefCross Ref
  6. GAERTLER, M. 2005. Clustering. In Network Analysis: Methodological Foundations, U. Brandes and T. Erlebach, Eds. Lecture Notes in Computer Science, vol. 3418. Springer-Verlag, New York. 178-215.Google ScholarGoogle Scholar
  7. HAREL, D. AND KOREN, Y. 2001. On clustering using random walks. In Proceedings of the 21st Conference on Foundations of Software Technology and Theoretical Computer S. Lecture Notes in Computer Science, vol. 2245. Springer-Verlag, New York. 18-41. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. HARTUV, E. AND SHAMIR, R. 2000. A clustering algorithm based on graph connectivity. Information Processing Letters 76, 4-6, 175-181. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. HO, T. B., KAWASAKI, S., AND NGUYEN, N. B. 2003. Documents Clustering Using Tolerance Rough Set Model and Its Application to Information Retrieval. Physica-Verlag GmbH, Heidelberg. 181-196. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. JAIN, A. K. AND DUBES, R. C. 1988. Algorithms for Clustering Data. Prentice Hall, Englewood, Cliffs, NJ. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. JAIN, A. K., MURTY, M. N., AND FLYNN, P. J. 1999. Data clustering: A review. ACM Computing Surveys 31, 3, 264-323. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. NEWMAN, M. E. J. AND GIRVAN, M. 2004. Finding and evaluating community structure in networks. Physical Review E 69, 026113.Google ScholarGoogle ScholarCross RefCross Ref
  13. SHAMIR, R., SHARAN, R., AND TSUR, D. 2002. Cluster graph modification problems. In Proceedings of the 28th International Workshop on Graph-Theoretical Conecpts in Computer Science (WG). Lecture Notes in Computer Science, vol. 2573. Springer-Verlag, New York. 379-390. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. SPIELMAN, D. A. AND TENG, S.-H. 1996. Spectral partitioning works: Planar graphs and finite element meshes. In Proceedings of the 37th Annual IEEE Symposium on Foundations of Computer Science (FOCS'96). 96-106. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. VAN DONGEN, S. M. 2000. Graph Clustering by Flow Simulation. Ph.D. thesis, University of Utrecht.Google ScholarGoogle Scholar
  16. VEMPALA, S., KANNAN, R., AND VETTA, A. 2000. On clusterings--good, bad and spectral. In Proceedings of the 41st Annual IEEE Symposium on Foundations of Computer Science (FOCS'00). 367-378. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. WAGNER, D. AND WAGNER, F. 1993. Between min cut and graph bisection. In MFCS'93: Proceedings of the 18th International Symposium on Mathematical Foundations of Computer Science, A. M. Borzyszkowski and S. Sokolowski, Eds. Lecture Notes in Computer Science, vol. 711. Springer-Verlag, New York. 744-750. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. ZAHN, C. T. 1971. Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Transactions on Computers C-20, 68-86. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Engineering graph clustering: Models and experimental evaluation

        Recommendations

        Reviews

        Goran Trajkovski

        Clustering is an issue that affects nearly all computing disciplines, from computer graphics to bioinformatics; it also affects a wide range of fields outside of computing. Trying to identify regions in which data are denser than other regions is not a trivial task when the dataset is huge. So what do we do then__?__ We try to formalize intuitive ways to see the denser regions in the foggy clouds of data. Effective, well-performing algorithms are a necessity for data mining. In this paper, the authors give us models of graph clustering, based on the notion of identifying intracluster density as well as intercluster sparsity. They contrast three algorithms: Markov clustering, interactive conductance cutting, and the geometric minimum spanning tree clustering. The authors define several parameters for quality control, in an effort to assess density and sparsity. For graphs of different topologies, one algorithm might perform better than the other two. As always, there is a tradeoff between complexity and speed. Experimental studies reveal that this approach is a promising one. So what is the effect of this paper__?__ It makes one sit down and think. Given time, one might implement an approach, and apply it to datasets that have been collected and hidden somewhere, in order to see how the algorithms discussed in this paper will perform. This is a well-written, organized, and easy-to-follow paper. It presents new concepts with sound fundamental investigations and experimental support. It represents a vast, solid amount of work. Online Computing Reviews Service

        Access critical reviews of Computing literature here

        Become a reviewer for Computing Reviews.

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM Journal of Experimental Algorithmics
          ACM Journal of Experimental Algorithmics  Volume 12, Issue
          2008
          507 pages
          ISSN:1084-6654
          EISSN:1084-6654
          DOI:10.1145/1227161
          Issue’s Table of Contents

          Copyright © 2008 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 12 June 2008
          • Accepted: 1 December 2006
          • Received: 1 February 2006
          Published in jea Volume 12, Issue

          Qualifiers

          • research-article
          • Research
          • Refereed

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader