skip to main content
10.1145/1014052.1014068acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
Article

Fast discovery of connection subgraphs

Published:22 August 2004Publication History

ABSTRACT

We define a connection subgraph as a small subgraph of a large graph that best captures the relationship between two nodes. The primary motivation for this work is to provide a paradigm for exploration and knowledge discovery in large social networks graphs. We present a formal definition of this problem, and an ideal solution based on electricity analogues. We then show how to accelerate the computations, to produce approximate, but high-quality connection subgraphs in real time on very large (disk resident) graphs.We describe our operational prototype, and we demonstrate results on a social network graph derived from the World Wide Web. Our graph contains 15 million nodes and 96 million edges, and our system still produces quality responses within seconds.

References

  1. R. Albert, H. Jeong, and A.-L. Barabasi. Diameter of the world wide web. Nature, (401):130--131, 1999.Google ScholarGoogle Scholar
  2. U. Brandes, M. Gaertler, and D. Wagner. Experiments on graph clustering algorithms. In Proc. 11th Europ. Symp. Algorithms (ESA '03), LNCS 2832, pages 568--579. Springer-Verlag, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  3. A. K. Chandra, P. Raghavan, W. L. Ruzzo, R. Smolensky, and P. Tiwari. The electrical resistance of a graph captures its commute and cover times. STOC, pages 574--586, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. I. S. Dhillon, S. Mallela, and D. S. Modha. Information-theoretic co-clustering. In Proc. ACM KDD, Washington, DC, August 24-27 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. P. Domingos and M. Richardson. Mining the network value of customers. Proc. ACM KDD, pages 57--66, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. S. Dorogovtsev and J. Mendes. Evolution of networks. Advances in Physics, 51:1079--1187, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  7. P. Doyle and J. Snell. Random walks and electric networks, volume 22. Mathematical Association America, New York, 1984.Google ScholarGoogle Scholar
  8. M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. SIGCOMM, pages 251--262, Aug-Sept. 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. G. Flake, S. Lawrence, C. L. Giles, and F. Coetzee. Self-organization and identification of web communities. IEEE Computer, 35(3), Mar. 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. E. Gansner and S. C. North. An open graph visualization system and its applications to software engineering. Software - Practice and Experience, 30:1203--1233, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. Gibson, J. Kleinberg, and P. Raghavan. Inferring web communities from link topology. In Ninth ACM Conference on Hypertext and Hypermedia, pages 225--234, New York, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Girvan and M. E. J. Newman. Community structure is social and biological networks.Google ScholarGoogle Scholar
  13. M. Grotschel, C. L. Monma, and M. Stoer. Design of survivable networks. In Handbooks in Operations Research and Management Science 7: Network Models. North Holland, 1993.Google ScholarGoogle Scholar
  14. D. Gruhl, L. Chavet, D. Gibson, J. Meyer, P. Pattanayak, A. Tomkins, and J. Zien. How to build a WebFountain: An architecture for very large-scale text analytics. IBM Systems Journal, 43(1):64--77, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. T. H. Haveliwala. Topic-sensitive pagerank. Proc. WWW, pages 517--526, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. G. Jeh and J. Widom. Scaling personalized web search. WWW, pages 271--279, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. G. Karypis and V. Kumar. Parallel multilevel k-way partitioning for irregular graphs. SIAM Review, 41(2):278--300, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In Proc. ACM KDD, Washington, DC, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In Proc. CIKM, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45:167--256, 2003.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.Google ScholarGoogle Scholar
  22. C. R. Palmer and C. Faloutsos. Electricity based external similarity of categorical attributes. Seoul, South Korea, April-May 2003.Google ScholarGoogle Scholar
  23. S. van Dongen. Graph Clustering by Flow Simulation. Ph.D. thesis, University of Utrecht, May 2000.Google ScholarGoogle Scholar
  24. S. Virtanen. Clustering the chilean web. LA-WEB 2003, Nov. 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Fast discovery of connection subgraphs

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2004
      874 pages
      ISBN:1581138881
      DOI:10.1145/1014052

      Copyright © 2004 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 22 August 2004

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate1,133of8,635submissions,13%

      Upcoming Conference

      KDD '24

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader