ABSTRACT
We define a connection subgraph as a small subgraph of a large graph that best captures the relationship between two nodes. The primary motivation for this work is to provide a paradigm for exploration and knowledge discovery in large social networks graphs. We present a formal definition of this problem, and an ideal solution based on electricity analogues. We then show how to accelerate the computations, to produce approximate, but high-quality connection subgraphs in real time on very large (disk resident) graphs.We describe our operational prototype, and we demonstrate results on a social network graph derived from the World Wide Web. Our graph contains 15 million nodes and 96 million edges, and our system still produces quality responses within seconds.
- R. Albert, H. Jeong, and A.-L. Barabasi. Diameter of the world wide web. Nature, (401):130--131, 1999.Google Scholar
- U. Brandes, M. Gaertler, and D. Wagner. Experiments on graph clustering algorithms. In Proc. 11th Europ. Symp. Algorithms (ESA '03), LNCS 2832, pages 568--579. Springer-Verlag, 2003.Google ScholarCross Ref
- A. K. Chandra, P. Raghavan, W. L. Ruzzo, R. Smolensky, and P. Tiwari. The electrical resistance of a graph captures its commute and cover times. STOC, pages 574--586, 1989. Google ScholarDigital Library
- I. S. Dhillon, S. Mallela, and D. S. Modha. Information-theoretic co-clustering. In Proc. ACM KDD, Washington, DC, August 24-27 2003. Google ScholarDigital Library
- P. Domingos and M. Richardson. Mining the network value of customers. Proc. ACM KDD, pages 57--66, 2001. Google ScholarDigital Library
- S. Dorogovtsev and J. Mendes. Evolution of networks. Advances in Physics, 51:1079--1187, 2002.Google ScholarCross Ref
- P. Doyle and J. Snell. Random walks and electric networks, volume 22. Mathematical Association America, New York, 1984.Google Scholar
- M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. SIGCOMM, pages 251--262, Aug-Sept. 1999. Google ScholarDigital Library
- G. Flake, S. Lawrence, C. L. Giles, and F. Coetzee. Self-organization and identification of web communities. IEEE Computer, 35(3), Mar. 2002. Google ScholarDigital Library
- E. Gansner and S. C. North. An open graph visualization system and its applications to software engineering. Software - Practice and Experience, 30:1203--1233, 1999. Google ScholarDigital Library
- D. Gibson, J. Kleinberg, and P. Raghavan. Inferring web communities from link topology. In Ninth ACM Conference on Hypertext and Hypermedia, pages 225--234, New York, 1998. Google ScholarDigital Library
- M. Girvan and M. E. J. Newman. Community structure is social and biological networks.Google Scholar
- M. Grotschel, C. L. Monma, and M. Stoer. Design of survivable networks. In Handbooks in Operations Research and Management Science 7: Network Models. North Holland, 1993.Google Scholar
- D. Gruhl, L. Chavet, D. Gibson, J. Meyer, P. Pattanayak, A. Tomkins, and J. Zien. How to build a WebFountain: An architecture for very large-scale text analytics. IBM Systems Journal, 43(1):64--77, 2004. Google ScholarDigital Library
- T. H. Haveliwala. Topic-sensitive pagerank. Proc. WWW, pages 517--526, 2002. Google ScholarDigital Library
- G. Jeh and J. Widom. Scaling personalized web search. WWW, pages 271--279, 2003. Google ScholarDigital Library
- G. Karypis and V. Kumar. Parallel multilevel k-way partitioning for irregular graphs. SIAM Review, 41(2):278--300, 1999. Google ScholarDigital Library
- D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In Proc. ACM KDD, Washington, DC, 2003. Google ScholarDigital Library
- D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In Proc. CIKM, 2003. Google ScholarDigital Library
- M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45:167--256, 2003.Google ScholarDigital Library
- L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.Google Scholar
- C. R. Palmer and C. Faloutsos. Electricity based external similarity of categorical attributes. Seoul, South Korea, April-May 2003.Google Scholar
- S. van Dongen. Graph Clustering by Flow Simulation. Ph.D. thesis, University of Utrecht, May 2000.Google Scholar
- S. Virtanen. Clustering the chilean web. LA-WEB 2003, Nov. 2003. Google ScholarDigital Library
Index Terms
- Fast discovery of connection subgraphs
Recommendations
Rainbow connection and forbidden subgraphs
A connected edge-colored graph G is rainbow-connected if any two distinct vertices of G are connected by a path whose edges have pairwise distinct colors; the rainbow connection number rc ( G ) of G is the minimum number of colors such that G is rainbow-...
Forbidden Subgraphs and Weak Locally Connected Graphs
A graph is called H-free if it has no induced subgraph isomorphic to H. A graph is called $$N^i$$Ni-locally connected if $$G[\{ x\in V(G): 1\le d_G(w, x)\le i\}]$$G[{x?V(G):1≤dG(w,x)≤i}] is connected and $$N_2$$N2-locally connected if $$G[\{uv: \{uw, vw\...
Bipartite subgraphs of triangle-free subcubic graphs
Suppose G is a graph with n vertices and m edges. Let n^' be the maximum number of vertices in an induced bipartite subgraph of G and let m^' be the maximum number of edges in a spanning bipartite subgraph of G. Then b(G)=m^'/m is called the bipartite ...
Comments