Abstract
Bipartite networks are of great importance in many real-world applications. In bipartite networks, butterfly (i.e., a complete 2 x 2 biclique) is the smallest non-trivial cohesive structure and plays a key role. In this paper, we study the problem of efficient counting the number of butterflies in bipartite networks. The most advanced techniques are based on enumerating wedges which is the dominant cost of counting butterflies. Nevertheless, the existing algorithms cannot efficiently handle large-scale bipartite networks. This becomes a bottleneck in large-scale applications. In this paper, instead of the existing layer-priority-based techniques, we propose a vertex-priority-based paradigm BFC-VP to enumerate much fewer wedges; this leads to a significant improvement of the time complexity of the state-of-the-art algorithms. In addition, we present cache-aware strategies to further improve the time efficiency while theoretically retaining the time complexity of BFC-VP. Moreover, we also show that our proposed techniques can work efficiently in external and parallel contexts. Our extensive empirical studies demonstrate that the proposed techniques can speed up the state-of-the-art techniques by up to two orders of magnitude for the real datasets.
- Amazon. http://liu.cs.uic.edu/download/data/.Google Scholar
- Bi-sk. http://law.di.unimi.it/webdata/sk/.Google Scholar
- Bi-twitter. http://an.kaist.ac.kr/traces/WWW2010.html.Google Scholar
- Bi-uk. http://law.di.unimi.it/webdata/uk-2006-05/.Google Scholar
- Dbpedia. http://wiki.dbpedia.org/Downloads.Google Scholar
- Delicious. http://dai-labor.de/IRML/datasets.Google Scholar
- Live-journal. http://socialnetworks.mpi-sws.org.Google Scholar
- Orkut. http://socialnetworks.mpi-sws.org.Google Scholar
- Tracker. https://ssc.io/trackingthetrackers/.Google Scholar
- Twitter. http://public.asu.edu/~mdechoud/datasets.html.Google Scholar
- Wiki-en. http://dumps.wikimedia.org/.Google Scholar
- Wiki-fr. http://dumps.wikimedia.org/.Google Scholar
- A. Aggarwal, J. Vitter, et al. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116--1127, 1988. Google ScholarDigital Library
- A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. Dbmss on a modern processor: Where does time go? In PVLDB, number DIAS-CONF-1999-001, pages 266--277, 1999. Google ScholarDigital Library
- S. G. Aksoy, T. G. Kolda, and A. Pinar. Measuring and modeling bipartite graphs with community structure. Journal of Complex Networks, 5(4):581--603, 2017.Google ScholarCross Ref
- M. Al Hasan and V. S. Dave. Triangle counting in large networks: a review. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(2):e1226, 2018.Google ScholarCross Ref
- N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209--223, 1997.Google ScholarCross Ref
- L. Auroux, M. Burelle, and R. Erra. Reordering very large graphs for fun & profit. In International Symposium on Web AlGorithms, 2015.Google Scholar
- L. Becchetti, P. Boldi, C. Castillo, and A. Gionis. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In KDD, pages 16--24. ACM, 2008. Google ScholarDigital Library
- D. K. Blandford, G. E. Blelloch, and I. A. Kash. Compact representations of separable graphs. In ACM-SIAM symposium on Discrete algorithms, pages 679--688. Society for Industrial and Applied Mathematics, 2003. Google ScholarDigital Library
- P. Boldi, M. Rosa, M. Santini, and S. Vigna. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In WWW, pages 587--596. ACM, 2011. Google ScholarDigital Library
- P. Boldi, M. Santini, and S. Vigna. Permuting web graphs. In International Workshop on Algorithms and Models for the Web-Graph, pages 116--126. Springer, 2009. Google ScholarDigital Library
- S. P. Borgatti and M. G. Everett. Network analysis of 2-mode data. Social networks, 19(3):243--269, 1997.Google ScholarCross Ref
- L. Chang, C. Zhang, X. Lin, and L. Qin. Scalable top-k structural diversity search. In ICDE, pages 95--98. IEEE, 2017.Google ScholarCross Ref
- F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan. On compressing social networks. In KDD, pages 219--228. ACM, 2009. Google ScholarDigital Library
- S. Chu and J. Cheng. Triangle listing in massive networks. TKDD, 6(4):17, 2012. Google ScholarDigital Library
- I. S. Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. In KDD, pages 269--274. ACM, 2001. Google ScholarDigital Library
- L. Dhulipala, I. Kabiljo, B. Karrer, G. Ottaviano, S. Pupyrev, and A. Shalita. Compressing graphs and indexes with recursive graph bisection. In KDD, pages 1535--1544. ACM, 2016. Google ScholarDigital Library
- D. C. Fain and J. O. Pedersen. Sponsored search: A brief history. Bulletin of the american Society for Information Science and technology, 32(2):12--13, 2006.Google Scholar
- Y. Fang, X. Huang, L. Qin, Y. Zhang, W. Zhang, R. Cheng, and X. Lin. A survey of community search over big graphs. VLDB Journal, 2019.Google ScholarCross Ref
- M. R. Garey and D. S. Johnson. Computers and intractability, volume 29. wh freeman New York, 2002.Google Scholar
- R. L. Graham. Bounds for certain multiprocessing anomalies. Bell System Technical Journal, 45(9):1563--1581, 1966.Google ScholarCross Ref
- R. L. Graham. Bounds on multiprocessing timing anomalies. SIAM journal on Applied Mathematics, 17(2):416--429, 1969.Google Scholar
- S. Han, L. Zou, and J. X. Yu. Speeding up set intersections in graph algorithms using simd instructions. In SIGMOD, pages 1587--1602. ACM, 2018. Google ScholarDigital Library
- M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. External memory algorithms, 50:107--118, 1998. Google ScholarDigital Library
- X. Hu, Y. Tao, and C.-W. Chung. Massive graph triangulation. In SIGMOD, pages 325--336. ACM, 2013. Google ScholarDigital Library
- A. Itai and M. Rodeh. Finding a minimum circuit in a graph. SIAM Journal on Computing, 7(4):413--423, 1978.Google ScholarDigital Library
- S. Jain and C. Seshadhri. A fast and provable method for estimating clique counts using turán's theorem. In WWW, pages 441--449. International World Wide Web Conferences Steering Committee, 2017. Google ScholarDigital Library
- M. Jha, C. Seshadhri, and A. Pinar. Path sampling: A fast and provable method for estimating 4-vertex subgraph counts. In WWW, pages 495--505. International World Wide Web Conferences Steering Committee, 2015. Google ScholarDigital Library
- U. Kang and C. Faloutsos. Beyond' caveman communities': Hubs and spokes for graph compression and mining. In ICDM, pages 300--309. IEEE, 2011. Google ScholarDigital Library
- W. Khaouid, M. Barsky, V. Srinivasan, and A. Thomo. K-core decomposition of large networks on a single pc. PVLDB, 9(1):13--23, 2015. Google ScholarDigital Library
- M. N. Kolountzakis, G. L. Miller, R. Peng, and C. E. Tsourakakis. Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Mathematics, 8(1--2):161--185, 2012.Google Scholar
- M. Latapy. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical computer science, 407(1--3):458--473, 2008. Google ScholarDigital Library
- M. Latapy, C. Magnien, and N. Del Vecchio. Basic notions for the analysis of large two-mode networks. Social networks, 30(1):31--48, 2008.Google ScholarCross Ref
- X. Lin, M. M. Orlowska, and Y. Zhang. On data allocation with minimum overall communication costs in distributed database design. In ICCI, pages 539--544. IEEE, 1993. Google ScholarDigital Library
- P. G. Lind, M. C. Gonzalez, and H. J. Herrmann. Cycles and clustering in bipartite networks. Physical review E, 72(5):056127, 2005.Google Scholar
- B. Liu, L. Yuan, X. Lin, L. Qin, W. Zhang, and J. Zhou. Efficient (α, β)-core computation: An index-based approach. In WWW, pages 1130--1141. ACM, 2019. Google ScholarDigital Library
- R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science, 298(5594):824--827, 2002.Google ScholarCross Ref
- M. E. Newman. Scientific collaboration networks. i. network construction and fundamental results. Physical review E, 64(1):016131, 2001.Google Scholar
- M. E. Newman. Scientific collaboration networks. ii. shortest paths, weighted networks, and centrality. Physical review E, 64(1):016132, 2001.Google Scholar
- T. Opsahl. Triadic closure in two-mode networks: Redefining the global and local clustering coefficients. Social Networks, 35(2):159--167, 2013.Google ScholarCross Ref
- M. Ornstein. Interlocking directorates in canada: Intercorporate or class alliance? Administrative science quarterly, pages 210--231, 1984.Google Scholar
- M. D. Ornstein. Interlocking directorates in canada: Evidence from replacement patterns. Social Networks, 4(1):3--25, 1982.Google ScholarCross Ref
- D. Palmer. Broken ties: Interlocking directorates and intercorporate coordination. Administrative Science Quarterly, pages 40--55, 1983.Google Scholar
- J.-S. Park, M. Penner, and V. K. Prasanna. Optimizing graph algorithms for improved cache performance. IEEE Transactions on Parallel and Distributed Systems, 15(9):769--782, 2004. Google ScholarDigital Library
- Y. Peng, Y. Zhang, W. Zhang, X. Lin, and L. Qin. Efficient probabilistic k-core computation on uncertain graphs. In ICDE, pages 1192--1203. IEEE, 2018.Google ScholarCross Ref
- A. Pinar, C. Seshadhri, and V. Vishal. Escape: Efficiently counting all 5-vertex subgraphs. In WWW, pages 1431--1440. International World Wide Web Conferences Steering Committee, 2017. Google ScholarDigital Library
- G. Robins and M. Alexander. Small worlds among interlocking directors: Network structure and distance in bipartite graphs. Computational & Mathematical Organization Theory, 10(1):69--94, 2004. Google ScholarDigital Library
- S.-V. Sanei-Mehri, A. E. Sariyuce, and S. Tirthapura. Butterfly counting in bipartite networks. In KDD, pages 2150--2159. ACM, 2018. Google ScholarDigital Library
- A. E. Sariyüce and A. Pinar. Peeling bipartite networks for dense subgraph discovery. In WSDM, pages 504--512. ACM, 2018. Google ScholarDigital Library
- T. Schank and D. Wagner. Finding, counting and listing all triangles in large graphs, an experimental study. In International workshop on experimental and efficient algorithms, pages 606--609. Springer, 2005. Google ScholarDigital Library
- C. Seshadhri, A. Pinar, and T. G. Kolda. Triadic measures on graphs: The power of wedge sampling. In SDM, pages 10--18. SIAM, 2013.Google Scholar
- J. Shun and K. Tangwongsan. Multicore triangle computations without tuning. In ICDE, pages 149--160. IEEE, 2015.Google ScholarCross Ref
- L. D. Stefani, A. Epasto, M. Riondato, and E. Upfal. Triest: Counting local and global triangles in fully dynamic streams with fixed memory size. TKDD, 11(4):43, 2017. Google ScholarDigital Library
- S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. In WWW, pages 607--614. ACM, 2011. Google ScholarDigital Library
- M. Then, M. Kaufmann, F. Chirigati, T.-A. Hoang-Vu, K. Pham, A. Kemper, T. Neumann, and H. T. Vo. The more the merrier: Efficient multi-source graph traversal. PVLDB, 8(4):449--460, 2014. Google ScholarDigital Library
- C. Wang, J. Wang, X. Lin, W. Wang, H. Wang, H. Li, W. Tian, J. Xu, and R. Li. Mapdupreducer: detecting near duplicates over massive datasets. In SIGMOD, pages 1119--1122, 2010. Google ScholarDigital Library
- J. Wang, A. W.-C. Fu, and J. Cheng. Rectangle counting in large bipartite graphs. In BigData Congress, pages 17--24. IEEE, 2014. Google ScholarDigital Library
- K. Wang, X. Cao, X. Lin, W. Zhang, and L. Qin. Efficient computing of radius-bounded k-cores. In ICDE, pages 233--244. IEEE, 2018.Google ScholarCross Ref
- X. Wang, Y. Zhang, W. Zhang, X. Lin, and W. Wang. Ap-tree: Efficiently support continuous spatial-keyword queries over stream. In ICDE, pages 1107--1118. IEEE, 2015.Google ScholarCross Ref
- H. Wei, J. X. Yu, C. Lu, and X. Lin. Speedup graph processing by graph ordering. In SIGMOD, pages 1813--1828. ACM, 2016. Google ScholarDigital Library
- W. Yu, X. Lin, W. Zhang, L. Chang, and J. Pei. More is simpler: Effectively and efficiently assessing node-pair similarities based on hyperlinks. PVLDB, 7(1):13--24, 2013. Google ScholarDigital Library
- F. Zhang, L. Yuan, Y. Zhang, L. Qin, X. Lin, and A. Zhou. Discovering strong communities with user engagement and tie strength. In DASFAA, pages 425--441. Springer, 2018.Google ScholarDigital Library
- F. Zhang, Y. Zhang, L. Qin, W. Zhang, and X. Lin. When engagement meets similarity: efficient (k, r)-core computation on social networks. PVLDB, 10(10):998--1009, 2017. Google ScholarDigital Library
- F. Zhang, Y. Zhang, L. Qin, W. Zhang, and X. Lin. Efficiently reinforcing social networks over user engagement and tie strength. In ICDE, pages 557--568. IEEE, 2018.Google ScholarCross Ref
- Z. Zou. Bitruss decomposition of bipartite graphs. In DASFAA, pages 218--233. Springer, 2016. Google ScholarDigital Library
Recommendations
Butterfly Counting in Bipartite Networks
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data MiningWe consider the problem of counting motifs in bipartite affiliation networks, such as author-paper, user-product, and actor-movie relations. We focus on counting the number of occurrences of a "butterfly", a complete 2x2 biclique, the simplest cohesive ...
Accelerated butterfly counting with vertex priority on bipartite graphs
AbstractBipartite graphs are of great importance in many real-world applications. Butterfly, which is a complete biclique, plays a key role in bipartite graphs. In this paper, we investigate the problem of efficient counting the number of butterflies. ...
Reconfiguration graphs for vertex colourings of chordal and chordal bipartite graphs
A k -colouring of a graph G =( V , E ) is a mapping c : V {1,2, , k } such that c ( u ) c ( v ) whenever uv is an edge. The reconfiguration graph of the k -colourings of G contains as its vertex set the k -colourings of G , and two ...
Comments