research-article

Vertex priority based butterfly counting for large-scale bipartite networks

Authors:
Kai Wang

University of New South Wales, Australia and Zhejiang Lab, China

University of New South Wales, Australia and Zhejiang Lab, China
View Profile

,
Xuemin Lin

University of New South Wales, Australia and Zhejiang Lab, China

University of New South Wales, Australia and Zhejiang Lab, China
View Profile

,
Lu Qin

University of Technology Sydney, Australia

University of Technology Sydney, Australia
View Profile

,
Wenjie Zhang

University of New South Wales, Australia

University of New South Wales, Australia
View Profile

,
Ying Zhang

University of Technology Sydney, Australia

University of Technology Sydney, Australia
View Profile

Proceedings of the VLDB Endowment Volume 12 Issue 10pp 1139–1152https://doi.org/10.14778/3339490.3339497

Published:01 June 2019Publication History

Proceedings of the VLDB Endowment

Abstract

Bipartite networks are of great importance in many real-world applications. In bipartite networks, butterfly (i.e., a complete 2 x 2 biclique) is the smallest non-trivial cohesive structure and plays a key role. In this paper, we study the problem of efficient counting the number of butterflies in bipartite networks. The most advanced techniques are based on enumerating wedges which is the dominant cost of counting butterflies. Nevertheless, the existing algorithms cannot efficiently handle large-scale bipartite networks. This becomes a bottleneck in large-scale applications. In this paper, instead of the existing layer-priority-based techniques, we propose a vertex-priority-based paradigm BFC-VP to enumerate much fewer wedges; this leads to a significant improvement of the time complexity of the state-of-the-art algorithms. In addition, we present cache-aware strategies to further improve the time efficiency while theoretically retaining the time complexity of BFC-VP. Moreover, we also show that our proposed techniques can work efficiently in external and parallel contexts. Our extensive empirical studies demonstrate that the proposed techniques can speed up the state-of-the-art techniques by up to two orders of magnitude for the real datasets.

References

Amazon. http://liu.cs.uic.edu/download/data/.Google Scholar
Bi-sk. http://law.di.unimi.it/webdata/sk/.Google Scholar
Bi-twitter. http://an.kaist.ac.kr/traces/WWW2010.html.Google Scholar
Bi-uk. http://law.di.unimi.it/webdata/uk-2006-05/.Google Scholar
Dbpedia. http://wiki.dbpedia.org/Downloads.Google Scholar
Delicious. http://dai-labor.de/IRML/datasets.Google Scholar
Live-journal. http://socialnetworks.mpi-sws.org.Google Scholar
Orkut. http://socialnetworks.mpi-sws.org.Google Scholar
Tracker. https://ssc.io/trackingthetrackers/.Google Scholar
Twitter. http://public.asu.edu/~mdechoud/datasets.html.Google Scholar
Wiki-en. http://dumps.wikimedia.org/.Google Scholar
Wiki-fr. http://dumps.wikimedia.org/.Google Scholar
A. Aggarwal, J. Vitter, et al. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116--1127, 1988. Google ScholarDigital Library
A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. Dbmss on a modern processor: Where does time go? In PVLDB, number DIAS-CONF-1999-001, pages 266--277, 1999. Google ScholarDigital Library
S. G. Aksoy, T. G. Kolda, and A. Pinar. Measuring and modeling bipartite graphs with community structure. Journal of Complex Networks, 5(4):581--603, 2017.Google ScholarCross Ref
M. Al Hasan and V. S. Dave. Triangle counting in large networks: a review. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(2):e1226, 2018.Google ScholarCross Ref
N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209--223, 1997.Google ScholarCross Ref
L. Auroux, M. Burelle, and R. Erra. Reordering very large graphs for fun & profit. In International Symposium on Web AlGorithms, 2015.Google Scholar
L. Becchetti, P. Boldi, C. Castillo, and A. Gionis. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In KDD, pages 16--24. ACM, 2008. Google ScholarDigital Library
D. K. Blandford, G. E. Blelloch, and I. A. Kash. Compact representations of separable graphs. In ACM-SIAM symposium on Discrete algorithms, pages 679--688. Society for Industrial and Applied Mathematics, 2003. Google ScholarDigital Library
P. Boldi, M. Rosa, M. Santini, and S. Vigna. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In WWW, pages 587--596. ACM, 2011. Google ScholarDigital Library
P. Boldi, M. Santini, and S. Vigna. Permuting web graphs. In International Workshop on Algorithms and Models for the Web-Graph, pages 116--126. Springer, 2009. Google ScholarDigital Library
S. P. Borgatti and M. G. Everett. Network analysis of 2-mode data. Social networks, 19(3):243--269, 1997.Google ScholarCross Ref
L. Chang, C. Zhang, X. Lin, and L. Qin. Scalable top-k structural diversity search. In ICDE, pages 95--98. IEEE, 2017.Google ScholarCross Ref
F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan. On compressing social networks. In KDD, pages 219--228. ACM, 2009. Google ScholarDigital Library
S. Chu and J. Cheng. Triangle listing in massive networks. TKDD, 6(4):17, 2012. Google ScholarDigital Library
I. S. Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. In KDD, pages 269--274. ACM, 2001. Google ScholarDigital Library
L. Dhulipala, I. Kabiljo, B. Karrer, G. Ottaviano, S. Pupyrev, and A. Shalita. Compressing graphs and indexes with recursive graph bisection. In KDD, pages 1535--1544. ACM, 2016. Google ScholarDigital Library
D. C. Fain and J. O. Pedersen. Sponsored search: A brief history. Bulletin of the american Society for Information Science and technology, 32(2):12--13, 2006.Google Scholar
Y. Fang, X. Huang, L. Qin, Y. Zhang, W. Zhang, R. Cheng, and X. Lin. A survey of community search over big graphs. VLDB Journal, 2019.Google ScholarCross Ref
M. R. Garey and D. S. Johnson. Computers and intractability, volume 29. wh freeman New York, 2002.Google Scholar
R. L. Graham. Bounds for certain multiprocessing anomalies. Bell System Technical Journal, 45(9):1563--1581, 1966.Google ScholarCross Ref
R. L. Graham. Bounds on multiprocessing timing anomalies. SIAM journal on Applied Mathematics, 17(2):416--429, 1969.Google Scholar
S. Han, L. Zou, and J. X. Yu. Speeding up set intersections in graph algorithms using simd instructions. In SIGMOD, pages 1587--1602. ACM, 2018. Google ScholarDigital Library
M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. External memory algorithms, 50:107--118, 1998. Google ScholarDigital Library
X. Hu, Y. Tao, and C.-W. Chung. Massive graph triangulation. In SIGMOD, pages 325--336. ACM, 2013. Google ScholarDigital Library
A. Itai and M. Rodeh. Finding a minimum circuit in a graph. SIAM Journal on Computing, 7(4):413--423, 1978.Google ScholarDigital Library
S. Jain and C. Seshadhri. A fast and provable method for estimating clique counts using turán's theorem. In WWW, pages 441--449. International World Wide Web Conferences Steering Committee, 2017. Google ScholarDigital Library
M. Jha, C. Seshadhri, and A. Pinar. Path sampling: A fast and provable method for estimating 4-vertex subgraph counts. In WWW, pages 495--505. International World Wide Web Conferences Steering Committee, 2015. Google ScholarDigital Library
U. Kang and C. Faloutsos. Beyond' caveman communities': Hubs and spokes for graph compression and mining. In ICDM, pages 300--309. IEEE, 2011. Google ScholarDigital Library
W. Khaouid, M. Barsky, V. Srinivasan, and A. Thomo. K-core decomposition of large networks on a single pc. PVLDB, 9(1):13--23, 2015. Google ScholarDigital Library
M. N. Kolountzakis, G. L. Miller, R. Peng, and C. E. Tsourakakis. Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Mathematics, 8(1--2):161--185, 2012.Google Scholar
M. Latapy. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical computer science, 407(1--3):458--473, 2008. Google ScholarDigital Library
M. Latapy, C. Magnien, and N. Del Vecchio. Basic notions for the analysis of large two-mode networks. Social networks, 30(1):31--48, 2008.Google ScholarCross Ref
X. Lin, M. M. Orlowska, and Y. Zhang. On data allocation with minimum overall communication costs in distributed database design. In ICCI, pages 539--544. IEEE, 1993. Google ScholarDigital Library
P. G. Lind, M. C. Gonzalez, and H. J. Herrmann. Cycles and clustering in bipartite networks. Physical review E, 72(5):056127, 2005.Google Scholar
B. Liu, L. Yuan, X. Lin, L. Qin, W. Zhang, and J. Zhou. Efficient (α, β)-core computation: An index-based approach. In WWW, pages 1130--1141. ACM, 2019. Google ScholarDigital Library
R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science, 298(5594):824--827, 2002.Google ScholarCross Ref
M. E. Newman. Scientific collaboration networks. i. network construction and fundamental results. Physical review E, 64(1):016131, 2001.Google Scholar
M. E. Newman. Scientific collaboration networks. ii. shortest paths, weighted networks, and centrality. Physical review E, 64(1):016132, 2001.Google Scholar
T. Opsahl. Triadic closure in two-mode networks: Redefining the global and local clustering coefficients. Social Networks, 35(2):159--167, 2013.Google ScholarCross Ref
M. Ornstein. Interlocking directorates in canada: Intercorporate or class alliance? Administrative science quarterly, pages 210--231, 1984.Google Scholar
M. D. Ornstein. Interlocking directorates in canada: Evidence from replacement patterns. Social Networks, 4(1):3--25, 1982.Google ScholarCross Ref
D. Palmer. Broken ties: Interlocking directorates and intercorporate coordination. Administrative Science Quarterly, pages 40--55, 1983.Google Scholar
J.-S. Park, M. Penner, and V. K. Prasanna. Optimizing graph algorithms for improved cache performance. IEEE Transactions on Parallel and Distributed Systems, 15(9):769--782, 2004. Google ScholarDigital Library
Y. Peng, Y. Zhang, W. Zhang, X. Lin, and L. Qin. Efficient probabilistic k-core computation on uncertain graphs. In ICDE, pages 1192--1203. IEEE, 2018.Google ScholarCross Ref
A. Pinar, C. Seshadhri, and V. Vishal. Escape: Efficiently counting all 5-vertex subgraphs. In WWW, pages 1431--1440. International World Wide Web Conferences Steering Committee, 2017. Google ScholarDigital Library
G. Robins and M. Alexander. Small worlds among interlocking directors: Network structure and distance in bipartite graphs. Computational & Mathematical Organization Theory, 10(1):69--94, 2004. Google ScholarDigital Library
S.-V. Sanei-Mehri, A. E. Sariyuce, and S. Tirthapura. Butterfly counting in bipartite networks. In KDD, pages 2150--2159. ACM, 2018. Google ScholarDigital Library
A. E. Sariyüce and A. Pinar. Peeling bipartite networks for dense subgraph discovery. In WSDM, pages 504--512. ACM, 2018. Google ScholarDigital Library
T. Schank and D. Wagner. Finding, counting and listing all triangles in large graphs, an experimental study. In International workshop on experimental and efficient algorithms, pages 606--609. Springer, 2005. Google ScholarDigital Library
C. Seshadhri, A. Pinar, and T. G. Kolda. Triadic measures on graphs: The power of wedge sampling. In SDM, pages 10--18. SIAM, 2013.Google Scholar
J. Shun and K. Tangwongsan. Multicore triangle computations without tuning. In ICDE, pages 149--160. IEEE, 2015.Google ScholarCross Ref
L. D. Stefani, A. Epasto, M. Riondato, and E. Upfal. Triest: Counting local and global triangles in fully dynamic streams with fixed memory size. TKDD, 11(4):43, 2017. Google ScholarDigital Library
S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. In WWW, pages 607--614. ACM, 2011. Google ScholarDigital Library
M. Then, M. Kaufmann, F. Chirigati, T.-A. Hoang-Vu, K. Pham, A. Kemper, T. Neumann, and H. T. Vo. The more the merrier: Efficient multi-source graph traversal. PVLDB, 8(4):449--460, 2014. Google ScholarDigital Library
C. Wang, J. Wang, X. Lin, W. Wang, H. Wang, H. Li, W. Tian, J. Xu, and R. Li. Mapdupreducer: detecting near duplicates over massive datasets. In SIGMOD, pages 1119--1122, 2010. Google ScholarDigital Library
J. Wang, A. W.-C. Fu, and J. Cheng. Rectangle counting in large bipartite graphs. In BigData Congress, pages 17--24. IEEE, 2014. Google ScholarDigital Library
K. Wang, X. Cao, X. Lin, W. Zhang, and L. Qin. Efficient computing of radius-bounded k-cores. In ICDE, pages 233--244. IEEE, 2018.Google ScholarCross Ref
X. Wang, Y. Zhang, W. Zhang, X. Lin, and W. Wang. Ap-tree: Efficiently support continuous spatial-keyword queries over stream. In ICDE, pages 1107--1118. IEEE, 2015.Google ScholarCross Ref
H. Wei, J. X. Yu, C. Lu, and X. Lin. Speedup graph processing by graph ordering. In SIGMOD, pages 1813--1828. ACM, 2016. Google ScholarDigital Library
W. Yu, X. Lin, W. Zhang, L. Chang, and J. Pei. More is simpler: Effectively and efficiently assessing node-pair similarities based on hyperlinks. PVLDB, 7(1):13--24, 2013. Google ScholarDigital Library
F. Zhang, L. Yuan, Y. Zhang, L. Qin, X. Lin, and A. Zhou. Discovering strong communities with user engagement and tie strength. In DASFAA, pages 425--441. Springer, 2018.Google ScholarDigital Library
F. Zhang, Y. Zhang, L. Qin, W. Zhang, and X. Lin. When engagement meets similarity: efficient (k, r)-core computation on social networks. PVLDB, 10(10):998--1009, 2017. Google ScholarDigital Library
F. Zhang, Y. Zhang, L. Qin, W. Zhang, and X. Lin. Efficiently reinforcing social networks over user engagement and tie strength. In ICDE, pages 557--568. IEEE, 2018.Google ScholarCross Ref
Z. Zou. Bitruss decomposition of bipartite graphs. In DASFAA, pages 218--233. Springer, 2016. Google ScholarDigital Library

Recommendations

Butterfly Counting in Bipartite Networks
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

We consider the problem of counting motifs in bipartite affiliation networks, such as author-paper, user-product, and actor-movie relations. We focus on counting the number of occurrences of a "butterfly", a complete 2x2 biclique, the simplest cohesive ...
Read More
Accelerated butterfly counting with vertex priority on bipartite graphs
Abstract
Bipartite graphs are of great importance in many real-world applications. Butterfly, which is a complete $2 \times 2$ biclique, plays a key role in bipartite graphs. In this paper, we investigate the problem of efficient counting the number of butterflies. ...
Read More
Reconfiguration graphs for vertex colourings of chordal and chordal bipartite graphs

A k -colouring of a graph G =( V , E ) is a mapping c : V {1,2, , k } such that c ( u ) c ( v ) whenever uv is an edge. The reconfiguration graph of the k -colourings of G contains as its vertex set the k -colourings of G , and two ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Proceedings of the VLDB Endowment Volume 12, Issue 10
June 2019
177 pages
ISSN:2150-8097
Editors:
Lei Chen
HKUST
,
Fatma Özcan
IBM Research - Almaden
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 June 2019
Published in pvldb Volume 12, Issue 10
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 35
  Total Citations
  View Citations
- 313
  Total Downloads
- Downloads (Last 12 months)34
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Vertex priority based butterfly counting for large-scale bipartite networks

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Butterfly Counting in Bipartite Networks

Accelerated butterfly counting with vertex priority on bipartite graphs

Reconfiguration graphs for vertex colourings of chordal and chordal bipartite graphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Vertex priority based butterfly counting for large-scale bipartite networks

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Butterfly Counting in Bipartite Networks

Accelerated butterfly counting with vertex priority on bipartite graphs

Reconfiguration graphs for vertex colourings of chordal and chordal bipartite graphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media