skip to main content
research-article

Vertex priority based butterfly counting for large-scale bipartite networks

Authors Info & Claims
Published:01 June 2019Publication History
Skip Abstract Section

Abstract

Bipartite networks are of great importance in many real-world applications. In bipartite networks, butterfly (i.e., a complete 2 x 2 biclique) is the smallest non-trivial cohesive structure and plays a key role. In this paper, we study the problem of efficient counting the number of butterflies in bipartite networks. The most advanced techniques are based on enumerating wedges which is the dominant cost of counting butterflies. Nevertheless, the existing algorithms cannot efficiently handle large-scale bipartite networks. This becomes a bottleneck in large-scale applications. In this paper, instead of the existing layer-priority-based techniques, we propose a vertex-priority-based paradigm BFC-VP to enumerate much fewer wedges; this leads to a significant improvement of the time complexity of the state-of-the-art algorithms. In addition, we present cache-aware strategies to further improve the time efficiency while theoretically retaining the time complexity of BFC-VP. Moreover, we also show that our proposed techniques can work efficiently in external and parallel contexts. Our extensive empirical studies demonstrate that the proposed techniques can speed up the state-of-the-art techniques by up to two orders of magnitude for the real datasets.

References

  1. Amazon. http://liu.cs.uic.edu/download/data/.Google ScholarGoogle Scholar
  2. Bi-sk. http://law.di.unimi.it/webdata/sk/.Google ScholarGoogle Scholar
  3. Bi-twitter. http://an.kaist.ac.kr/traces/WWW2010.html.Google ScholarGoogle Scholar
  4. Bi-uk. http://law.di.unimi.it/webdata/uk-2006-05/.Google ScholarGoogle Scholar
  5. Dbpedia. http://wiki.dbpedia.org/Downloads.Google ScholarGoogle Scholar
  6. Delicious. http://dai-labor.de/IRML/datasets.Google ScholarGoogle Scholar
  7. Live-journal. http://socialnetworks.mpi-sws.org.Google ScholarGoogle Scholar
  8. Orkut. http://socialnetworks.mpi-sws.org.Google ScholarGoogle Scholar
  9. Tracker. https://ssc.io/trackingthetrackers/.Google ScholarGoogle Scholar
  10. Twitter. http://public.asu.edu/~mdechoud/datasets.html.Google ScholarGoogle Scholar
  11. Wiki-en. http://dumps.wikimedia.org/.Google ScholarGoogle Scholar
  12. Wiki-fr. http://dumps.wikimedia.org/.Google ScholarGoogle Scholar
  13. A. Aggarwal, J. Vitter, et al. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116--1127, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. Dbmss on a modern processor: Where does time go? In PVLDB, number DIAS-CONF-1999-001, pages 266--277, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. S. G. Aksoy, T. G. Kolda, and A. Pinar. Measuring and modeling bipartite graphs with community structure. Journal of Complex Networks, 5(4):581--603, 2017.Google ScholarGoogle ScholarCross RefCross Ref
  16. M. Al Hasan and V. S. Dave. Triangle counting in large networks: a review. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(2):e1226, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  17. N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209--223, 1997.Google ScholarGoogle ScholarCross RefCross Ref
  18. L. Auroux, M. Burelle, and R. Erra. Reordering very large graphs for fun & profit. In International Symposium on Web AlGorithms, 2015.Google ScholarGoogle Scholar
  19. L. Becchetti, P. Boldi, C. Castillo, and A. Gionis. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In KDD, pages 16--24. ACM, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. D. K. Blandford, G. E. Blelloch, and I. A. Kash. Compact representations of separable graphs. In ACM-SIAM symposium on Discrete algorithms, pages 679--688. Society for Industrial and Applied Mathematics, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. P. Boldi, M. Rosa, M. Santini, and S. Vigna. Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In WWW, pages 587--596. ACM, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. P. Boldi, M. Santini, and S. Vigna. Permuting web graphs. In International Workshop on Algorithms and Models for the Web-Graph, pages 116--126. Springer, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. S. P. Borgatti and M. G. Everett. Network analysis of 2-mode data. Social networks, 19(3):243--269, 1997.Google ScholarGoogle ScholarCross RefCross Ref
  24. L. Chang, C. Zhang, X. Lin, and L. Qin. Scalable top-k structural diversity search. In ICDE, pages 95--98. IEEE, 2017.Google ScholarGoogle ScholarCross RefCross Ref
  25. F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan. On compressing social networks. In KDD, pages 219--228. ACM, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. S. Chu and J. Cheng. Triangle listing in massive networks. TKDD, 6(4):17, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. I. S. Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. In KDD, pages 269--274. ACM, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. L. Dhulipala, I. Kabiljo, B. Karrer, G. Ottaviano, S. Pupyrev, and A. Shalita. Compressing graphs and indexes with recursive graph bisection. In KDD, pages 1535--1544. ACM, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. D. C. Fain and J. O. Pedersen. Sponsored search: A brief history. Bulletin of the american Society for Information Science and technology, 32(2):12--13, 2006.Google ScholarGoogle Scholar
  30. Y. Fang, X. Huang, L. Qin, Y. Zhang, W. Zhang, R. Cheng, and X. Lin. A survey of community search over big graphs. VLDB Journal, 2019.Google ScholarGoogle ScholarCross RefCross Ref
  31. M. R. Garey and D. S. Johnson. Computers and intractability, volume 29. wh freeman New York, 2002.Google ScholarGoogle Scholar
  32. R. L. Graham. Bounds for certain multiprocessing anomalies. Bell System Technical Journal, 45(9):1563--1581, 1966.Google ScholarGoogle ScholarCross RefCross Ref
  33. R. L. Graham. Bounds on multiprocessing timing anomalies. SIAM journal on Applied Mathematics, 17(2):416--429, 1969.Google ScholarGoogle Scholar
  34. S. Han, L. Zou, and J. X. Yu. Speeding up set intersections in graph algorithms using simd instructions. In SIGMOD, pages 1587--1602. ACM, 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. External memory algorithms, 50:107--118, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. X. Hu, Y. Tao, and C.-W. Chung. Massive graph triangulation. In SIGMOD, pages 325--336. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. A. Itai and M. Rodeh. Finding a minimum circuit in a graph. SIAM Journal on Computing, 7(4):413--423, 1978.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. S. Jain and C. Seshadhri. A fast and provable method for estimating clique counts using turán's theorem. In WWW, pages 441--449. International World Wide Web Conferences Steering Committee, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. M. Jha, C. Seshadhri, and A. Pinar. Path sampling: A fast and provable method for estimating 4-vertex subgraph counts. In WWW, pages 495--505. International World Wide Web Conferences Steering Committee, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. U. Kang and C. Faloutsos. Beyond' caveman communities': Hubs and spokes for graph compression and mining. In ICDM, pages 300--309. IEEE, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. W. Khaouid, M. Barsky, V. Srinivasan, and A. Thomo. K-core decomposition of large networks on a single pc. PVLDB, 9(1):13--23, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. M. N. Kolountzakis, G. L. Miller, R. Peng, and C. E. Tsourakakis. Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Mathematics, 8(1--2):161--185, 2012.Google ScholarGoogle Scholar
  43. M. Latapy. Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoretical computer science, 407(1--3):458--473, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. M. Latapy, C. Magnien, and N. Del Vecchio. Basic notions for the analysis of large two-mode networks. Social networks, 30(1):31--48, 2008.Google ScholarGoogle ScholarCross RefCross Ref
  45. X. Lin, M. M. Orlowska, and Y. Zhang. On data allocation with minimum overall communication costs in distributed database design. In ICCI, pages 539--544. IEEE, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. P. G. Lind, M. C. Gonzalez, and H. J. Herrmann. Cycles and clustering in bipartite networks. Physical review E, 72(5):056127, 2005.Google ScholarGoogle Scholar
  47. B. Liu, L. Yuan, X. Lin, L. Qin, W. Zhang, and J. Zhou. Efficient (α, β)-core computation: An index-based approach. In WWW, pages 1130--1141. ACM, 2019. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science, 298(5594):824--827, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  49. M. E. Newman. Scientific collaboration networks. i. network construction and fundamental results. Physical review E, 64(1):016131, 2001.Google ScholarGoogle Scholar
  50. M. E. Newman. Scientific collaboration networks. ii. shortest paths, weighted networks, and centrality. Physical review E, 64(1):016132, 2001.Google ScholarGoogle Scholar
  51. T. Opsahl. Triadic closure in two-mode networks: Redefining the global and local clustering coefficients. Social Networks, 35(2):159--167, 2013.Google ScholarGoogle ScholarCross RefCross Ref
  52. M. Ornstein. Interlocking directorates in canada: Intercorporate or class alliance? Administrative science quarterly, pages 210--231, 1984.Google ScholarGoogle Scholar
  53. M. D. Ornstein. Interlocking directorates in canada: Evidence from replacement patterns. Social Networks, 4(1):3--25, 1982.Google ScholarGoogle ScholarCross RefCross Ref
  54. D. Palmer. Broken ties: Interlocking directorates and intercorporate coordination. Administrative Science Quarterly, pages 40--55, 1983.Google ScholarGoogle Scholar
  55. J.-S. Park, M. Penner, and V. K. Prasanna. Optimizing graph algorithms for improved cache performance. IEEE Transactions on Parallel and Distributed Systems, 15(9):769--782, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Y. Peng, Y. Zhang, W. Zhang, X. Lin, and L. Qin. Efficient probabilistic k-core computation on uncertain graphs. In ICDE, pages 1192--1203. IEEE, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  57. A. Pinar, C. Seshadhri, and V. Vishal. Escape: Efficiently counting all 5-vertex subgraphs. In WWW, pages 1431--1440. International World Wide Web Conferences Steering Committee, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. G. Robins and M. Alexander. Small worlds among interlocking directors: Network structure and distance in bipartite graphs. Computational & Mathematical Organization Theory, 10(1):69--94, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. S.-V. Sanei-Mehri, A. E. Sariyuce, and S. Tirthapura. Butterfly counting in bipartite networks. In KDD, pages 2150--2159. ACM, 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. A. E. Sariyüce and A. Pinar. Peeling bipartite networks for dense subgraph discovery. In WSDM, pages 504--512. ACM, 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. T. Schank and D. Wagner. Finding, counting and listing all triangles in large graphs, an experimental study. In International workshop on experimental and efficient algorithms, pages 606--609. Springer, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. C. Seshadhri, A. Pinar, and T. G. Kolda. Triadic measures on graphs: The power of wedge sampling. In SDM, pages 10--18. SIAM, 2013.Google ScholarGoogle Scholar
  63. J. Shun and K. Tangwongsan. Multicore triangle computations without tuning. In ICDE, pages 149--160. IEEE, 2015.Google ScholarGoogle ScholarCross RefCross Ref
  64. L. D. Stefani, A. Epasto, M. Riondato, and E. Upfal. Triest: Counting local and global triangles in fully dynamic streams with fixed memory size. TKDD, 11(4):43, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. In WWW, pages 607--614. ACM, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. M. Then, M. Kaufmann, F. Chirigati, T.-A. Hoang-Vu, K. Pham, A. Kemper, T. Neumann, and H. T. Vo. The more the merrier: Efficient multi-source graph traversal. PVLDB, 8(4):449--460, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. C. Wang, J. Wang, X. Lin, W. Wang, H. Wang, H. Li, W. Tian, J. Xu, and R. Li. Mapdupreducer: detecting near duplicates over massive datasets. In SIGMOD, pages 1119--1122, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. J. Wang, A. W.-C. Fu, and J. Cheng. Rectangle counting in large bipartite graphs. In BigData Congress, pages 17--24. IEEE, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. K. Wang, X. Cao, X. Lin, W. Zhang, and L. Qin. Efficient computing of radius-bounded k-cores. In ICDE, pages 233--244. IEEE, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  70. X. Wang, Y. Zhang, W. Zhang, X. Lin, and W. Wang. Ap-tree: Efficiently support continuous spatial-keyword queries over stream. In ICDE, pages 1107--1118. IEEE, 2015.Google ScholarGoogle ScholarCross RefCross Ref
  71. H. Wei, J. X. Yu, C. Lu, and X. Lin. Speedup graph processing by graph ordering. In SIGMOD, pages 1813--1828. ACM, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. W. Yu, X. Lin, W. Zhang, L. Chang, and J. Pei. More is simpler: Effectively and efficiently assessing node-pair similarities based on hyperlinks. PVLDB, 7(1):13--24, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  73. F. Zhang, L. Yuan, Y. Zhang, L. Qin, X. Lin, and A. Zhou. Discovering strong communities with user engagement and tie strength. In DASFAA, pages 425--441. Springer, 2018.Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. F. Zhang, Y. Zhang, L. Qin, W. Zhang, and X. Lin. When engagement meets similarity: efficient (k, r)-core computation on social networks. PVLDB, 10(10):998--1009, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  75. F. Zhang, Y. Zhang, L. Qin, W. Zhang, and X. Lin. Efficiently reinforcing social networks over user engagement and tie strength. In ICDE, pages 557--568. IEEE, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  76. Z. Zou. Bitruss decomposition of bipartite graphs. In DASFAA, pages 218--233. Springer, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 12, Issue 10
    June 2019
    177 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    • Published: 1 June 2019
    Published in pvldb Volume 12, Issue 10

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader