skip to main content
research-article

Approximately counting triangles in large graph streams including edge duplicates with a fixed memory usage

Authors Info & Claims
Published:01 October 2017Publication History
Skip Abstract Section

Abstract

Counting triangles in a large graph is important for detecting network anomalies such as spam web pages and suspicious accounts (e.g., fraudsters and advertisers) on online social networks. However, it is challenging to compute the number of triangles in a large graph represented as a stream of edges with a low computational cost when given a limited memory. Recently, several effective sampling-based approximation methods have been developed to solve this problem. However, they assume the graph stream of interest contains no duplicate edges, which does not hold in many real-world graph streams (e.g., phone calling networks). In this paper, we observe that these methods exhibit a large estimation error or computational cost even when modified to deal with duplicate edges using deduplication techniques such as Bloom filter and hash-based sampling. To solve this challenge, we design a one-pass streaming algorithm for uniformly sampling distinct edges at a high speed. Compared to state-of-the-art algorithms, our algorithm reduces the sampling cost per edge from O(log k) (k is the maximum number of sampled edges determined by the available memory space) to O(1) without using any additional memory space. Based on sampled edges, we develop a simple yet accurate method to infer the number of triangles in the original graph stream. We conduct extensive experiments on a variety of real-world large graphs, and the results demonstrate that our method is several times more accurate and faster than state-of-the-art methods with the same memory usage.

References

  1. N. Ahmed, N. Duffield, J. Neville, and R. Kompella. Graph sample and hold: A framework for big-graph analytics. In SIGKDD, pages 1446--1455, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17:354--364, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. S. Arifuzzaman, M. Khan, and M. Marathe. Patric: A parallel algorithm for counting triangles in massive networks. In CIKM, pages 529--538, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan. Counting distinct elements in a data stream. In RANDOM, pages 1--10, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In SODA, pages 623--632, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. L. Becchetti, P. Boldi, C. Castillo, and A. Gionis. Efficient algorithms for large-scale local triangle counting. TKDD, 4(3):13, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. W. Berry, B. Hendrickson, R. A. LaViolette, and C. A. Phillips. Tolerating the community detection resolution limit with edge weighting. Physical Review E, 83(5):056119+, 2011.Google ScholarGoogle Scholar
  8. B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. CACM, 13(7):422--426, 1970. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. L. S. Buriol, G. Frahling, S. Leonardi, A. Marchetti-Spaccamela, and C. Sohler. Counting triangles in data streams. In PODS, pages 253--262, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. H. Chun, Y. yeol Ahn, H. Kwak, S. Moon, Y. ho Eom, and H. Jeong. Comparison of online social relations in terms of volume vs. interaction: A case study of cyworld. In IMC, pages 57--70, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. E. Cohen. Size-estimation framework with applications to transitive closure and reachability. J. Comput. Syst. Sci., 55(3):441--453, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. E. Cohen. All-distances sketches, revisited: Hip estimators for massive graphs analysis. In PODS, pages 2320--2334, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. T. H. Cormen, C. Stein, R. L. Rivest, and C. E. Leiserson. Introduction to Algorithms. McGraw-Hill Higher Education, 2nd edition, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J.-P. Eckmann and E. Moses. Curvature of co-links uncovers hidden thematic layers in the world wide web. PNAS, 99(9):5825--5829, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  15. P. Flajolet, E. Fusy, O. Gandouet, and F. Meunier. Hyperloglog: The analysis of a near-optimal cardinality estimation algorithm. In AOFA, pages 127--146, 2007.Google ScholarGoogle Scholar
  16. P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci., 31(2):182--209, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. I. Giechaskiel, G. Panagopoulos, and E. Yoneki. PDTL: parallel and distributed triangle listing for massive graphs. In ICPP, pages 370--379, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. X. Hu, Y. Tao, and C.-W. Chung. Massive graph triangulation. In SIGMOD, pages 325--336, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. Jha, A. Pinar, and C. Seshadhri. Counting triangles in real-world graph streams: Dealing with repeated edges and time windows. In ACSSC, pages 1507--1514, 2015.Google ScholarGoogle ScholarCross RefCross Ref
  20. M. Jha, C. Seshadhri, and A. Pinar. A space efficient streaming algorithm for triangle counting using the birthday paradox. In SIGKDD, pages 589--597, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. H. Jowhari and M. Ghodsi. New streaming algorithms for counting triangles in graphs. In COCOON, pages 710--716, 2005.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. Jung, S. Lee, Y. Lim, and U. Kang. FURL: fixed-memory and uncertainty reducing local triangle counting for graph streams. CoRR, abs/1611.06615, 2016.Google ScholarGoogle Scholar
  23. U. Kang, B. Meeder, E. E. Papalexakis, and C. Faloutsos. Heigen: Spectral analysis for billion-scale graphs. TKDE, 26(2):350--362, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. J. Kim, W.-S. Han, S. Lee, K. Park, and H. Yu. Opt: A new framework for overlapped and parallel triangulation in large-scale graphs. In SIGMOD, pages 637--648, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. J. Kunegis. Handbook of network analysis {KONECT - the koblenz network collection}. CoRR, abs/1402.5500:1343--1350, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. K. Kutzkov and R. Pagh. On the streaming complexity of computing local clustering coefficients. In WSDM, pages 677--686, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In WWW, pages 591--600, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. M. Latapy. Main-memory triangle computations for very large (sparse (power-law)) graphs. TCS, 407(1--3):458--473, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. J. Leskovec, D. Huttenlocher, and J. Kleinberg. Predicting positive and negative links in online social networks. In WWW, pages 641--650, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Y. Lim and U. Kang. MASCOT: memory-efficient and accurate sampling for counting local triangles in graph streams. In SIGKDD, pages 685--694, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. R. Milo, E. Al, and C. Biology. Network motifs: Simple building blocks of complex networks. Science, 298(5549):824--827, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  32. A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In IMC, pages 29--42, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, New York, NY, USA, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. R. Pagh and F. Silvestri. The input/output complexity of triangle enumeration. In PODS, pages 224--233, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. H. Park, S. Myaeng, and U. Kang. PTE: enumerating trillion triangles on distributed systems. In SIGKDD, pages 1115--1124, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. H.-M. Park and C.-W. Chung. An efficient mapreduce algorithm for counting triangles in a very large graph. In CIKM, pages 539--548, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. H.-M. Park, F. Silvestri, U. Kang, and R. Pagh. Mapreduce triangle enumeration with guarantees. In CIKM, pages 1739--1748, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. A. Pavany, K. Tangwongsan, S. Tirthapuraz, and K.-L. Wu. Counting and sampling triangles from a graph stream. In PVLDB, 6(14):1870--1881, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. T. Schank. Algorithmic aspects of triangle-based network analysis. Phd in Computer Science, 2007.Google ScholarGoogle Scholar
  40. T. Schank and D. Wagner. Finding, counting and listing all triangles in large graphs, an experimental study. In WEA, pages 606--609, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. D. Schiöberg, F. Schneider, S. Schmid, S. Uhlig, and A. Feldmann. Evolution of directed triangle motifs in the google+ OSN. CoRR, abs/1502.04321, 2015.Google ScholarGoogle Scholar
  42. C. Seshadhri, A. Pinar, N. Durak, and T. G. Kolda. Directed closure measures for networks with reciprocity. J. Complex Networks, 5(1):32--47, 2017.Google ScholarGoogle Scholar
  43. C. Seshadhri, A. Pinar, and T. G. Kolda. Wedge sampling for computing clustering coefficients and triangle counts on large graphs. Statistical Analysis and Data Mining, 7(4):294--307, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. L. D. Stefani, A. Epasto, M. Riondato, and E. Upfal. Trièst: Counting local and global triangles in fully-dynamic streams with fixed memory size. In SIGKDD, pages 825--834, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. L. D. Stefani, A. Epasto, M. Riondato, and E. Upfal. Trièst: Counting local and global triangles in fully-dynamic streams with fixed memory size. CoRR, abs/1602.07424, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. In WWW, pages 607--614, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. D. Ting. Streamed approximate counting of distinct elements: Beating optimal batch methods. In SIGKDD, pages 442--451, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. C. E. Tsourakakis, U. Kang, G. L. Miller, and C. Faloutsos. Doulion: Counting triangles in massive graphs with a coin. In KDD, pages 837--846, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. J. S. Vitter. Random sampling with a reservoir. TOMS, 11(1):37--57, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. H. T. Welser, E. Gleave, D. Fisher, and M. Smith. Visualizing the signatures of social roles in online discussion groups. JoSS, 8(2):1--32, 2007.Google ScholarGoogle Scholar
  51. B. Wu, K. Yi, and Z. Li. Counting triangles in large graphs by random sampling. TKDE, 28(8):2013--2026, 2016.Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Z. Yang, C. Wilson, X. Wang, T. Gao, B. Y. Zhao, and Y. Dai. Uncovering social network sybils in the wild. TKDD, 8(1):1--29, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. H. Zhang, Y. Zhu, L. Qin, H. Cheng, and J. X. Yu. Efficient triangle listing for billion-scale graphs. In BigData, pages 813--822, 2016.Google ScholarGoogle ScholarCross RefCross Ref

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 11, Issue 2
    October 2017
    176 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    • Published: 1 October 2017
    Published in pvldb Volume 11, Issue 2

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader