Abstract
Counting triangles in a large graph is important for detecting network anomalies such as spam web pages and suspicious accounts (e.g., fraudsters and advertisers) on online social networks. However, it is challenging to compute the number of triangles in a large graph represented as a stream of edges with a low computational cost when given a limited memory. Recently, several effective sampling-based approximation methods have been developed to solve this problem. However, they assume the graph stream of interest contains no duplicate edges, which does not hold in many real-world graph streams (e.g., phone calling networks). In this paper, we observe that these methods exhibit a large estimation error or computational cost even when modified to deal with duplicate edges using deduplication techniques such as Bloom filter and hash-based sampling. To solve this challenge, we design a one-pass streaming algorithm for uniformly sampling distinct edges at a high speed. Compared to state-of-the-art algorithms, our algorithm reduces the sampling cost per edge from O(log k) (k is the maximum number of sampled edges determined by the available memory space) to O(1) without using any additional memory space. Based on sampled edges, we develop a simple yet accurate method to infer the number of triangles in the original graph stream. We conduct extensive experiments on a variety of real-world large graphs, and the results demonstrate that our method is several times more accurate and faster than state-of-the-art methods with the same memory usage.
- N. Ahmed, N. Duffield, J. Neville, and R. Kompella. Graph sample and hold: A framework for big-graph analytics. In SIGKDD, pages 1446--1455, 2014. Google ScholarDigital Library
- N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17:354--364, 1997. Google ScholarDigital Library
- S. Arifuzzaman, M. Khan, and M. Marathe. Patric: A parallel algorithm for counting triangles in massive networks. In CIKM, pages 529--538, 2013. Google ScholarDigital Library
- Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan. Counting distinct elements in a data stream. In RANDOM, pages 1--10, 2002. Google ScholarDigital Library
- Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In SODA, pages 623--632, 2002. Google ScholarDigital Library
- L. Becchetti, P. Boldi, C. Castillo, and A. Gionis. Efficient algorithms for large-scale local triangle counting. TKDD, 4(3):13, 2010. Google ScholarDigital Library
- J. W. Berry, B. Hendrickson, R. A. LaViolette, and C. A. Phillips. Tolerating the community detection resolution limit with edge weighting. Physical Review E, 83(5):056119+, 2011.Google Scholar
- B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. CACM, 13(7):422--426, 1970. Google ScholarDigital Library
- L. S. Buriol, G. Frahling, S. Leonardi, A. Marchetti-Spaccamela, and C. Sohler. Counting triangles in data streams. In PODS, pages 253--262, 2006. Google ScholarDigital Library
- H. Chun, Y. yeol Ahn, H. Kwak, S. Moon, Y. ho Eom, and H. Jeong. Comparison of online social relations in terms of volume vs. interaction: A case study of cyworld. In IMC, pages 57--70, 2008. Google ScholarDigital Library
- E. Cohen. Size-estimation framework with applications to transitive closure and reachability. J. Comput. Syst. Sci., 55(3):441--453, 1997. Google ScholarDigital Library
- E. Cohen. All-distances sketches, revisited: Hip estimators for massive graphs analysis. In PODS, pages 2320--2334, 2014. Google ScholarDigital Library
- T. H. Cormen, C. Stein, R. L. Rivest, and C. E. Leiserson. Introduction to Algorithms. McGraw-Hill Higher Education, 2nd edition, 2001. Google ScholarDigital Library
- J.-P. Eckmann and E. Moses. Curvature of co-links uncovers hidden thematic layers in the world wide web. PNAS, 99(9):5825--5829, 2002.Google ScholarCross Ref
- P. Flajolet, E. Fusy, O. Gandouet, and F. Meunier. Hyperloglog: The analysis of a near-optimal cardinality estimation algorithm. In AOFA, pages 127--146, 2007.Google Scholar
- P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci., 31(2):182--209, 1985. Google ScholarDigital Library
- I. Giechaskiel, G. Panagopoulos, and E. Yoneki. PDTL: parallel and distributed triangle listing for massive graphs. In ICPP, pages 370--379, 2015. Google ScholarDigital Library
- X. Hu, Y. Tao, and C.-W. Chung. Massive graph triangulation. In SIGMOD, pages 325--336, 2013. Google ScholarDigital Library
- M. Jha, A. Pinar, and C. Seshadhri. Counting triangles in real-world graph streams: Dealing with repeated edges and time windows. In ACSSC, pages 1507--1514, 2015.Google ScholarCross Ref
- M. Jha, C. Seshadhri, and A. Pinar. A space efficient streaming algorithm for triangle counting using the birthday paradox. In SIGKDD, pages 589--597, 2013. Google ScholarDigital Library
- H. Jowhari and M. Ghodsi. New streaming algorithms for counting triangles in graphs. In COCOON, pages 710--716, 2005.Google ScholarDigital Library
- M. Jung, S. Lee, Y. Lim, and U. Kang. FURL: fixed-memory and uncertainty reducing local triangle counting for graph streams. CoRR, abs/1611.06615, 2016.Google Scholar
- U. Kang, B. Meeder, E. E. Papalexakis, and C. Faloutsos. Heigen: Spectral analysis for billion-scale graphs. TKDE, 26(2):350--362, 2014. Google ScholarDigital Library
- J. Kim, W.-S. Han, S. Lee, K. Park, and H. Yu. Opt: A new framework for overlapped and parallel triangulation in large-scale graphs. In SIGMOD, pages 637--648, 2014. Google ScholarDigital Library
- J. Kunegis. Handbook of network analysis {KONECT - the koblenz network collection}. CoRR, abs/1402.5500:1343--1350, 2014. Google ScholarDigital Library
- K. Kutzkov and R. Pagh. On the streaming complexity of computing local clustering coefficients. In WSDM, pages 677--686, 2013. Google ScholarDigital Library
- H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In WWW, pages 591--600, 2010. Google ScholarDigital Library
- M. Latapy. Main-memory triangle computations for very large (sparse (power-law)) graphs. TCS, 407(1--3):458--473, 2008. Google ScholarDigital Library
- J. Leskovec, D. Huttenlocher, and J. Kleinberg. Predicting positive and negative links in online social networks. In WWW, pages 641--650, 2010. Google ScholarDigital Library
- Y. Lim and U. Kang. MASCOT: memory-efficient and accurate sampling for counting local triangles in graph streams. In SIGKDD, pages 685--694, 2015. Google ScholarDigital Library
- R. Milo, E. Al, and C. Biology. Network motifs: Simple building blocks of complex networks. Science, 298(5549):824--827, 2002.Google ScholarCross Ref
- A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In IMC, pages 29--42, 2007. Google ScholarDigital Library
- M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, New York, NY, USA, 2005. Google ScholarDigital Library
- R. Pagh and F. Silvestri. The input/output complexity of triangle enumeration. In PODS, pages 224--233, 2014. Google ScholarDigital Library
- H. Park, S. Myaeng, and U. Kang. PTE: enumerating trillion triangles on distributed systems. In SIGKDD, pages 1115--1124, 2016. Google ScholarDigital Library
- H.-M. Park and C.-W. Chung. An efficient mapreduce algorithm for counting triangles in a very large graph. In CIKM, pages 539--548, 2013. Google ScholarDigital Library
- H.-M. Park, F. Silvestri, U. Kang, and R. Pagh. Mapreduce triangle enumeration with guarantees. In CIKM, pages 1739--1748, 2014. Google ScholarDigital Library
- A. Pavany, K. Tangwongsan, S. Tirthapuraz, and K.-L. Wu. Counting and sampling triangles from a graph stream. In PVLDB, 6(14):1870--1881, 2013. Google ScholarDigital Library
- T. Schank. Algorithmic aspects of triangle-based network analysis. Phd in Computer Science, 2007.Google Scholar
- T. Schank and D. Wagner. Finding, counting and listing all triangles in large graphs, an experimental study. In WEA, pages 606--609, 2005. Google ScholarDigital Library
- D. Schiöberg, F. Schneider, S. Schmid, S. Uhlig, and A. Feldmann. Evolution of directed triangle motifs in the google+ OSN. CoRR, abs/1502.04321, 2015.Google Scholar
- C. Seshadhri, A. Pinar, N. Durak, and T. G. Kolda. Directed closure measures for networks with reciprocity. J. Complex Networks, 5(1):32--47, 2017.Google Scholar
- C. Seshadhri, A. Pinar, and T. G. Kolda. Wedge sampling for computing clustering coefficients and triangle counts on large graphs. Statistical Analysis and Data Mining, 7(4):294--307, 2014. Google ScholarDigital Library
- L. D. Stefani, A. Epasto, M. Riondato, and E. Upfal. Trièst: Counting local and global triangles in fully-dynamic streams with fixed memory size. In SIGKDD, pages 825--834, 2016. Google ScholarDigital Library
- L. D. Stefani, A. Epasto, M. Riondato, and E. Upfal. Trièst: Counting local and global triangles in fully-dynamic streams with fixed memory size. CoRR, abs/1602.07424, 2016. Google ScholarDigital Library
- S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. In WWW, pages 607--614, 2011. Google ScholarDigital Library
- D. Ting. Streamed approximate counting of distinct elements: Beating optimal batch methods. In SIGKDD, pages 442--451, 2014. Google ScholarDigital Library
- C. E. Tsourakakis, U. Kang, G. L. Miller, and C. Faloutsos. Doulion: Counting triangles in massive graphs with a coin. In KDD, pages 837--846, 2009. Google ScholarDigital Library
- J. S. Vitter. Random sampling with a reservoir. TOMS, 11(1):37--57, 1985. Google ScholarDigital Library
- H. T. Welser, E. Gleave, D. Fisher, and M. Smith. Visualizing the signatures of social roles in online discussion groups. JoSS, 8(2):1--32, 2007.Google Scholar
- B. Wu, K. Yi, and Z. Li. Counting triangles in large graphs by random sampling. TKDE, 28(8):2013--2026, 2016.Google ScholarDigital Library
- Z. Yang, C. Wilson, X. Wang, T. Gao, B. Y. Zhao, and Y. Dai. Uncovering social network sybils in the wild. TKDD, 8(1):1--29, 2014. Google ScholarDigital Library
- H. Zhang, Y. Zhu, L. Qin, H. Cheng, and J. X. Yu. Efficient triangle listing for billion-scale graphs. In BigData, pages 813--822, 2016.Google ScholarCross Ref
Recommendations
Memory-Efficient and Accurate Sampling for Counting Local Triangles in Graph Streams: From Simple to Multigraphs
Special Issue (IDEA) and Regular PapersHow can we estimate local triangle counts accurately in a graph stream without storing the whole graph? How to handle duplicated edges in local triangle counting for graph stream? Local triangle counting, which computes the number of triangles attached ...
Small Edge Sets Meeting all Triangles of a Graph
It was conjectured in 1981 by the third author that if a graph G does not contain more than t pairwise edge-disjoint triangles, then there exists a set of at most 2t edges that shares an edge with each triangle of G. In this paper, we prove this ...
Edge-disjoint rainbow triangles in edge-colored graphs
AbstractLet G be an edge-colored graph. A triangle of G is called rainbow if any two edges of the triangle have distinct colors. We use m ( G ) and c ( G ) to denote the number of edges of G and the number of colors appearing on the edges of G,...
Comments