Skip to main content

2020 | OriginalPaper | Buchkapitel

A Scalable Randomized Algorithm for Triangle Enumeration on Graphs Based on SQL Queries

verfasst von : Abir Farouzi, Ladjel Bellatreche, Carlos Ordonez, Gopal Pandurangan, Mimoun Malki

Erschienen in: Big Data Analytics and Knowledge Discovery

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Triangle enumeration is a fundamental problem in large-scale graph analysis. For instance, triangles are used to solve practical problems like community detection and spam filtering. On the other hand, there is a large amount of data stored on database management systems (DBMSs), which can be modeled and analyzed as graphs. Alternatively, graph data can be quickly loaded into a DBMS. Our paper shows how to adapt and optimize a randomized distributed triangle enumeration algorithm with SQL queries, which is a significantly different approach from programming graph algorithms in traditional languages such as Python or C++. We choose a parallel columnar DBMS given its fast query processing, but our solution should work for a row DBMS as well. Our randomized solution provides a balanced workload for parallel query processing, being robust to the existence of skewed degree vertices. We experimentally prove our solution ensures a balanced data distribution, and hence workload, among machines. The key idea behind the algorithm is to evenly partition all possible triplets of vertices among machines, sending edges that may form a triangle to a proxy machine; this edge redistribution eliminates shuffling edges during join computation and therefore triangle enumeration becomes local and fully parallel. In summary, our algorithm exhibits linear speedup with large graphs, including graphs that have high skewness in vertex degree distributions.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Afrati, F.N., Sarma, A.D., Salihoglu, S., Ullman, J.D.: Upper and lower bounds on the cost of a MapReduce computation. PVLDB 6(4), 277–288 (2013) Afrati, F.N., Sarma, A.D., Salihoglu, S., Ullman, J.D.: Upper and lower bounds on the cost of a MapReduce computation. PVLDB 6(4), 277–288 (2013)
3.
Zurück zum Zitat Al Hasan, M., Dave, V.S.: Triangle counting in large networks: a review. Wiley Interdisc. Rev. Data Min. Knowl. Discov. 8(2), e1226 (2018) Al Hasan, M., Dave, V.S.: Triangle counting in large networks: a review. Wiley Interdisc. Rev. Data Min. Knowl. Discov. 8(2), e1226 (2018)
4.
Zurück zum Zitat Arifuzzaman, S., Khan, M., Marathe, M.: A space-efficient parallel algorithm for counting exact triangles in massive networks. In: HPCC, pp. 527–534 (2015) Arifuzzaman, S., Khan, M., Marathe, M.: A space-efficient parallel algorithm for counting exact triangles in massive networks. In: HPCC, pp. 527–534 (2015)
5.
Zurück zum Zitat Berry, J.W., Fostvedt, L.A., Nordman, D.J., Phillips, C.A., Seshadhri, C., Wilson, A.G.: Why do simple algorithms for triangle enumeration work in the real world? Internet Math. 11(6), 555–571 (2015)MathSciNetCrossRef Berry, J.W., Fostvedt, L.A., Nordman, D.J., Phillips, C.A., Seshadhri, C., Wilson, A.G.: Why do simple algorithms for triangle enumeration work in the real world? Internet Math. 11(6), 555–571 (2015)MathSciNetCrossRef
7.
Zurück zum Zitat Chu, S., Cheng, J.: Triangle listing in massive networks. ACM Trans. Knowl. Discov. Data 6(4), 17 (2012)CrossRef Chu, S., Cheng, J.: Triangle listing in massive networks. ACM Trans. Knowl. Discov. Data 6(4), 17 (2012)CrossRef
8.
Zurück zum Zitat Cui, Y., Xiao, D., Cline, D.B., Loguinov, D.: Improving I/O complexity of triangle enumeration. In: IEEE ICDM, pp. 61–70 (2017) Cui, Y., Xiao, D., Cline, D.B., Loguinov, D.: Improving I/O complexity of triangle enumeration. In: IEEE ICDM, pp. 61–70 (2017)
10.
Zurück zum Zitat Fan, J., Raj, A.G.S., Patel, J.M.: The case against specialized graph analytics engines. In: CIDR (2015) Fan, J., Raj, A.G.S., Patel, J.M.: The case against specialized graph analytics engines. In: CIDR (2015)
11.
Zurück zum Zitat Hu, X., Tao, Y., Chung, C.W.: Massive graph triangulation. In: ACM SIGMOD, pp. 325–336 (2013) Hu, X., Tao, Y., Chung, C.W.: Massive graph triangulation. In: ACM SIGMOD, pp. 325–336 (2013)
12.
13.
Zurück zum Zitat Klauck, H., Nanongkai, D., Pandurangan, G., Robinson, P.: Distributed computation of large-scale graph problems. In: ACM-SIAM SODA, pp. 391–410 (2015) Klauck, H., Nanongkai, D., Pandurangan, G., Robinson, P.: Distributed computation of large-scale graph problems. In: ACM-SIAM SODA, pp. 391–410 (2015)
14.
Zurück zum Zitat Latapy, M.: Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor. Comput. Sci. 407(1–3), 458–473 (2008)MathSciNetCrossRef Latapy, M.: Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor. Comput. Sci. 407(1–3), 458–473 (2008)MathSciNetCrossRef
15.
Zurück zum Zitat Malewicz, G., et al.: Pregel: a system for large-scale graph processing. In: ACM SIGMOD, pp. 135–146 (2010) Malewicz, G., et al.: Pregel: a system for large-scale graph processing. In: ACM SIGMOD, pp. 135–146 (2010)
16.
Zurück zum Zitat Ngo, H.Q., Ré, C., Rudra, A.: Skew strikes back: new developments in the theory of join algorithms. SIGMOD Rec. 42(4), 5–16 (2013)CrossRef Ngo, H.Q., Ré, C., Rudra, A.: Skew strikes back: new developments in the theory of join algorithms. SIGMOD Rec. 42(4), 5–16 (2013)CrossRef
17.
Zurück zum Zitat Ordonez, C.: Optimization of linear recursive queries in SQL. IEEE Trans. Knowl. Data Eng. 22(2), 264–277 (2010)MathSciNetCrossRef Ordonez, C.: Optimization of linear recursive queries in SQL. IEEE Trans. Knowl. Data Eng. 22(2), 264–277 (2010)MathSciNetCrossRef
18.
Zurück zum Zitat Ordonez, C., Cabrera, W., Gurram, A.: Comparing columnar, row and array DBMSs to process recursive queries on graphs. Inf. Syst. 63, 66–79 (2017)CrossRef Ordonez, C., Cabrera, W., Gurram, A.: Comparing columnar, row and array DBMSs to process recursive queries on graphs. Inf. Syst. 63, 66–79 (2017)CrossRef
19.
Zurück zum Zitat Pandurangan, G., Robinson, P., Scquizzato, M.: On the distributed complexity of large-scale graph computations. In: SPAA, pp. 405–414 (2018) Pandurangan, G., Robinson, P., Scquizzato, M.: On the distributed complexity of large-scale graph computations. In: SPAA, pp. 405–414 (2018)
20.
Zurück zum Zitat Park, H.M., Chung, C.W.: An efficient MapReduce algorithm for counting triangles in a very large graph. In: ACM CIKM, pp. 539–548 (2013) Park, H.M., Chung, C.W.: An efficient MapReduce algorithm for counting triangles in a very large graph. In: ACM CIKM, pp. 539–548 (2013)
21.
Zurück zum Zitat Schank, T.: Algorithmic aspects of triangle-based network analysis (2007) Schank, T.: Algorithmic aspects of triangle-based network analysis (2007)
22.
Zurück zum Zitat Shun, J., Tangwongsan, K.: Multicore triangle computations without tuning. In: ICDE, pp. 149–160 (2015) Shun, J., Tangwongsan, K.: Multicore triangle computations without tuning. In: ICDE, pp. 149–160 (2015)
23.
Zurück zum Zitat Sun, W., Fokoue, A., Srinivas, K., Kementsietsidis, A., Hu, G., Xie, G.: SQLGraph: an efficient relational-based property graph store. In: ACM SIGMOD, pp. 1887–1901 (2015) Sun, W., Fokoue, A., Srinivas, K., Kementsietsidis, A., Hu, G., Xie, G.: SQLGraph: an efficient relational-based property graph store. In: ACM SIGMOD, pp. 1887–1901 (2015)
24.
Zurück zum Zitat Suri, S., Vassilvitskii, S.: Counting triangles and the curse of the last reducer. In: WWW, pp. 607–614 (2011) Suri, S., Vassilvitskii, S.: Counting triangles and the curse of the last reducer. In: WWW, pp. 607–614 (2011)
25.
Zurück zum Zitat Wang, N., Zhang, J., Tan, K.L., Tung, A.K.H.: On triangulation-based dense neighborhood graph discovery. Proc. VLDB Endow. 4(2), 58–68 (2010)CrossRef Wang, N., Zhang, J., Tan, K.L., Tung, A.K.H.: On triangulation-based dense neighborhood graph discovery. Proc. VLDB Endow. 4(2), 58–68 (2010)CrossRef
26.
Zurück zum Zitat Watts, D.J., Strogatz, S.H.: Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998)CrossRef Watts, D.J., Strogatz, S.H.: Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (1998)CrossRef
27.
Zurück zum Zitat Zhang, Y., Jiang, H., Wang, F., Hua, Y., Feng, D., Xu, X.: LiteTE: lightweight, communication-efficient distributed-memory triangle enumerating. IEEE Access 7, 26294–26306 (2019)CrossRef Zhang, Y., Jiang, H., Wang, F., Hua, Y., Feng, D., Xu, X.: LiteTE: lightweight, communication-efficient distributed-memory triangle enumerating. IEEE Access 7, 26294–26306 (2019)CrossRef
28.
Zurück zum Zitat Zhao, K., Yu, J.X.: All-in-one: graph processing in RDBMSs revisited. In: SIGMOD, pp. 1165–1180 (2017) Zhao, K., Yu, J.X.: All-in-one: graph processing in RDBMSs revisited. In: SIGMOD, pp. 1165–1180 (2017)
Metadaten
Titel
A Scalable Randomized Algorithm for Triangle Enumeration on Graphs Based on SQL Queries
verfasst von
Abir Farouzi
Ladjel Bellatreche
Carlos Ordonez
Gopal Pandurangan
Mimoun Malki
Copyright-Jahr
2020
DOI
https://doi.org/10.1007/978-3-030-59065-9_12