Skip to main content
Erschienen in: Distributed and Parallel Databases 2/2017

17.03.2017

Efficient MapReduce algorithms for triangle listing in billion-scale graphs

verfasst von: Yuanyuan Zhu, Hao Zhang, Lu Qin, Hong Cheng

Erschienen in: Distributed and Parallel Databases | Ausgabe 2/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

This paper addresses the classical triangle listing problem, which aims at enumerating all the tuples of three vertices connected with each other by edges. This problem has been intensively studied in internal and external memory, but it is still an urgent challenge in distributed environment where multiple machines across the network can be utilized to achieve good performance and scalability. As one of the de facto computing methodologies in distributed environment, MapReduce has been used in some of existing triangle listing algorithms. However, these algorithms usually need to shuffle a huge amount of intermediate data, which seriously hinders their scalability on large scale graphs. In this paper, we propose a new triangle listing algorithm in MapReduce, FTL, which utilizes a light weight data structure to substantially reduce the intermediate data transferred during the shuffle stage, and also is equipped with multiple-round techniques to ease the burden on memory and network bandwidth when dealing with graphs at billion scale. We prove that the size of the intermediate data can be well bounded near to the number of triangles in the graph. To further reduce the shuffle size and memory cost, we also propose improved algorithms based on a compact data structure, and present several optimization techniques to accelerate the computation and reduce the memory consumption. The extensive experimental results show that our algorithms outperform existing competitors by several times on both synthetic graphs and real world graphs.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Wang, J., Cheng, J.: Truss decomposition in massive networks. Proc. VLDB Endow. 5(9), 812–823 (2012)CrossRef Wang, J., Cheng, J.: Truss decomposition in massive networks. Proc. VLDB Endow. 5(9), 812–823 (2012)CrossRef
2.
Zurück zum Zitat Watts, D.J., Strogatz, S.H.: Collective dynamics of small-world networks. Nature 393(6684), 440–442 (1998)CrossRef Watts, D.J., Strogatz, S.H.: Collective dynamics of small-world networks. Nature 393(6684), 440–442 (1998)CrossRef
3.
Zurück zum Zitat Schank, T.: Algorithmic aspects of triangle-based network analysis. PhD in Computer Science, University Karlsruhe, vol 1 (2007) Schank, T.: Algorithmic aspects of triangle-based network analysis. PhD in Computer Science, University Karlsruhe, vol 1 (2007)
6.
Zurück zum Zitat Batagelj, V., Mrvar, A.: A subquadratic triad census algorithm for large sparse networks with small maximum degree. Soc. Netw. 23(3), 237–243 (2001)CrossRef Batagelj, V., Mrvar, A.: A subquadratic triad census algorithm for large sparse networks with small maximum degree. Soc. Netw. 23(3), 237–243 (2001)CrossRef
7.
Zurück zum Zitat Schank, T., Wagner, D.: Finding, counting and listing all triangles in large graphs, an experimental study. In: Experimental and Efficient Algorithms, pp. 606–609. Springer, Berlin (2005) Schank, T., Wagner, D.: Finding, counting and listing all triangles in large graphs, an experimental study. In: Experimental and Efficient Algorithms, pp. 606–609. Springer, Berlin (2005)
8.
Zurück zum Zitat Latapy, M.: Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor. Comput. Sci. 407(1), 458–473 (2008)MathSciNetCrossRefMATH Latapy, M.: Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor. Comput. Sci. 407(1), 458–473 (2008)MathSciNetCrossRefMATH
9.
Zurück zum Zitat Eppstein, D., Spiro, E.S.: The h-index of a graph and its application to dynamic subgraph statistics. In: Algorithms and Data Structures, pp. 278–289. Springer, Heidelberg (2009) Eppstein, D., Spiro, E.S.: The h-index of a graph and its application to dynamic subgraph statistics. In: Algorithms and Data Structures, pp. 278–289. Springer, Heidelberg (2009)
10.
Zurück zum Zitat Menegola, B.: An External Memory Algorithm for Listing Triangles. Technical report. Universidade Federal do Rio Grande do Sul (2010) Menegola, B.: An External Memory Algorithm for Listing Triangles. Technical report. Universidade Federal do Rio Grande do Sul (2010)
11.
Zurück zum Zitat Dementiev, R.: Algorithm engineering for large data sets. PhD Dissertation, Saarland University (2006) Dementiev, R.: Algorithm engineering for large data sets. PhD Dissertation, Saarland University (2006)
12.
Zurück zum Zitat Chu, S., Cheng, J.: Triangle listing in massive networks and its applications. In: Proceedings of SIGKDD, pp. 672–680. ACM (2011) Chu, S., Cheng, J.: Triangle listing in massive networks and its applications. In: Proceedings of SIGKDD, pp. 672–680. ACM (2011)
13.
Zurück zum Zitat Hu, X., Tao, Y., Chung, C.-W.: Massive graph triangulation. In: Proceedings of SIGMOD, pp. 325–336. ACM, New York (2013) Hu, X., Tao, Y., Chung, C.-W.: Massive graph triangulation. In: Proceedings of SIGMOD, pp. 325–336. ACM, New York (2013)
14.
Zurück zum Zitat Cohen, J.: Graph twiddling in a MapReduce world. Comput. Sci. Eng. 11(4), 29–41 (2009)CrossRef Cohen, J.: Graph twiddling in a MapReduce world. Comput. Sci. Eng. 11(4), 29–41 (2009)CrossRef
15.
Zurück zum Zitat Suri, S., Vassilvitskii, S.: Counting triangles and the curse of the last reducer. In: Proceedings of WWW, pp. 607–614. ACM, New York (2011) Suri, S., Vassilvitskii, S.: Counting triangles and the curse of the last reducer. In: Proceedings of WWW, pp. 607–614. ACM, New York (2011)
16.
Zurück zum Zitat Park, H.-M., Silvestri, F., Kang, U., Pagh, R.: MapReduce triangle enumeration with guarantees. In: Proceedings of CIKM, pp. 1739–1748. ACM (2014) Park, H.-M., Silvestri, F., Kang, U., Pagh, R.: MapReduce triangle enumeration with guarantees. In: Proceedings of CIKM, pp. 1739–1748. ACM (2014)
17.
Zurück zum Zitat Park, H.-M., Chung, C.-W.: An efficient MapReduce algorithm for counting triangles in a very large graph. In: Proceedings of CIKM, pp. 539–548. ACM (2013) Park, H.-M., Chung, C.-W.: An efficient MapReduce algorithm for counting triangles in a very large graph. In: Proceedings of CIKM, pp. 539–548. ACM (2013)
18.
Zurück zum Zitat Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: PowerGraph: distributed graph-parallel computation on natural graphs. In: Proceedings of OSDI, pp. 17–30 (2012) Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: PowerGraph: distributed graph-parallel computation on natural graphs. In: Proceedings of OSDI, pp. 17–30 (2012)
19.
Zurück zum Zitat Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of SIGMOD, pp. 135–146. ACM, New York (2010) Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of SIGMOD, pp. 135–146. ACM, New York (2010)
20.
Zurück zum Zitat Zhang, H., Zhu, Y., Qin, L., Cheng, H., Yu, J.X.: Efficient triangle listing for billion-scale graphs. In: IEEE BigData, pp. 813–822. IEEE (2016) Zhang, H., Zhu, Y., Qin, L., Cheng, H., Yu, J.X.: Efficient triangle listing for billion-scale graphs. In: IEEE BigData, pp. 813–822. IEEE (2016)
22.
Zurück zum Zitat Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: Proceedings of WWW, pp. 591–600. ACM, New York (2010) Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: Proceedings of WWW, pp. 591–600. ACM, New York (2010)
24.
Zurück zum Zitat Lai, L., Qin, L., Lin, X., Chang, L.: Scalable subgraph enumeration in mapreduce. Proc. VLDB Endow. 8(10), 974–985 (2015)CrossRef Lai, L., Qin, L., Lin, X., Chang, L.: Scalable subgraph enumeration in mapreduce. Proc. VLDB Endow. 8(10), 974–985 (2015)CrossRef
26.
Zurück zum Zitat Lam, C.: Hadoop in Action. Manning Publications Co., New York (2010) Lam, C.: Hadoop in Action. Manning Publications Co., New York (2010)
27.
Zurück zum Zitat Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-MAT: a recursive model for graph mining. In: SDM, vol. 4, pp. 442–446. SIAM (2004) Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-MAT: a recursive model for graph mining. In: SDM, vol. 4, pp. 442–446. SIAM (2004)
29.
Zurück zum Zitat Khorasani, F., Gupta, R., Bhuyan, L.N.: Scalable SIMD-efficient graph processing on GPUs. In: Proceedings of PACT, Series PACT ’15, pp. 39–50 (2015) Khorasani, F., Gupta, R., Bhuyan, L.N.: Scalable SIMD-efficient graph processing on GPUs. In: Proceedings of PACT, Series PACT ’15, pp. 39–50 (2015)
30.
Zurück zum Zitat Kim, J., Han, W.S., Lee, S., Park, K., Yu, H.: OPT: a new framework for overlapped and parallel triangulation in large-scale graphs. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 637–648. ACM (2014) Kim, J., Han, W.S., Lee, S., Park, K., Yu, H.: OPT: a new framework for overlapped and parallel triangulation in large-scale graphs. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 637–648. ACM (2014)
31.
Zurück zum Zitat Park, H.-M., Myaeng, S.-H., Kang, U.: PTE: enumerating trillion triangles on distributed systems. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1115–1124. ACM (2016) Park, H.-M., Myaeng, S.-H., Kang, U.: PTE: enumerating trillion triangles on distributed systems. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1115–1124. ACM (2016)
32.
Zurück zum Zitat Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: GraphX: graph processing in a distributed dataflow framework. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pp. 599–613 (2014) Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: GraphX: graph processing in a distributed dataflow framework. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pp. 599–613 (2014)
Metadaten
Titel
Efficient MapReduce algorithms for triangle listing in billion-scale graphs
verfasst von
Yuanyuan Zhu
Hao Zhang
Lu Qin
Hong Cheng
Publikationsdatum
17.03.2017
Verlag
Springer US
Erschienen in
Distributed and Parallel Databases / Ausgabe 2/2017
Print ISSN: 0926-8782
Elektronische ISSN: 1573-7578
DOI
https://doi.org/10.1007/s10619-017-7193-1

Weitere Artikel der Ausgabe 2/2017

Distributed and Parallel Databases 2/2017 Zur Ausgabe

Acknowledgments

Editorial Note