Skip to main content
Erschienen in: Distributed and Parallel Databases 3/2018

18.07.2018

Distributed computing connected components with linear communication cost

verfasst von: Xing Feng, Lijun Chang, Xuemin Lin, Lu Qin, Wenjie Zhang, Long Yuan

Erschienen in: Distributed and Parallel Databases | Ausgabe 3/2018

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The paper studies three fundamental problems in graph analytics, computing connected components (CCs), biconnected components (BCCs), and 2-edge-connected components (ECCs) of a graph. With the recent advent of big data, developing efficient distributed algorithms for computing CCs, BCCs and ECCs of a big graph has received increasing interests. As with the existing research efforts, we focus on the Pregel programming model, while the techniques may be extended to other programming models including MapReduce and Spark. The state-of-the-art techniques for computing CCs and BCCs in Pregel incur \(O(m\times \#\text {supersteps})\) total costs for both data communication and computation, where m is the number of edges in a graph and #supersteps is the number of supersteps. Since the network communication speed is usually much slower than the computation speed, communication costs are the dominant costs of the total running time in the existing techniques. In this paper, we propose a new paradigm based on graph decomposition to compute CCs and BCCs with O(m) total communication cost. The total computation costs of our techniques are also smaller than that of the existing techniques in practice, though theoretically almost the same. Moreover, we also study distributed computing ECCs. We are the first to study this problem and an approach with O(m) total communication cost is proposed. Comprehensive empirical studies demonstrate that our approaches can outperform the existing techniques by one order of magnitude regarding the total running time.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
3
The total data communication cost of \(\mathsf {single\text {-}pivot}\) becomes O(m) in practice if the input graph contains one giant CC and other small CCs.
 
4
Note that we confirmed with the authors of [38] through private communication that the data communication cost of \(\mathsf {S\text {-}V}\) per superstep is O(m), which is misspelled as O(n) in [38].
 
Literatur
1.
Zurück zum Zitat Awerbuch, B., Shiloach, Y.: New connectivity and MSF algorithms for shuffle-exchange network and PRAM. IEEE Trans. Comput. 10, 1258–1263 (1987)MathSciNetCrossRefMATH Awerbuch, B., Shiloach, Y.: New connectivity and MSF algorithms for shuffle-exchange network and PRAM. IEEE Trans. Comput. 10, 1258–1263 (1987)MathSciNetCrossRefMATH
2.
Zurück zum Zitat Ceccarello, M., Pietracaprina, A., Pucci, G., Upfal, E.: Space and time efficient parallel graph decomposition, clustering and diameter approximation. In: Proceedings of SPAA’15 (2015) Ceccarello, M., Pietracaprina, A., Pucci, G., Upfal, E.: Space and time efficient parallel graph decomposition, clustering and diameter approximation. In: Proceedings of SPAA’15 (2015)
3.
Zurück zum Zitat Chang, L., Yu, J. X., Qin, L., Lin, X., Liu, C., Liang, W.: Efficiently computing k-edge connected components via graph decomposition. In: Proceedings of SIGMOD’13 (2013) Chang, L., Yu, J. X., Qin, L., Lin, X., Liu, C., Liang, W.: Efficiently computing k-edge connected components via graph decomposition. In: Proceedings of SIGMOD’13 (2013)
4.
Zurück zum Zitat Ching, A., Kunz, C.: Giraph: large-scale graph processing infrastructure on hadoop. Hadoop Summit (2011) Ching, A., Kunz, C.: Giraph: large-scale graph processing infrastructure on hadoop. Hadoop Summit (2011)
5.
Zurück zum Zitat Cohen, J.: Graph twiddling in a mapreduce world. Comput. Sci. Eng. 11, 29–41 (2009)CrossRef Cohen, J.: Graph twiddling in a mapreduce world. Comput. Sci. Eng. 11, 29–41 (2009)CrossRef
6.
Zurück zum Zitat Cormen, T.H., Stein, C., Rivest, R.L., Leiserson, C.E.: Introduction to Algorithms. McGraw-Hill Higher Education, New York (2001)MATH Cormen, T.H., Stein, C., Rivest, R.L., Leiserson, C.E.: Introduction to Algorithms. McGraw-Hill Higher Education, New York (2001)MATH
7.
Zurück zum Zitat Dean, J., Ghemawat, S.: Mapreduce: Simplified data processing on large clusters. In: Proceedings of OSDI’04 (2004) Dean, J., Ghemawat, S.: Mapreduce: Simplified data processing on large clusters. In: Proceedings of OSDI’04 (2004)
8.
Zurück zum Zitat Feng, X., Chang, L., Lin, X., Qin, L., Zhang, W.: Computing connected components with linear communication cost in pregel-like systems. In: Proceedings of ICDE’16 (2016) Feng, X., Chang, L., Lin, X., Qin, L., Zhang, W.: Computing connected components with linear communication cost in pregel-like systems. In: Proceedings of ICDE’16 (2016)
10.
Zurück zum Zitat Gibbons, A.: Algorithmic Graph Theory. Cambridge University Press, Cambridge (1985)MATH Gibbons, A.: Algorithmic Graph Theory. Cambridge University Press, Cambridge (1985)MATH
11.
Zurück zum Zitat Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: Graphx: graph processing in a distributed dataflow framework. In: Proceedings of OSDI’14 (2014) Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: Graphx: graph processing in a distributed dataflow framework. In: Proceedings of OSDI’14 (2014)
12.
Zurück zum Zitat Henry, N., Bezerianos, A., Fekete, J.: Improving the readability of clustered social networks using node duplication. IEEE Trans. Vis. Comput. Graph. 14, 1317–1324 (2008)CrossRef Henry, N., Bezerianos, A., Fekete, J.: Improving the readability of clustered social networks using node duplication. IEEE Trans. Vis. Comput. Graph. 14, 1317–1324 (2008)CrossRef
13.
Zurück zum Zitat Hopcroft, J.E., Tarjan, R.E.: Efficient algorithms for graph manipulation [H] (algorithm 447). Commun. ACM 16, 372–378 (1973)CrossRef Hopcroft, J.E., Tarjan, R.E.: Efficient algorithms for graph manipulation [H] (algorithm 447). Commun. ACM 16, 372–378 (1973)CrossRef
14.
Zurück zum Zitat Kang, U., Tsourakakis, C.E., Faloutsos, C.: PEGASUS: a peta-scale graph mining system. In: Proceedings of ICDM’09 (2009) Kang, U., Tsourakakis, C.E., Faloutsos, C.: PEGASUS: a peta-scale graph mining system. In: Proceedings of ICDM’09 (2009)
15.
Zurück zum Zitat Kang, U., McGlohon, M., Akoglu, L., Faloutsos, C.: Patterns on the connected components of terabyte-scale graphs. In: Proceedings of ICDM’10 (2010) Kang, U., McGlohon, M., Akoglu, L., Faloutsos, C.: Patterns on the connected components of terabyte-scale graphs. In: Proceedings of ICDM’10 (2010)
16.
Zurück zum Zitat Kavitha, T., Liebchen, C., Mehlhorn, K., Michail, D., Rizzi, R., Ueckerdt, T., Zweig, K.A.: Cycle bases in graphs characterization, algorithms, complexity, and applications. Comput. Sci. Rev. 3, 199–243 (2009)CrossRefMATH Kavitha, T., Liebchen, C., Mehlhorn, K., Michail, D., Rizzi, R., Ueckerdt, T., Zweig, K.A.: Cycle bases in graphs characterization, algorithms, complexity, and applications. Comput. Sci. Rev. 3, 199–243 (2009)CrossRefMATH
17.
Zurück zum Zitat Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: Distributed graphlab: a framework for machine learning in the cloud. PVLDB 5, 716–727 (2012) Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: Distributed graphlab: a framework for machine learning in the cloud. PVLDB 5, 716–727 (2012)
18.
Zurück zum Zitat Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of SIGMOD’10 (2010) Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of SIGMOD’10 (2010)
19.
Zurück zum Zitat Miller, G.L., Peng, R., Xu, S.C.: Parallel graph decompositions using random shifts. In: Proceedings of SPAA’13 (2013) Miller, G.L., Peng, R., Xu, S.C.: Parallel graph decompositions using random shifts. In: Proceedings of SPAA’13 (2013)
20.
Zurück zum Zitat Miller, G.L., Ramachandran, V.: Efficient parallel ear decomposition with applications. MSRI 135, 162 (1986) Miller, G.L., Ramachandran, V.: Efficient parallel ear decomposition with applications. MSRI 135, 162 (1986)
21.
Zurück zum Zitat Munagala, K., Ranade, A.G.: I/o-complexity of graph algorithms. In: Proceedings of SODA’99 (1999) Munagala, K., Ranade, A.G.: I/o-complexity of graph algorithms. In: Proceedings of SODA’99 (1999)
22.
Zurück zum Zitat Nanda, S., Kotz, D.: Localized bridging centrality for distributed network analysis. In: Proceedings of ICCCN’08 (2008) Nanda, S., Kotz, D.: Localized bridging centrality for distributed network analysis. In: Proceedings of ICCCN’08 (2008)
23.
Zurück zum Zitat Przulj, N., Wigle, D., Jurisica, I.: Functional topology in a network of protein interactions. Bioinformatics 20, 340–348 (2004)CrossRef Przulj, N., Wigle, D., Jurisica, I.: Functional topology in a network of protein interactions. Bioinformatics 20, 340–348 (2004)CrossRef
24.
Zurück zum Zitat Qin, L., Yu, J.X., Chang, L., Cheng, H., Zhang, C., Lin, X.: Scalable big graph processing in mapreduce. In: Proceedings of SIGMOD’14 (2014) Qin, L., Yu, J.X., Chang, L., Cheng, H., Zhang, C., Lin, X.: Scalable big graph processing in mapreduce. In: Proceedings of SIGMOD’14 (2014)
25.
Zurück zum Zitat Ramachandran, V.: Parallel open ear decomposition with applications to graph biconnectivity and triconnectivity. Citeseer (1992) Ramachandran, V.: Parallel open ear decomposition with applications to graph biconnectivity and triconnectivity. Citeseer (1992)
26.
Zurück zum Zitat Rastogi, V., Machanavajjhala, A., Chitnis, L., Sarma, A.D.: Finding connected components in map-reduce in logarithmic rounds. In: Proceedings of ICDE’13 (2013) Rastogi, V., Machanavajjhala, A., Chitnis, L., Sarma, A.D.: Finding connected components in map-reduce in logarithmic rounds. In: Proceedings of ICDE’13 (2013)
27.
Zurück zum Zitat Salihoglu, S., Widom, J.: Optimizing graph algorithms on Pregel-like systems. PVLDB 7, 577–588 (2014) Salihoglu, S., Widom, J.: Optimizing graph algorithms on Pregel-like systems. PVLDB 7, 577–588 (2014)
28.
31.
Zurück zum Zitat Slota, G.M., Madduri, K.: Simple parallel biconnectivity algorithms for multicore platforms. In: Proceedings of HiPC’14 (2014) Slota, G.M., Madduri, K.: Simple parallel biconnectivity algorithms for multicore platforms. In: Proceedings of HiPC’14 (2014)
32.
Zurück zum Zitat Stanton, I.: Streaming balanced graph partitioning algorithms for random graphs. In: Proceedings of SODA’14 (2014) Stanton, I.: Streaming balanced graph partitioning algorithms for random graphs. In: Proceedings of SODA’14 (2014)
34.
Zurück zum Zitat Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From “think like a vertex” to “think like a graph”. PVLDB 7, 193–204 (2013) Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From “think like a vertex” to “think like a graph”. PVLDB 7, 193–204 (2013)
35.
Zurück zum Zitat Turau, V.: Computing bridges, articulations, and 2-connected components in wireless sensor networks. In: Proceedings of ALGOSENSORS’06 (2006) Turau, V.: Computing bridges, articulations, and 2-connected components in wireless sensor networks. In: Proceedings of ALGOSENSORS’06 (2006)
36.
Zurück zum Zitat Valiant, L.G.: A bridging model for parallel computation. Commun ACM 33, 103–111 (1990)CrossRef Valiant, L.G.: A bridging model for parallel computation. Commun ACM 33, 103–111 (1990)CrossRef
37.
Zurück zum Zitat Vega, D., Cerda-Alabern, L., Navarro, L., Meseguer, R.: Topology patterns of a community network: Guifi.net. In: Proceedings of WiMob’12 (2012) Vega, D., Cerda-Alabern, L., Navarro, L., Meseguer, R.: Topology patterns of a community network: Guifi.net. In: Proceedings of WiMob’12 (2012)
38.
Zurück zum Zitat Yan, D., Cheng, J., Xing, K., Lu, Y., Ng, W., Bu, Y.: Pregel algorithms for graph connectivity problems with performance guarantees. PVLDB 7, 1821–1832 (2014) Yan, D., Cheng, J., Xing, K., Lu, Y., Ng, W., Bu, Y.: Pregel algorithms for graph connectivity problems with performance guarantees. PVLDB 7, 1821–1832 (2014)
39.
Zurück zum Zitat Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauly, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of NSDI’12 (2012) Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauly, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of NSDI’12 (2012)
Metadaten
Titel
Distributed computing connected components with linear communication cost
verfasst von
Xing Feng
Lijun Chang
Xuemin Lin
Lu Qin
Wenjie Zhang
Long Yuan
Publikationsdatum
18.07.2018
Verlag
Springer US
Erschienen in
Distributed and Parallel Databases / Ausgabe 3/2018
Print ISSN: 0926-8782
Elektronische ISSN: 1573-7578
DOI
https://doi.org/10.1007/s10619-018-7232-6

Weitere Artikel der Ausgabe 3/2018

Distributed and Parallel Databases 3/2018 Zur Ausgabe