Skip to main content
Top
Published in: The Journal of Supercomputing 18/2023

19-06-2023

iPartition: a distributed partitioning algorithm for block-centric graph processing systems

Authors: Masoud Sagharichian, Morteza Alipour Langouri

Published in: The Journal of Supercomputing | Issue 18/2023

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Extracting information from growing networks like social networks has a wide domain of applications. Therefore, many large-scale distributed graph processing systems have been developed. Such systems use some kinds of partitioning to distribute a big graph over machines. Minimizing the number of cutting edges and balancing the load are two general factors that most of the partitioning algorithms try to optimize regardless of the properties of the underlying systems. Although these general factors are necessary, we believe that they are not enough for all systems. In this paper, we investigate common features of block-centric graph processing systems and extract some system-specific core factors that influence on the quality of partitions: minimizing the diameter of the integrated-graph, maximizing the size of integrated vertices, and preventing the skew problem in the size of integrated vertices. The proposed partitioning method, tries to optimize system-specific factors as well as general ones. Evaluating iPartition over real-world graphs showed that it can significantly improve the performance of block-centric graph processing systems in terms of network traffic and synchronization barriers. More specifically, the amount of superstep reduction over frequently-used distributed graph partitioning methods varies from 20 to 75%, based on the type of the graph. The reduction in transmitted messages over the network varies from 10 to 80%.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Jackson MO (2014) Social and economic networks. Princeton University Press, Jackson, MOMATH Jackson MO (2014) Social and economic networks. Princeton University Press, Jackson, MOMATH
2.
go back to reference Li D, Chen Z, Liu J (2019) Analysis for behavioral economics in social networks: an altruism-based dynamic cooperation model. Int J Parallel Prog 47(4):686–708CrossRef Li D, Chen Z, Liu J (2019) Analysis for behavioral economics in social networks: an altruism-based dynamic cooperation model. Int J Parallel Prog 47(4):686–708CrossRef
3.
go back to reference Varanka D (2019) Integrating the sociology of space with geospatial semantic relation properties for data graphs Varanka D (2019) Integrating the sociology of space with geospatial semantic relation properties for data graphs
4.
go back to reference Liu N et al (2020) Large-scale graph processing systems: a survey. Front Inf Technol Electron Eng 21(3):384–404CrossRef Liu N et al (2020) Large-scale graph processing systems: a survey. Front Inf Technol Electron Eng 21(3):384–404CrossRef
5.
go back to reference Sahu S et al (2020) The ubiquity of large graphs and surprising challenges of graph processing: extended survey. VLDB J 29(2):595–618CrossRef Sahu S et al (2020) The ubiquity of large graphs and surprising challenges of graph processing: extended survey. VLDB J 29(2):595–618CrossRef
6.
go back to reference Carrington PJ, Scott J, Wasserman S (2005) Models and methods in social network analysis, vol 28. Cambridge University Press, CambridgeCrossRef Carrington PJ, Scott J, Wasserman S (2005) Models and methods in social network analysis, vol 28. Cambridge University Press, CambridgeCrossRef
7.
go back to reference Malewicz G et al (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data Malewicz G et al (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
8.
go back to reference Sagharichian M, Naderi H, Haghjoo M (2015) ExPregel: a new computational model for large-scale graph processing. Concurr Comput Pract Exp 27(17):4954–4969CrossRef Sagharichian M, Naderi H, Haghjoo M (2015) ExPregel: a new computational model for large-scale graph processing. Concurr Comput Pract Exp 27(17):4954–4969CrossRef
9.
go back to reference Yan D et al (2014) Blogel: a block-centric framework for distributed computation on real-world graphs. Proc VLDB Endow 7(14):1981–1992CrossRef Yan D et al (2014) Blogel: a block-centric framework for distributed computation on real-world graphs. Proc VLDB Endow 7(14):1981–1992CrossRef
10.
go back to reference Salihoglu S, Widom J (2013) Gps: a graph processing system. In: Proceedings of the 25th International Conference on Scientific and Statistical Database Management Salihoglu S, Widom J (2013) Gps: a graph processing system. In: Proceedings of the 25th International Conference on Scientific and Statistical Database Management
11.
go back to reference Gonzalez JE et al. (2014) Graphx: graph processing in a distributed dataflow framework. In: 11th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 14) Gonzalez JE et al. (2014) Graphx: graph processing in a distributed dataflow framework. In: 11th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 14)
12.
go back to reference Adoni HWY et al (2020) A survey of current challenges in partitioning and processing of graph-structured data in parallel and distributed systems. Distrib Parallel Databases 38(2):495–530CrossRef Adoni HWY et al (2020) A survey of current challenges in partitioning and processing of graph-structured data in parallel and distributed systems. Distrib Parallel Databases 38(2):495–530CrossRef
13.
go back to reference Sakouhi C, Khaldi A, Ghezal HB (2018) An overview of recent graph partitioning algorithms. In: Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA). The Steering Committee of the World Congress in Computer Science, Computer Sakouhi C, Khaldi A, Ghezal HB (2018) An overview of recent graph partitioning algorithms. In: Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA). The Steering Committee of the World Congress in Computer Science, Computer
14.
go back to reference Onizuka M, Fujimori T, Shiokawa H (2017) Graph partitioning for distributed graph processing. Data Sci Eng 2(1):94–105CrossRef Onizuka M, Fujimori T, Shiokawa H (2017) Graph partitioning for distributed graph processing. Data Sci Eng 2(1):94–105CrossRef
15.
go back to reference Sagharichian M, Naderi H (2017) Intelligent and independent processes for overcoming big graphs. J Supercomput 73(4):1438–1466CrossRef Sagharichian M, Naderi H (2017) Intelligent and independent processes for overcoming big graphs. J Supercomput 73(4):1438–1466CrossRef
16.
go back to reference Panitanarak T, Shontz SM (2011) Mdec: metis-based domain decomposition for parallel 2d mesh generation. Procedia Comput Sci 4:302–311CrossRef Panitanarak T, Shontz SM (2011) Mdec: metis-based domain decomposition for parallel 2d mesh generation. Procedia Comput Sci 4:302–311CrossRef
17.
go back to reference Karypis G (1997) METIS: unstructured graph partitioning and sparse matrix ordering system. Technical report Karypis G (1997) METIS: unstructured graph partitioning and sparse matrix ordering system. Technical report
18.
go back to reference Sui X et al (2010) Parallel graph partitioning on multicore architectures. In: International Workshop on Languages and Compilers for Parallel Computing. Springer Sui X et al (2010) Parallel graph partitioning on multicore architectures. In: International Workshop on Languages and Compilers for Parallel Computing. Springer
19.
go back to reference Slota GM, Madduri K, Rajamanickam S (2014) PuLP: scalable multi-objective multi-constraint partitioning for small-world networks. In: 2014 IEEE International Conference on Big Data (Big Data). IEEE Slota GM, Madduri K, Rajamanickam S (2014) PuLP: scalable multi-objective multi-constraint partitioning for small-world networks. In: 2014 IEEE International Conference on Big Data (Big Data). IEEE
20.
go back to reference Tsourakakis C et al (2014) Fennel: streaming graph partitioning for massive scale graphs. In: Proceedings of the 7th ACM International Conference on Web Search and Data Mining Tsourakakis C et al (2014) Fennel: streaming graph partitioning for massive scale graphs. In: Proceedings of the 7th ACM International Conference on Web Search and Data Mining
22.
go back to reference Guerrieri A, Montresor A (2015) Dfep: distributed funding-based edge partitioning. In: Euro-Par 2015: Parallel Processing: 21st International Conference on Parallel and Distributed Computing, Vienna, Austria, August 24–28, 2015, Proceedings 21. Springer Guerrieri A, Montresor A (2015) Dfep: distributed funding-based edge partitioning. In: Euro-Par 2015: Parallel Processing: 21st International Conference on Parallel and Distributed Computing, Vienna, Austria, August 24–28, 2015, Proceedings 21. Springer
23.
go back to reference Chen R et al (2019) Powerlyra: differentiated graph computation and partitioning on skewed graphs. ACM Trans Parallel Comput (TOPC) 5(3):1–39 Chen R et al (2019) Powerlyra: differentiated graph computation and partitioning on skewed graphs. ACM Trans Parallel Comput (TOPC) 5(3):1–39
24.
go back to reference Zhang Y, Wang Q, Gong S (2021) Distributed graph processing: techniques and systems. In: Web and Big Data. APWeb-WAIM 2020 International Workshops: KGMA 2020, SemiBDMA 2020, DeepLUDA 2020, Tianjin, China, September 18–20, 2020, Revised Selected Papers 4. Springer Zhang Y, Wang Q, Gong S (2021) Distributed graph processing: techniques and systems. In: Web and Big Data. APWeb-WAIM 2020 International Workshops: KGMA 2020, SemiBDMA 2020, DeepLUDA 2020, Tianjin, China, September 18–20, 2020, Revised Selected Papers 4. Springer
25.
go back to reference Ayall TA et al (2022) Graph computing systems and partitioning techniques: a survey. IEEE Access 10:118523–118550CrossRef Ayall TA et al (2022) Graph computing systems and partitioning techniques: a survey. IEEE Access 10:118523–118550CrossRef
26.
go back to reference Mayer R, Orujzade K, Jacobsen H-A (2022) Out-of-core edge partitioning at linear run-time. In: 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE Mayer R, Orujzade K, Jacobsen H-A (2022) Out-of-core edge partitioning at linear run-time. In: 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE
27.
go back to reference Ayall T et al (2021) Taking heuristic based graph edge partitioning one step ahead via OffStream partitioning approach. In: 2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE Ayall T et al (2021) Taking heuristic based graph edge partitioning one step ahead via OffStream partitioning approach. In: 2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE
28.
go back to reference Mayer R, Jacobsen H-A (2021) Hybrid edge partitioner: partitioning large power-law graphs under memory constraints. In: Proceedings of the 2021 International Conference on Management of Data Mayer R, Jacobsen H-A (2021) Hybrid edge partitioner: partitioning large power-law graphs under memory constraints. In: Proceedings of the 2021 International Conference on Management of Data
29.
go back to reference Kong D, Xie X, Zhang Z (2022) Clustering-based partitioning for large web graphs. In: 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE Kong D, Xie X, Zhang Z (2022) Clustering-based partitioning for large web graphs. In: 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE
30.
31.
go back to reference Gonzalez JE et al (2012) {PowerGraph}: distributed {Graph-Parallel} computation on natural graphs. In: 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12) Gonzalez JE et al (2012) {PowerGraph}: distributed {Graph-Parallel} computation on natural graphs. In: 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12)
32.
go back to reference Tian Y et al (2013) From" think like a vertex" to" think like a graph". Proc VLDB Endow 7(3):193–204CrossRef Tian Y et al (2013) From" think like a vertex" to" think like a graph". Proc VLDB Endow 7(3):193–204CrossRef
33.
go back to reference Fan W et al (2017) GRAPE: parallelizing sequential graph computations. Proc VLDB Endow 10(12):1889–1892CrossRef Fan W et al (2017) GRAPE: parallelizing sequential graph computations. Proc VLDB Endow 10(12):1889–1892CrossRef
34.
go back to reference Dindokar R, Choudhury N, Simmhan Y (2016) A meta-graph approach to analyze subgraph-centric distributed programming models. In: 2016 IEEE International Conference on Big Data (Big Data). IEEE Dindokar R, Choudhury N, Simmhan Y (2016) A meta-graph approach to analyze subgraph-centric distributed programming models. In: 2016 IEEE International Conference on Big Data (Big Data). IEEE
35.
go back to reference Verma S (2017) An experimental comparison of partitioning strategies in distributed graph processing Verma S (2017) An experimental comparison of partitioning strategies in distributed graph processing
36.
go back to reference Kim Y, Bae M, Oh S (2018) Dynamic block reassignment for load balancing of block centric graph processing systems. KIPS Trans Softw Data Eng 7(5):177–188 Kim Y, Bae M, Oh S (2018) Dynamic block reassignment for load balancing of block centric graph processing systems. KIPS Trans Softw Data Eng 7(5):177–188
37.
go back to reference Wen X, Zhang S, You H (2018) DRONE: a distributed subgraph-centric framework for processing large scale power-law graphs. arXiv preprint arXiv:1812.04380 Wen X, Zhang S, You H (2018) DRONE: a distributed subgraph-centric framework for processing large scale power-law graphs. arXiv preprint arXiv:​1812.​04380
38.
go back to reference Zhang S et al (2021) An efficient and balanced graph partition algorithm for the subgraph-centric programming model on large-scale power-law graphs. In: 2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS). IEEE Zhang S et al (2021) An efficient and balanced graph partition algorithm for the subgraph-centric programming model on large-scale power-law graphs. In: 2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS). IEEE
39.
go back to reference Sagharichian M, Langouri MA, Naderi H (2017) A fast method to exactly calculate the diameter of incremental disconnected graphs. World Wide Web 20(2):399–416CrossRef Sagharichian M, Langouri MA, Naderi H (2017) A fast method to exactly calculate the diameter of incremental disconnected graphs. World Wide Web 20(2):399–416CrossRef
40.
go back to reference Lam C (2010) Hadoop in action. Simon and Schuster, New York Lam C (2010) Hadoop in action. Simon and Schuster, New York
41.
go back to reference Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113CrossRef Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113CrossRef
42.
go back to reference Sedgewick R, Wayne K (2011) Algorithms, 4th edn. Addison-Wesley Professional, Boston Sedgewick R, Wayne K (2011) Algorithms, 4th edn. Addison-Wesley Professional, Boston
43.
go back to reference Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392MathSciNetCrossRefMATH Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392MathSciNetCrossRefMATH
44.
go back to reference Xie W et al (2013) Fast iterative graph computation with block updates. Proc VLDB Endow 6(14):2014–2025CrossRef Xie W et al (2013) Fast iterative graph computation with block updates. Proc VLDB Endow 6(14):2014–2025CrossRef
45.
go back to reference Leskovec J, Krevl A (2014)SNAP datasets: Stanford large network dataset collection. Ann Arbor, MI, USA Leskovec J, Krevl A (2014)SNAP datasets: Stanford large network dataset collection. Ann Arbor, MI, USA
Metadata
Title
iPartition: a distributed partitioning algorithm for block-centric graph processing systems
Authors
Masoud Sagharichian
Morteza Alipour Langouri
Publication date
19-06-2023
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 18/2023
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-023-05492-w

Other articles of this Issue 18/2023

The Journal of Supercomputing 18/2023 Go to the issue

Premium Partner