Skip to main content
Top
Published in: International Journal of Parallel Programming 4-5/2023

05-05-2023

Partitioning-Aware Performance Modeling of Distributed Graph Processing Tasks

Authors: Daniel Presser, Frank Siqueira

Published in: International Journal of Parallel Programming | Issue 4-5/2023

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Much of the data being produced in large scale by modern applications represents connected entities and their relationships, that can be modeled as large graphs. In order to extract valuable information from these large datasets, several parallel and distributed graph processing engines have been proposed. These systems are designed to run in large clusters, where resources must by allocated efficiently. Aiming to handle this problem, this paper presents a performance prediction model for GPS, a popular Pregel-based graph processing framework. By leveraging a micro-partitioning technique, our system can use various partitioning algorithms that greatly reduce the execution time, comparing with the simple hash partitioning that is commonly used in graph processing systems. Experimental results show that the prediction model has accuracy close to 90%, allowing it to be used in schedulers or to estimate the cost of running graph processing tasks.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literature
2.
go back to reference Avery, C.: Giraph: Large-scale graph processing infrastructure on Hadoop. In: Proceedings of Hadoop Summit (2011) Avery, C.: Giraph: Large-scale graph processing infrastructure on Hadoop. In: Proceedings of Hadoop Summit (2011)
3.
go back to reference Boldi, P., Vigna, S.: The WebGraph framework I: Compression techniques. In: Proceedings of the 13th International World Wide Web Conference, pp. 595–601. Manhattan, USA (2004) Boldi, P., Vigna, S.: The WebGraph framework I: Compression techniques. In: Proceedings of the 13th International World Wide Web Conference, pp. 595–601. Manhattan, USA (2004)
4.
go back to reference Cherkassky, B.V., Goldberg, A.V., Radzik, T.: Shortest paths algorithms: theory and experimental evaluation. Math. Program. 73(2), 129–174 (1996)MathSciNetCrossRefMATH Cherkassky, B.V., Goldberg, A.V., Radzik, T.: Shortest paths algorithms: theory and experimental evaluation. Math. Program. 73(2), 129–174 (1996)MathSciNetCrossRefMATH
5.
go back to reference Cordeiro, M., Sarmento, R.P., Brazdil, P., Gama, J.: Evolving networks and social network analysis methods and techniques. In: Social Media and Journalism-Trends, Connections, Implications. IntechOpen (2018) Cordeiro, M., Sarmento, R.P., Brazdil, P., Gama, J.: Evolving networks and social network analysis methods and techniques. In: Social Media and Journalism-Trends, Connections, Implications. IntechOpen (2018)
6.
go back to reference Danilevsky, M., Koh, E.: Information graph model and application to online advertising. In: Proceedings of the 1st Workshop on User Engagement Optimization, pp. 11–14. ACM (2013) Danilevsky, M., Koh, E.: Information graph model and application to online advertising. In: Proceedings of the 1st Workshop on User Engagement Optimization, pp. 11–14. ACM (2013)
7.
go back to reference Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef
8.
go back to reference Fernandes, K., Melhem, R., Hammoud, M.: Investigating and modeling performance scalability for distributed graph analytics. In: 2018 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), IEEE, pp. 34–3, (2018) Fernandes, K., Melhem, R., Hammoud, M.: Investigating and modeling performance scalability for distributed graph analytics. In: 2018 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), IEEE, pp. 34–3, (2018)
9.
go back to reference Garimella, K., Morales, G.D.F., Gionis, A., Mathioudakis, M.: Quantifying controversy on social media. ACM Trans. Social Comput. 1(1), 3 (2018)CrossRef Garimella, K., Morales, G.D.F., Gionis, A., Mathioudakis, M.: Quantifying controversy on social media. ACM Trans. Social Comput. 1(1), 3 (2018)CrossRef
10.
go back to reference Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: PowerGraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th Symposium on Operating System Design and Implementation (2012) Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: PowerGraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th Symposium on Operating System Design and Implementation (2012)
12.
go back to reference Joaquim, P., Bravo, M., Rodrigues, L., Matos, M.: Hourglass: leveraging transient resources for time-constrained graph processing in the cloud. In: Proceedings of the Fourteenth EuroSys Conference 2019, ACM, p. 35, (2019) Joaquim, P., Bravo, M., Rodrigues, L., Matos, M.: Hourglass: leveraging transient resources for time-constrained graph processing in the cloud. In: Proceedings of the Fourteenth EuroSys Conference 2019, ACM, p. 35, (2019)
13.
go back to reference Karypis, G., Kumar, V.: Multilevel graph partitioning schemes. In: ICPP (3), pp. 113–122 (1995) Karypis, G., Kumar, V.: Multilevel graph partitioning schemes. In: ICPP (3), pp. 113–122 (1995)
15.
go back to reference Kumar, D., Raj, A., Dharanipragada, J.: Graphsteal: Dynamic re-partitioning for efficient graph processing in heterogeneous clusters. In: Cloud Computing (CLOUD), 2017 IEEE 10th International Conference on, pp. 439–446. IEEE (2017) Kumar, D., Raj, A., Dharanipragada, J.: Graphsteal: Dynamic re-partitioning for efficient graph processing in heterogeneous clusters. In: Cloud Computing (CLOUD), 2017 IEEE 10th International Conference on, pp. 439–446. IEEE (2017)
16.
go back to reference Leskovec, J., Krevl, A.: SNAP Datasets: Stanford large network dataset collection (2014) Leskovec, J., Krevl, A.: SNAP Datasets: Stanford large network dataset collection (2014)
17.
go back to reference Li, Z., Zhang, B., Ren, S., Liu, Y., Qin, Z., Goh, R.S.M., Gurusamy, M.: Performance modelling and cost effective execution for distributed graph processing on configurable VMs. Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing pp. 74–83 (2017) Li, Z., Zhang, B., Ren, S., Liu, Y., Qin, Z., Goh, R.S.M., Gurusamy, M.: Performance modelling and cost effective execution for distributed graph processing on configurable VMs. Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing pp. 74–83 (2017)
18.
go back to reference Lumsdaine, A., Gregor, D., Hendrickson, B., Berry, J.: Challenges in parallel graph processing. Parallel Process. Lett. 17(01), 5–20 (2007)MathSciNetCrossRef Lumsdaine, A., Gregor, D., Hendrickson, B., Berry, J.: Challenges in parallel graph processing. Parallel Process. Lett. 17(01), 5–20 (2007)MathSciNetCrossRef
19.
go back to reference Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pp. 135–146 (2010) Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pp. 135–146 (2010)
20.
go back to reference Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: bringing order to the web. Tech. rep, Stanford InfoLab (1999) Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: bringing order to the web. Tech. rep, Stanford InfoLab (1999)
21.
go back to reference Presser, D., Siqueira, F., Reina, F.: Performance modeling and task scheduling in distributed graph processing. In: 2018 IEEE International Congress on Big Data (BigData Congress), pp. 135–142. IEEE (2018) Presser, D., Siqueira, F., Reina, F.: Performance modeling and task scheduling in distributed graph processing. In: 2018 IEEE International Congress on Big Data (BigData Congress), pp. 135–142. IEEE (2018)
24.
go back to reference Salihoglu, S., Widom, J.: GPS: A graph processing system. In: Proceedings of the 25th International Conference on Scientific and Statistical Database Management (2013) Salihoglu, S., Widom, J.: GPS: A graph processing system. In: Proceedings of the 25th International Conference on Scientific and Statistical Database Management (2013)
25.
go back to reference Seo, S., Yoon, E.J., Kim, J., Jin, S., Kim, J.S., Maeng, S.: Hama: An efficient matrix computation with the mapreduce framework. In: Proceedings of the 2nd IEEE International Conference on Cloud Computing Technology and Science, pp. 721–726 (2010) Seo, S., Yoon, E.J., Kim, J., Jin, S., Kim, J.S., Maeng, S.: Hama: An efficient matrix computation with the mapreduce framework. In: Proceedings of the 2nd IEEE International Conference on Cloud Computing Technology and Science, pp. 721–726 (2010)
26.
go back to reference Tsourakakis, C., Gkantsidis, C., Radunovic, B., Vojnovic, M.: Fennel: Streaming graph partitioning for massive scale graphs. In: Proceedings of the 7th ACM international conference on Web search and data mining, pp. 333–342. ACM (2014) Tsourakakis, C., Gkantsidis, C., Radunovic, B., Vojnovic, M.: Fennel: Streaming graph partitioning for massive scale graphs. In: Proceedings of the 7th ACM international conference on Web search and data mining, pp. 333–342. ACM (2014)
28.
go back to reference Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)CrossRef Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)CrossRef
30.
go back to reference White, T.: Hadoop: The definitive guide. O’Reilly Media, Inc., USA (2012) White, T.: Hadoop: The definitive guide. O’Reilly Media, Inc., USA (2012)
31.
go back to reference Xin, R.S., Gonzalez, J.E., Franklin, M.J., Stoica, I.: GraphX: A resilient distributed graph system on spark. In: Proceedings of the 1st International Workshop on Graph Data Management Experiences and Systems (2013) Xin, R.S., Gonzalez, J.E., Franklin, M.J., Stoica, I.: GraphX: A resilient distributed graph system on spark. In: Proceedings of the 1st International Workshop on Graph Data Management Experiences and Systems (2013)
32.
go back to reference Xue, J., Yang, Z., Hou, S., Dai, Y.: When computing meets heterogeneous cluster: Workload assignment in graph computation. In: Big Data (Big Data), 2015 IEEE International Conference on, IEEE, pp. 154–163, (2015) Xue, J., Yang, Z., Hou, S., Dai, Y.: When computing meets heterogeneous cluster: Workload assignment in graph computation. In: Big Data (Big Data), 2015 IEEE International Conference on, IEEE, pp. 154–163, (2015)
33.
go back to reference Yalavarthi, V.K., Khan, A.: Steering top-k influencers in dynamic graphs via local updates. In: 2018 IEEE International Conference on Big Data (Big Data), IEEE, pp. 576–583, (2018) Yalavarthi, V.K., Khan, A.: Steering top-k influencers in dynamic graphs via local updates. In: 2018 IEEE International Conference on Big Data (Big Data), IEEE, pp. 576–583, (2018)
34.
go back to reference Zaharia, M., Xin, R.S., Wendell, P., Das, T., Armbrust, M., Dave, A., Meng, X., Rosen, J., Venkataraman, S., Franklin, M.J., et al.: Apache Spark: A unified engine for big data processing. Commun. ACM 59(11), 56–65 (2016)CrossRef Zaharia, M., Xin, R.S., Wendell, P., Das, T., Armbrust, M., Dave, A., Meng, X., Rosen, J., Venkataraman, S., Franklin, M.J., et al.: Apache Spark: A unified engine for big data processing. Commun. ACM 59(11), 56–65 (2016)CrossRef
Metadata
Title
Partitioning-Aware Performance Modeling of Distributed Graph Processing Tasks
Authors
Daniel Presser
Frank Siqueira
Publication date
05-05-2023
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 4-5/2023
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-023-00753-w

Premium Partner