Skip to main content
Erschienen in: Cluster Computing 2/2017

24.03.2017

A BSP model graph processing system on many cores

verfasst von: Siyan Lai, Guangda Lai, Fangzhou Lu, Guojun Shen, Jing Jin, Xiaola Lin

Erschienen in: Cluster Computing | Ausgabe 2/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Large-scale graph processing plays an increasingly important role for many data-related applications. Recently GPU has been adopted to accelerate various graph processing algorithms. However, since the architecture of GPU is very different from traditional computing model, the learning threshold for developing GPU-based applications is high. In this paper, we propose a GPU-based parallel graph processing system named GPregel to tackle this challenge. GPregel is a BSP model in graph processing such as Pregel from Google. It harnesses a lightweight compiler to hide the underlying complexity of the parallel processing details and simplifies programming, so that it greatly reduces the difficulty in utilizing the GPU to solve graph computing problems. Moreover, GPregel develops several optimizations for enhancing the performance, including (1) a special storage model for BSP model running on GPU, which overcomes the execution divergence and irregular memory access by coarse-grained designs; (2) a warp-level optimal strategy Parallelized-Messages-Sending and a thread-level optimal strategy Threads-Merge-Executing to accelerate the computations of high degree vertexes and low degree vertexes respectively; (3) messages copy mechanism optimization that utilizes a shared array and a rolling array to speed up the messages copy. Experiments demonstrate that GPregel can achieve high performance with little work for developers.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Kapre, N., Mehta, N., Rizzo, D., Eslick, I., Rubin, R., Uribe, T.E., DeHon, A.: GraphStep: A system architecture for sparse-graph algorithms. In: Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM’06), pp. 143–151 (2006) Kapre, N., Mehta, N., Rizzo, D., Eslick, I., Rubin, R., Uribe, T.E., DeHon, A.: GraphStep: A system architecture for sparse-graph algorithms. In: Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM’06), pp. 143–151 (2006)
2.
Zurück zum Zitat Bader, D.A., Madduri, K.: GTgraph: A Synthetic Graph Generator Suite, Atlanta (2006) Bader, D.A., Madduri, K.: GTgraph: A Synthetic Graph Generator Suite, Atlanta (2006)
3.
Zurück zum Zitat Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pp. 135–146 (2010) Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pp. 135–146 (2010)
4.
Zurück zum Zitat Zhong, J., He, B.: Medusa: simplified graph processing on GPUs. IEEE Trans. Parallel Distrib. Syst. 25(6), 1543–1552 (2014)MathSciNetCrossRef Zhong, J., He, B.: Medusa: simplified graph processing on GPUs. IEEE Trans. Parallel Distrib. Syst. 25(6), 1543–1552 (2014)MathSciNetCrossRef
5.
Zurück zum Zitat Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)CrossRef Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)CrossRef
6.
Zurück zum Zitat Harish, P., Narayanan, P.J.: Accelerating Large Graph Algorithms on the GPU Using CUDA. Lecture Notes in Computer Science, pp. 197–208. Springer, Berlin (2007) Harish, P., Narayanan, P.J.: Accelerating Large Graph Algorithms on the GPU Using CUDA. Lecture Notes in Computer Science, pp. 197–208. Springer, Berlin (2007)
7.
Zurück zum Zitat He, G., Feng, H., Li, C., Chen, H.: Parallel SimRank computation on large graphs with iterative aggregation. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 543–552, ACM (2010) He, G., Feng, H., Li, C., Chen, H.: Parallel SimRank computation on large graphs with iterative aggregation. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 543–552, ACM (2010)
8.
Zurück zum Zitat Katz, G.J., Kider, Jr, J.T.: All-pairs shortest-paths for large graphs on the GPU. In: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware, pp. 47–55 (2008) Katz, G.J., Kider, Jr, J.T.: All-pairs shortest-paths for large graphs on the GPU. In: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware, pp. 47–55 (2008)
9.
Zurück zum Zitat Vineet, V., Narayanan, P.J.: CUDA cuts: fast graph cuts on the GPU. In: Proceedings of the IEEE Computer Society Computer Vision and Pattern Recognition Workshops, pp. 1–8 (2008) Vineet, V., Narayanan, P.J.: CUDA cuts: fast graph cuts on the GPU. In: Proceedings of the IEEE Computer Society Computer Vision and Pattern Recognition Workshops, pp. 1–8 (2008)
12.
Zurück zum Zitat Nvidia.: CUDA C Programming Guide version 8.0. (2016) Nvidia.: CUDA C Programming Guide version 8.0. (2016)
13.
Zurück zum Zitat Bell, N., Hoberock, J.: Thrust: a productivity-oriented library for CUDA 26. In: Kirk, D., Hwu, W. (eds.) Programming Massively Parallel Processors, 2nd edn, pp. 339–358. Elsevier, Amsterdam (2013)CrossRef Bell, N., Hoberock, J.: Thrust: a productivity-oriented library for CUDA 26. In: Kirk, D., Hwu, W. (eds.) Programming Massively Parallel Processors, 2nd edn, pp. 339–358. Elsevier, Amsterdam (2013)CrossRef
14.
Zurück zum Zitat Satish, N., Harris, M., Garland, M.: Designing efficient sorting algorithms for manycore GPUs. In: Proceedings of the IEEE International Symposium on Parallel & Distributed Processing. IPDPS 2009, pp. 1–10 (2009) Satish, N., Harris, M., Garland, M.: Designing efficient sorting algorithms for manycore GPUs. In: Proceedings of the IEEE International Symposium on Parallel & Distributed Processing. IPDPS 2009, pp. 1–10 (2009)
15.
Zurück zum Zitat Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: bringing order to the web. Stanford InfoLab (1999) Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: bringing order to the web. Stanford InfoLab (1999)
16.
Zurück zum Zitat Mtibaa, A., May, M., Diot, C., Ammar, M.: PeopleRank: social opportunistic forwarding. IEEE Int. Conf. Comput. Commun. 54(1), 1–5 (2010)MATH Mtibaa, A., May, M., Diot, C., Ammar, M.: PeopleRank: social opportunistic forwarding. IEEE Int. Conf. Comput. Commun. 54(1), 1–5 (2010)MATH
17.
Zurück zum Zitat Jones, S.: Introduction to dynamic parallelism. In: GPU Technology Conference Presentation S, vol. 338, p. 2012 (2012) Jones, S.: Introduction to dynamic parallelism. In: GPU Technology Conference Presentation S, vol. 338, p. 2012 (2012)
19.
Zurück zum Zitat Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-MAT: a recursive model for graph mining. In: Proceedings of the Fourth SIAM International Conference on Data Mining SDM’ 04 (2004) Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-MAT: a recursive model for graph mining. In: Proceedings of the Fourth SIAM International Conference on Data Mining SDM’ 04 (2004)
20.
Zurück zum Zitat Amaral, L.A.N., Scala, A., Barthélémy, M., Stanley, H.E.: Classes of small-world networks. Proc. Natl. Acad. Sci. USA 97(21), 11149–11152 (2000)CrossRef Amaral, L.A.N., Scala, A., Barthélémy, M., Stanley, H.E.: Classes of small-world networks. Proc. Natl. Acad. Sci. USA 97(21), 11149–11152 (2000)CrossRef
21.
Zurück zum Zitat Erdos, P., Renyi, A.: On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 5, 17–60 (1960)MathSciNetMATH Erdos, P., Renyi, A.: On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 5, 17–60 (1960)MathSciNetMATH
23.
Zurück zum Zitat Harris, M., Sengupta, S., Owens, J.D.: Parallel prefix sum (scan) with CUDA. In: GPU Gems 3 (2007) Harris, M., Sengupta, S., Owens, J.D.: Parallel prefix sum (scan) with CUDA. In: GPU Gems 3 (2007)
24.
Zurück zum Zitat Gregor, D., Lumsdaine, A.: The parallel BGL: a generic library for distributed graph computations. In: Parallel Object-Oriented Scientific Computing (POOSC) (2005) Gregor, D., Lumsdaine, A.: The parallel BGL: a generic library for distributed graph computations. In: Parallel Object-Oriented Scientific Computing (POOSC) (2005)
28.
Zurück zum Zitat Salihoglu, S., Widom J.: GPS: a graph processing system *. Stanford InfoLab (2013) Salihoglu, S., Widom J.: GPS: a graph processing system *. Stanford InfoLab (2013)
29.
Zurück zum Zitat Bu, Y., Borkar, V., Jia, J., Carey, M.J., Condie, T.: Pregelix: big (ger) graph analytics on a dataflow engine. Proc. VLDB Endow. 8(2), 161–172 (2014)CrossRef Bu, Y., Borkar, V., Jia, J., Carey, M.J., Condie, T.: Pregelix: big (ger) graph analytics on a dataflow engine. Proc. VLDB Endow. 8(2), 161–172 (2014)CrossRef
31.
Zurück zum Zitat Hong, S., Kyun, S., Tayo, K., Olukotun, O.K.: Accelerating CUDA graph algorithms at maximum warp. In: PPoPP, vol. 46, no. 8, pp. 267–276 (2011) Hong, S., Kyun, S., Tayo, K., Olukotun, O.K.: Accelerating CUDA graph algorithms at maximum warp. In: PPoPP, vol. 46, no. 8, pp. 267–276 (2011)
32.
Zurück zum Zitat Luo, L., Wong, M., Hwu, W.: An effective GPU implementation of breadth-first search. In: Proceedings of the 47th ACM/IEEE Design Automation Conference (DAC), pp. 52–55 (2010) Luo, L., Wong, M., Hwu, W.: An effective GPU implementation of breadth-first search. In: Proceedings of the 47th ACM/IEEE Design Automation Conference (DAC), pp. 52–55 (2010)
33.
Zurück zum Zitat Merrill, D., Garland, M., Grimshaw, A.: High-performance and scalable GPU graph traversal. In: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’12), vol. 47, no. 8, pp. 117–128 (2011) Merrill, D., Garland, M., Grimshaw, A.: High-performance and scalable GPU graph traversal. In: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’12), vol. 47, no. 8, pp. 117–128 (2011)
34.
Zurück zum Zitat Liu, H., Huang, H.H.: Enterprise: breadth-first graph traversal on gpus. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, p. 68 (2015) Liu, H., Huang, H.H.: Enterprise: breadth-first graph traversal on gpus. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, p. 68 (2015)
35.
Zurück zum Zitat Liu, H., Huang, H.H., Hu, Y.: iBFS: concurrent breadth-first search on GPUs. In: Proceedings of the 2016 International Conference on Management of Data, ACM, pp. 403–416 (2016) Liu, H., Huang, H.H., Hu, Y.: iBFS: concurrent breadth-first search on GPUs. In: Proceedings of the 2016 International Conference on Management of Data, ACM, pp. 403–416 (2016)
36.
Zurück zum Zitat Wang, J., Rubin, N., Sidelnik, A., Yalamanchili, S.: Laperm: Locality aware scheduler for dynamic parallelism on gpus. In: Proceedings of the 43rd International Symposium on Computer Architecture, pp. 583–595 (2016) Wang, J., Rubin, N., Sidelnik, A., Yalamanchili, S.: Laperm: Locality aware scheduler for dynamic parallelism on gpus. In: Proceedings of the 43rd International Symposium on Computer Architecture, pp. 583–595 (2016)
37.
Zurück zum Zitat Tang, X., Pattnaik, A., Jiang, H., Kayiran, O., Jog, A., Sreepathi Pai, M.I., Das, C.R.: Controlled Kernel Launch for dynamic parallelism in GPUs. In: Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, ACM (2017) Tang, X., Pattnaik, A., Jiang, H., Kayiran, O., Jog, A., Sreepathi Pai, M.I., Das, C.R.: Controlled Kernel Launch for dynamic parallelism in GPUs. In: Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, ACM (2017)
38.
Zurück zum Zitat El Hajj, I., Gómez-Luna, J., Li, C., Chang, L.W., Milojicic, D., Hwu, W.M.: KLAP: kernel launch aggregation and promotion for optimizing dynamic parallelism. In: Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 1–12 (2016) El Hajj, I., Gómez-Luna, J., Li, C., Chang, L.W., Milojicic, D., Hwu, W.M.: KLAP: kernel launch aggregation and promotion for optimizing dynamic parallelism. In: Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 1–12 (2016)
39.
Zurück zum Zitat Wang, Y., Pan, Y., Davidson, A., Wu, Y., Yang, C., Wang, L., Owens, J. D.: Gunrock: GPU graph analytics. arXiv preprint arXiv:1701.01170 (2017) Wang, Y., Pan, Y., Davidson, A., Wu, Y., Yang, C., Wang, L., Owens, J. D.: Gunrock: GPU graph analytics. arXiv preprint arXiv:​1701.​01170 (2017)
Metadaten
Titel
A BSP model graph processing system on many cores
verfasst von
Siyan Lai
Guangda Lai
Fangzhou Lu
Guojun Shen
Jing Jin
Xiaola Lin
Publikationsdatum
24.03.2017
Verlag
Springer US
Erschienen in
Cluster Computing / Ausgabe 2/2017
Print ISSN: 1386-7857
Elektronische ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-0829-0

Weitere Artikel der Ausgabe 2/2017

Cluster Computing 2/2017 Zur Ausgabe