Skip to main content

2018 | OriginalPaper | Buchkapitel

PruX: Communication Pruning of Parallel BFS in the Graph 500 Benchmark

verfasst von : Menghan Jia, Yiming Zhang, Dongsheng Li, Songzhu Mei

Erschienen in: Algorithms and Architectures for Parallel Processing

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Parallel Breadth First Search (BFS) is a representative algorithm in Graph 500, the well-known benchmark for evaluating supercomputers for data-intensive applications. However, the specific storage model of Graph 500 brings severe challenge to efficient communication when computing parallel BFS in large-scale graphs. In this paper, we propose an effective method PruX for optimizing the communication of parallel BFS in two aspects. First, we adopt a scalable structure to record the access information of the vertices on each machine. Second, we prune unnecessary inter-machine communication for previously accessed vertices by checking the records. Evaluation results show that the performance of our method is at least six times higher than that of the original implementation of parallel BFS.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
The PruX and direction optimization are all optimized by modifying the algorithm execution mode to implement the parallel BFS algorithm. So we choose direction optimization as a contrast.
 
2
We only implement the direction optimization at the algorithm level, and do not optimize its storage and computation, which results in breakdown when SCALE is too large.
 
3
Because there are still a lot of isolated vertices in the graph, the direction optimization will compute these vertices in the bottom-up BFS algorithm, which will bring a lot of computation cost and lead to the performance degradation of large-scale graphs.
 
Literatur
1.
Zurück zum Zitat Agarwal, V., Petrini, F., Pasetto, D., Bader, D.A.: Scalable graph exploration on multicore processors. In: High Performance Computing, Networking, Storage and Analysis, pp. 1–11 (2010) Agarwal, V., Petrini, F., Pasetto, D., Bader, D.A.: Scalable graph exploration on multicore processors. In: High Performance Computing, Networking, Storage and Analysis, pp. 1–11 (2010)
2.
Zurück zum Zitat Ajwani, D., Meyer, U., Osipov, V.:. Improved external memory BFS implementation. In: The Workshop on Algorithm Engineering & Experiments (2007) Ajwani, D., Meyer, U., Osipov, V.:. Improved external memory BFS implementation. In: The Workshop on Algorithm Engineering & Experiments (2007)
3.
Zurück zum Zitat Akkary, H., Driscoll, M.A.: A dynamic multithreading processor. In: 1998 Proceedings of ACM/IEEE International Symposium on Microarchitecture, Micro-31, pp. 226–236 (1998) Akkary, H., Driscoll, M.A.: A dynamic multithreading processor. In: 1998 Proceedings of ACM/IEEE International Symposium on Microarchitecture, Micro-31, pp. 226–236 (1998)
4.
Zurück zum Zitat Awerbuch, B., Gallager, R.: A new distributed algorithm to find breadth first search trees. IEEE Trans. Inf. Theory 33(3), 315–322 (2003)CrossRef Awerbuch, B., Gallager, R.: A new distributed algorithm to find breadth first search trees. IEEE Trans. Inf. Theory 33(3), 315–322 (2003)CrossRef
5.
Zurück zum Zitat Bader, D.A., Madduri, K.: Designing multithreaded algorithms for breadth-first search and st-connectivity on the Cray MTA-2, vol. 34, no. 2, pp. 523–530 (2006) Bader, D.A., Madduri, K.: Designing multithreaded algorithms for breadth-first search and st-connectivity on the Cray MTA-2, vol. 34, no. 2, pp. 523–530 (2006)
6.
Zurück zum Zitat Beamer, S., Patterson, D.: Direction-optimizing breadth-first search. In: International Conference on High Performance Computing, Networking, Storage and Analysis, p. 12 (2012) Beamer, S., Patterson, D.: Direction-optimizing breadth-first search. In: International Conference on High Performance Computing, Networking, Storage and Analysis, p. 12 (2012)
7.
Zurück zum Zitat Bidstrup, S.M., Grady, C.P.L.: SSSP: simulation of single-sludge processes. Journal 60(3), 351–361 (1988) Bidstrup, S.M., Grady, C.P.L.: SSSP: simulation of single-sludge processes. Journal 60(3), 351–361 (1988)
8.
Zurück zum Zitat Bulu, A.: Parallel breadth-first search on distributed memory systems, pp. 1–12 (2011) Bulu, A.: Parallel breadth-first search on distributed memory systems, pp. 1–12 (2011)
9.
Zurück zum Zitat Checconi, F., Petrini, F.: Traversing trillions of edges in real time: graph exploration on large-scale parallel machines. In: IEEE International Parallel and Distributed Processing Symposium, pp. 425–434 (2014) Checconi, F., Petrini, F.: Traversing trillions of edges in real time: graph exploration on large-scale parallel machines. In: IEEE International Parallel and Distributed Processing Symposium, pp. 425–434 (2014)
10.
Zurück zum Zitat Chow, E., Henderson, K., Yoo, A.: Distributed breadth-first search with 2-D partitioning. Lawrence Livermore National Laboratory (2005) Chow, E., Henderson, K., Yoo, A.: Distributed breadth-first search with 2-D partitioning. Lawrence Livermore National Laboratory (2005)
11.
Zurück zum Zitat Dongarra, J., et al.: Special issue - MPI - a message passing interface standard. Int. J. Supercomput. Appl. High Perform. Comput. 8, 165 (1994) Dongarra, J., et al.: Special issue - MPI - a message passing interface standard. Int. J. Supercomput. Appl. High Perform. Comput. 8, 165 (1994)
12.
Zurück zum Zitat Duran, A., Klemm, M.: The Intel® many integrated core architecture. In: International Conference on High Performance Computing and Simulation, pp. 365–366 (2012) Duran, A., Klemm, M.: The Intel® many integrated core architecture. In: International Conference on High Performance Computing and Simulation, pp. 365–366 (2012)
13.
Zurück zum Zitat Greathouse, J.L., Daga, M.: Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format. In: High Performance Computing, Networking, Storage, pp. 769–780 (2015) Greathouse, J.L., Daga, M.: Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format. In: High Performance Computing, Networking, Storage, pp. 769–780 (2015)
14.
Zurück zum Zitat Jose, J., Potluri, S., Tomko, K., Panda, D.K.: Designing scalable graph500 benchmark with hybrid MPI+ OpenSHMEM programming models (2013) Jose, J., Potluri, S., Tomko, K., Panda, D.K.: Designing scalable graph500 benchmark with hybrid MPI+ OpenSHMEM programming models (2013)
15.
Zurück zum Zitat Leiserson, C.E., Schardl, T.B.: A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers). In: SPAA 2010: Proceedings of the ACM Symposium on Parallelism in Algorithms and Architectures, Thira, Santorini, Greece, June, pp. 303–314 (2010) Leiserson, C.E., Schardl, T.B.: A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers). In: SPAA 2010: Proceedings of the ACM Symposium on Parallelism in Algorithms and Architectures, Thira, Santorini, Greece, June, pp. 303–314 (2010)
16.
Zurück zum Zitat Lu, H., Tan, G., Chen, M., Sun, N.: Reducing communication in parallel breadth-first search on distributed memory systems, pp. 1261–1268 (2015) Lu, H., Tan, G., Chen, M., Sun, N.: Reducing communication in parallel breadth-first search on distributed memory systems, pp. 1261–1268 (2015)
17.
Zurück zum Zitat Lumsdaine, A., Gregor, D., Hendrickson, B., Berry, J.: Challenges in parallel graph processing. Parallel Process. Lett. 17(01), 5–20 (2007)MathSciNetCrossRef Lumsdaine, A., Gregor, D., Hendrickson, B., Berry, J.: Challenges in parallel graph processing. Parallel Process. Lett. 17(01), 5–20 (2007)MathSciNetCrossRef
18.
Zurück zum Zitat Luo, L., Wong, M., Hwu, W.M.: An effective GPU implementation of breadth-first search. In: Design Automation Conference, pp. 52–55 (2010) Luo, L., Wong, M., Hwu, W.M.: An effective GPU implementation of breadth-first search. In: Design Automation Conference, pp. 52–55 (2010)
19.
Zurück zum Zitat Malewicz, G., et al.: Pregel: a system for large-scale graph processing. In: ACM SIGMOD International Conference on Management of Data, pp. 135–146 (2010) Malewicz, G., et al.: Pregel: a system for large-scale graph processing. In: ACM SIGMOD International Conference on Management of Data, pp. 135–146 (2010)
21.
Zurück zum Zitat Snir, M.: MPI : The Complete Reference, pp. 4038–4040 (2010) Snir, M.: MPI : The Complete Reference, pp. 4038–4040 (2010)
22.
Zurück zum Zitat Su, B.Y., Brutch, T.G., Keutzer, K.: Parallel BFS graph traversal on images using structured grid, pp. 4489–4492 (2010) Su, B.Y., Brutch, T.G., Keutzer, K.: Parallel BFS graph traversal on images using structured grid, pp. 4489–4492 (2010)
23.
Zurück zum Zitat Yoo, A., Chow, E., Henderson, K., Mclendon, W., Hendrickson, B., Catalyurek, U.: A scalable distributed parallel breadth-first search algorithm on BlueGene/L. In: Proceedings of the ACM/IEEE SC 2005 Conference on Supercomputing, p. 25 (2005) Yoo, A., Chow, E., Henderson, K., Mclendon, W., Hendrickson, B., Catalyurek, U.: A scalable distributed parallel breadth-first search algorithm on BlueGene/L. In: Proceedings of the ACM/IEEE SC 2005 Conference on Supercomputing, p. 25 (2005)
Metadaten
Titel
PruX: Communication Pruning of Parallel BFS in the Graph 500 Benchmark
verfasst von
Menghan Jia
Yiming Zhang
Dongsheng Li
Songzhu Mei
Copyright-Jahr
2018
DOI
https://doi.org/10.1007/978-3-030-05051-1_8