Skip to main content

2016 | OriginalPaper | Buchkapitel

Performance Analysis of Spark/GraphX on POWER8 Cluster

verfasst von : Xinyu Que, Lars Schneidenbach, Fabio Checconi, Carlos H. Ã. Costa, Daniele Buono

Erschienen in: High Performance Computing

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

POWER 8, the latest RISC (Reduced Instruction Set Computer) microprocessor of the IBM Power architecture family, was designed to significantly benefit emerging workloads, including Business Analytics, Cloud Computing and High Performance Computing. In this paper, we provide a thorough performance evaluation on a widely used large-scale graph processing framework, Spark/GraphX, on a POWER 8 cluster. Note that we use Spark and Java versions out of the box without any optimization. We examine the performance with several important graph kernels such as Breadth-First Search, Connected Components, and PageRank using both large real-world social graphs and synthetic graphs of billions of edges. We study the Spark/GraphX performance against some architectural aspects and perform the first Spark/GraphX scalability test with up to 16 POWER 8 nodes.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
All transformations in Spark are lazy, in that they do not compute their results right away. Instead, they just remember the transformations applied to some base dataset (e.g. a file). The transformations are only computed when an action requires a result to be returned to the driver program.
 
2
A system monitor command used to report on various system loads.
 
Literatur
7.
Zurück zum Zitat Abu-Doleh, A., Catalyurek, U.V.: Spaler: Spark And GraphX based de novo genome assembler. In: 2015 IEEE International Conference on Big Data (Big Data), pp. 1013–1018. IEEE (2015) Abu-Doleh, A., Catalyurek, U.V.: Spaler: Spark And GraphX based de novo genome assembler. In: 2015 IEEE International Conference on Big Data (Big Data), pp. 1013–1018. IEEE (2015)
8.
Zurück zum Zitat Brock, B., Liu, F., Rajamani, K.: Stac-a2™ benchmark on POWER8. In: Proceedings of the 8th Workshop on High Performance Computational Finance, WHPCF 2015, p. 1:1–1:8. ACM, New York (2015) Brock, B., Liu, F., Rajamani, K.: Stac-a2™ benchmark on POWER8. In: Proceedings of the 8th Workshop on High Performance Computational Finance, WHPCF 2015, p. 1:1–1:8. ACM, New York (2015)
9.
Zurück zum Zitat Buono, D., Petrini, F., Checconi, F., Liu, X., Que, X., Long, C., Tuan, T.C.: Optimizing sparse matrix-vector multiplication for large-scale data analytics. In: Proceedings of the 30th ACM on International Conference on Supercomputing, ICS 2016. ACM (2016, to appear) Buono, D., Petrini, F., Checconi, F., Liu, X., Que, X., Long, C., Tuan, T.C.: Optimizing sparse matrix-vector multiplication for large-scale data analytics. In: Proceedings of the 30th ACM on International Conference on Supercomputing, ICS 2016. ACM (2016, to appear)
10.
Zurück zum Zitat Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-MAT: a recursive model for graph mining. In: Proceedings of the 4th ACM on International Conference on Data Mining (SDM 2004), Lake Buena Vista, pp. 442–446, April 2004 Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-MAT: a recursive model for graph mining. In: Proceedings of the 4th ACM on International Conference on Data Mining (SDM 2004), Lake Buena Vista, pp. 442–446, April 2004
11.
Zurück zum Zitat Ewart, T., Yates, S., Cremonesi, F., Kumbhar, P., Schürmann, F., Delalondre, F.: Performance evaluation of the IBM POWER8 architecture to support computational neuroscientific application using morphologically detailed neurons. In: Proceedings of the 6th International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computing Systems, PMBS 2015, p. 1:1–1:11. ACM, New York (2015) Ewart, T., Yates, S., Cremonesi, F., Kumbhar, P., Schürmann, F., Delalondre, F.: Performance evaluation of the IBM POWER8 architecture to support computational neuroscientific application using morphologically detailed neurons. In: Proceedings of the 6th International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computing Systems, PMBS 2015, p. 1:1–1:11. ACM, New York (2015)
12.
Zurück zum Zitat Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: PowerGraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI 2012, pp. 17–30. USENIX Association, Berkeley (2012) Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: PowerGraph: distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI 2012, pp. 17–30. USENIX Association, Berkeley (2012)
13.
Zurück zum Zitat Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: GraphX: graph processing in a distributed dataflow framework. In: Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI 2014, pp. 599–613. USENIX Association, Berkeley (2014) Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: GraphX: graph processing in a distributed dataflow framework. In: Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI 2014, pp. 599–613. USENIX Association, Berkeley (2014)
14.
Zurück zum Zitat Heintz, B., Chandra, A.: Enabling scalable social group analytics via hypergraph analysis systems. In: 7th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 2015). USENIX Association, Santa Clara, July 2015 Heintz, B., Chandra, A.: Enabling scalable social group analytics via hypergraph analysis systems. In: 7th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 2015). USENIX Association, Santa Clara, July 2015
15.
Zurück zum Zitat Kang, U., Tsourakakis, C.E., Faloutsos, C.: Pegasus: a peta-scale graph mining system implementation and observations. In: Proceedings of the 2009 Ninth IEEE International Conference on Data Mining, ICDM 2009, pp. 229–238. IEEE Computer Society, Washington, DC (2009) Kang, U., Tsourakakis, C.E., Faloutsos, C.: Pegasus: a peta-scale graph mining system implementation and observations. In: Proceedings of the 2009 Ninth IEEE International Conference on Data Mining, ICDM 2009, pp. 229–238. IEEE Computer Society, Washington, DC (2009)
16.
Zurück zum Zitat Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: WWW, pp. 591–600. ACM, New York (2010) Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: WWW, pp. 591–600. ACM, New York (2010)
17.
Zurück zum Zitat Langewisch, R.: A performance study of an implementation of the push-relabel maximum flow algorithm in Apache Spark’s GraphX (2015) Langewisch, R.: A performance study of an implementation of the push-relabel maximum flow algorithm in Apache Spark’s GraphX (2015)
18.
Zurück zum Zitat Leskovec, J., Chakrabarti, D., Kleinberg, J., Faloutsos, C., Ghahramani, Z.: Kronecker graphs: an approach to modeling networks. J. Mach. Learn. Res. 11, 985–1042 (2010)MathSciNetMATH Leskovec, J., Chakrabarti, D., Kleinberg, J., Faloutsos, C., Ghahramani, Z.: Kronecker graphs: an approach to modeling networks. J. Mach. Learn. Res. 11, 985–1042 (2010)MathSciNetMATH
19.
Zurück zum Zitat Li, M., Tan, J., Wang, Y., Zhang, L., Salapura, V.: SparkBench: a comprehensive benchmarking suite for in memory data analytic platform spark. In: Proceedings of the 12th ACM International Conference on Computing Frontiers, CF 2015, pp. 53:1–53:8. ACM, New York (2015) Li, M., Tan, J., Wang, Y., Zhang, L., Salapura, V.: SparkBench: a comprehensive benchmarking suite for in memory data analytic platform spark. In: Proceedings of the 12th ACM International Conference on Computing Frontiers, CF 2015, pp. 53:1–53:8. ACM, New York (2015)
20.
Zurück zum Zitat Lim, S., Lee, S., Ganesh, G., Brown, T.C., Sukumar, S.R.: Graph processing platforms at scale: practices and experiences. In: 2015 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2015, 29–31 March 2015, Philadelphia, PA, USA, pp. 42–51 (2015) Lim, S., Lee, S., Ganesh, G., Brown, T.C., Sukumar, S.R.: Graph processing platforms at scale: practices and experiences. In: 2015 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2015, 29–31 March 2015, Philadelphia, PA, USA, pp. 42–51 (2015)
21.
Zurück zum Zitat Liu, X., Buono, D., Checconi, F., Choi, J.W., Que, X., Petrini, F., Gunnels, J., Stuecheli, J.: An early performance study of large-scale POWER8 SMP systems. In: Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium. IPDPS 2015, IEEE Computer Society, Washington, DC (2016) Liu, X., Buono, D., Checconi, F., Choi, J.W., Que, X., Petrini, F., Gunnels, J., Stuecheli, J.: An early performance study of large-scale POWER8 SMP systems. In: Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium. IPDPS 2015, IEEE Computer Society, Washington, DC (2016)
22.
Zurück zum Zitat Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, pp. 135–146. ACM, New York (2010) Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, pp. 135–146. ACM, New York (2010)
23.
Zurück zum Zitat Mushtaq, H., Al-Ars, Z.: Cluster-based Apache Spark implementation of the GATK DNA analysis pipeline. In: IEEE International Conference on Bioinformatics and Biomedicine, pp. 1471–1477. IEEE Computer Society (2015) Mushtaq, H., Al-Ars, Z.: Cluster-based Apache Spark implementation of the GATK DNA analysis pipeline. In: IEEE International Conference on Bioinformatics and Biomedicine, pp. 1471–1477. IEEE Computer Society (2015)
24.
Zurück zum Zitat Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: bringing order to the web. Technical report 1999–66, Stanford InfoLab, previous number=SIDL-WP-1999-0120, November 1999 Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: bringing order to the web. Technical report 1999–66, Stanford InfoLab, previous number=SIDL-WP-1999-0120, November 1999
25.
Zurück zum Zitat Que, X., Checconi, F., Petrini, F., Liu, X., Buono, D.: Exploring network optimizations for large-scale graph analytics. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015, pp. 26:1–26:10. ACM, New York (2015) Que, X., Checconi, F., Petrini, F., Liu, X., Buono, D.: Exploring network optimizations for large-scale graph analytics. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015, pp. 26:1–26:10. ACM, New York (2015)
26.
Zurück zum Zitat Roy, A., Mihailovic, I., Zwaenepoel, W.: X-stream: Edge-centric graph processing using streaming partitions. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP 2013, pp. 472–488. ACM, New York (2013) Roy, A., Mihailovic, I., Zwaenepoel, W.: X-stream: Edge-centric graph processing using streaming partitions. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP 2013, pp. 472–488. ACM, New York (2013)
27.
Zurück zum Zitat Salihoglu, S., Widom, J.: GPS: a graph processing system. In: Proceedings of the 25th International Conference on Scientific and Statistical Database Management, SSDBM 2013, pp. 22:1–22:12. ACM, New York (2013) Salihoglu, S., Widom, J.: GPS: a graph processing system. In: Proceedings of the 25th International Conference on Scientific and Statistical Database Management, SSDBM 2013, pp. 22:1–22:12. ACM, New York (2013)
28.
Zurück zum Zitat Seshadhri, C., Pinar, A., Kolda, T.G.: An in-depth study of stochastic Kronecker graphs. In: International Conference on Data Mining, pp. 587–596. IEEE Computer Society, Los Alamitos (2011) Seshadhri, C., Pinar, A., Kolda, T.G.: An in-depth study of stochastic Kronecker graphs. In: International Conference on Data Mining, pp. 587–596. IEEE Computer Society, Los Alamitos (2011)
29.
Zurück zum Zitat Shun, J., Blelloch, G.E.: Ligra: a lightweight graph processing framework for shared memory. SIGPLAN Not. 48(8), 135–146 (2013)CrossRef Shun, J., Blelloch, G.E.: Ligra: a lightweight graph processing framework for shared memory. SIGPLAN Not. 48(8), 135–146 (2013)CrossRef
30.
Zurück zum Zitat Sinharoy, B., Norstrand, J.A.V., Eickemeyer, R.J., Le, H.Q., Leenstra, J., Nguyen, D.Q., Konigsburg, B., Ward, K., Brown, M.D., Moreira, J.E., Levitan, D., Tung, S., Hrusecky, D., Bishop, J.W., Gschwind, M., Boersma, M., Kroener, M., Kaltenbach, M., Karkhanis, T., Fernsler, K.M.: IBM POWER8 processor core microarchitecture. IBM J. Res. Dev. 59(1), 2:1–2:21 (2015)CrossRef Sinharoy, B., Norstrand, J.A.V., Eickemeyer, R.J., Le, H.Q., Leenstra, J., Nguyen, D.Q., Konigsburg, B., Ward, K., Brown, M.D., Moreira, J.E., Levitan, D., Tung, S., Hrusecky, D., Bishop, J.W., Gschwind, M., Boersma, M., Kroener, M., Kaltenbach, M., Karkhanis, T., Fernsler, K.M.: IBM POWER8 processor core microarchitecture. IBM J. Res. Dev. 59(1), 2:1–2:21 (2015)CrossRef
31.
Zurück zum Zitat Sud, A., Andersen, E., Curtis, S., Lin, M.C., Manocha, D.: Real-time path planning for virtual agents in dynamic environments. In: IEEE Virtual Reality, Charlotte, NC, March 2007 Sud, A., Andersen, E., Curtis, S., Lin, M.C., Manocha, D.: Real-time path planning for virtual agents in dynamic environments. In: IEEE Virtual Reality, Charlotte, NC, March 2007
32.
Zurück zum Zitat Wu, M., Yang, F., Xue, J., Xiao, W., Miao, Y., Wei, L., Lin, H., Dai, Y., Zhou, L.: GraM: scaling graph computation to the trillions. In: Proceedings of the Sixth ACM Symposium on Cloud Computing, SoCC 2015, pp. 408–421. ACM, New York (2015) Wu, M., Yang, F., Xue, J., Xiao, W., Miao, Y., Wei, L., Lin, H., Dai, Y., Zhou, L.: GraM: scaling graph computation to the trillions. In: Proceedings of the Sixth ACM Symposium on Cloud Computing, SoCC 2015, pp. 408–421. ACM, New York (2015)
33.
Zurück zum Zitat Xin, R.S., Gonzalez, J.E., Franklin, M.J., Stoica, I.: GraphX: a resilient distributed graph system on spark. In: First International Workshop on Graph Data Management Experiences and Systems, GRADES 2013, pp. 2:1–2:6. ACM, New York (2013) Xin, R.S., Gonzalez, J.E., Franklin, M.J., Stoica, I.: GraphX: a resilient distributed graph system on spark. In: First International Workshop on Graph Data Management Experiences and Systems, GRADES 2013, pp. 2:1–2:6. ACM, New York (2013)
34.
Zurück zum Zitat Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth (2012). CoRR Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth (2012). CoRR
35.
Zurück zum Zitat Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud 2010, p. 10. USENIX Association, Berkeley (2010) Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud 2010, p. 10. USENIX Association, Berkeley (2010)
36.
Zurück zum Zitat Zhang, L., Kim, Y.J., Manocha, D.: A simple path non-existence algorithm using C-obstacle query. In: Proceedings of the International Workshop on the Algorithmic Foundations of Robotics (WAFR 2006), New York City, July 2006 Zhang, L., Kim, Y.J., Manocha, D.: A simple path non-existence algorithm using C-obstacle query. In: Proceedings of the International Workshop on the Algorithmic Foundations of Robotics (WAFR 2006), New York City, July 2006
Metadaten
Titel
Performance Analysis of Spark/GraphX on POWER8 Cluster
verfasst von
Xinyu Que
Lars Schneidenbach
Fabio Checconi
Carlos H. Ã. Costa
Daniele Buono
Copyright-Jahr
2016
DOI
https://doi.org/10.1007/978-3-319-46079-6_19

Neuer Inhalt