Skip to main content
Erschienen in: Distributed and Parallel Databases 3-4/2017

29.08.2017

Scalable parallel graph algorithms with matrix–vector multiplication evaluated with queries

verfasst von: Wellington Cabrera, Carlos Ordonez

Erschienen in: Distributed and Parallel Databases | Ausgabe 3-4/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Graph problems are significantly harder to solve with large graphs residing on disk compared to main memory only. In this work, we study how to solve four important graph problems: reachability from a source vertex, single source shortest path, weakly connected components, and PageRank. It is well known that the aforementioned algorithms can be expressed as an iteration of matrix–vector multiplications under different semi-rings. Based on this mathematical foundation, we show how to express the computation with standard relational queries and then we study how to efficiently evaluate them in parallel in a shared-nothing architecture. We identify a common algorithmic pattern that unifies the four graph algorithms, considering a common mathematical foundation based on sparse matrix–vector multiplication. The net gain is that our SQL-based approach enables solving “big data” graph problems on parallel database systems, debunking common wisdom that they are cumbersome and slow. Using large social networks and hyper-link real data sets, we present performance comparisons between a columnar DBMS, an open-source array DBMS, and Spark’s GraphX.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Abadi, D., Boncz, P., Harizopoulos, S., Idreos, S., Madden, S., et al.: The design and implementation of modern column-oriented database systems. Found. Trends® Databases 5(3), 197–280 (2013)CrossRef Abadi, D., Boncz, P., Harizopoulos, S., Idreos, S., Madden, S., et al.: The design and implementation of modern column-oriented database systems. Found. Trends® Databases 5(3), 197–280 (2013)CrossRef
2.
Zurück zum Zitat Bai, Z., Demmel, J., Dongarra, J., Ruhe, A., van der Vorst, H.: Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, vol. 11. Siam, Philadelphia (2000)CrossRefMATH Bai, Z., Demmel, J., Dongarra, J., Ruhe, A., van der Vorst, H.: Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, vol. 11. Siam, Philadelphia (2000)CrossRefMATH
3.
Zurück zum Zitat Bu, Y., Borkar, V., Jia, J., Carey, M.J., Condie, T.: Pregelix: big(ger) graph analytics on a dataflow engine. Proc. VLDB Endow. 8(2), 161–172 (2014)CrossRef Bu, Y., Borkar, V., Jia, J., Carey, M.J., Condie, T.: Pregelix: big(ger) graph analytics on a dataflow engine. Proc. VLDB Endow. 8(2), 161–172 (2014)CrossRef
4.
Zurück zum Zitat Cabrera, W., Ordonez, C.: Unified algorithm to solve several graph problems with relational queries. In: Proceedings of the 10th Alberto Mendelzon International Workshop on Foundations of Data Management, Panama City, Panama, 8–10 May 2016 (2016) Cabrera, W., Ordonez, C.: Unified algorithm to solve several graph problems with relational queries. In: Proceedings of the 10th Alberto Mendelzon International Workshop on Foundations of Data Management, Panama City, Panama, 8–10 May 2016 (2016)
5.
Zurück zum Zitat Choi, J., Demmel, J., Dhillon, I., Dongarra, J., Ostrouchov, S., Petitet, A., Stanley, K., Walker, D., Whaley, R.: Scalapack: a portable linear algebra library for distributed memory computers-design issues and performance. Comput. Phys. Commun. 97(1–2), 1–15 (1996)CrossRefMATH Choi, J., Demmel, J., Dhillon, I., Dongarra, J., Ostrouchov, S., Petitet, A., Stanley, K., Walker, D., Whaley, R.: Scalapack: a portable linear algebra library for distributed memory computers-design issues and performance. Comput. Phys. Commun. 97(1–2), 1–15 (1996)CrossRefMATH
6.
Zurück zum Zitat Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. The MIT Press, Cambridge (2009)MATH Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. The MIT Press, Cambridge (2009)MATH
7.
Zurück zum Zitat DeWitt, D., Gray, J.: Parallel database systems: the future of high performance database systems. Commun. ACM 35(6), 85–98 (1992)CrossRef DeWitt, D., Gray, J.: Parallel database systems: the future of high performance database systems. Commun. ACM 35(6), 85–98 (1992)CrossRef
8.
Zurück zum Zitat Faloutsos, M., Faloutsos, P., Faloutsos, C.: On power-law relationships of the internet topology. In: ACM SIGCOMM computer communication review, vol. 29, pp. 251–262. ACM (1999) Faloutsos, M., Faloutsos, P., Faloutsos, C.: On power-law relationships of the internet topology. In: ACM SIGCOMM computer communication review, vol. 29, pp. 251–262. ACM (1999)
9.
Zurück zum Zitat Fineman, J.T., Robinson, E.: Fundamental Graph Algorithms, chapter 5, pp. 45–58 Fineman, J.T., Robinson, E.: Fundamental Graph Algorithms, chapter 5, pp. 45–58
10.
Zurück zum Zitat Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: Powergraph: Distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12, pp. 17–30, Berkeley, CA, USA, USENIX Association (2012) Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: Powergraph: Distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12, pp. 17–30, Berkeley, CA, USA, USENIX Association (2012)
11.
Zurück zum Zitat Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: Graphx: Graph processing in a distributed dataflow framework. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14) Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: Graphx: Graph processing in a distributed dataflow framework. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14)
12.
Zurück zum Zitat Idreos, S., Groffen, F., Nes, N., Manegold, S., Mullender, K., Kersten, M.: MonetDB: two decades of research in column-oriented database architectures. IEEE Data Eng. Bull. 35(1), 40–45 (2012) Idreos, S., Groffen, F., Nes, N., Manegold, S., Mullender, K., Kersten, M.: MonetDB: two decades of research in column-oriented database architectures. IEEE Data Eng. Bull. 35(1), 40–45 (2012)
13.
Zurück zum Zitat Jindal, A., Madden, S., Castellanos, M., Hsu, M.: Graph analytics using vertica relational database. In: 2015 IEEE International Conference on Big Data (Big Data), pp. 1191–1200. IEEE (2015) Jindal, A., Madden, S., Castellanos, M., Hsu, M.: Graph analytics using vertica relational database. In: 2015 IEEE International Conference on Big Data (Big Data), pp. 1191–1200. IEEE (2015)
14.
Zurück zum Zitat Jindal, A., Rawlani, P., Wu, E., Madden, S., Deshpande, A., Stonebraker, M.: Vertexica: your relational friend for graph analytics!. Proc. VLDB Endow. 7(13), 1669–1672 (2014)CrossRef Jindal, A., Rawlani, P., Wu, E., Madden, S., Deshpande, A., Stonebraker, M.: Vertexica: your relational friend for graph analytics!. Proc. VLDB Endow. 7(13), 1669–1672 (2014)CrossRef
15.
Zurück zum Zitat Kamvar, S.D., Haveliwala, T.H., Manning, C.D., Golub, G.H.: Extrapolation methods for accelerating pagerank computations. In: Proceedings of the 12th Int. Conf. on World Wide Web, pp. 261–270. ACM (2003) Kamvar, S.D., Haveliwala, T.H., Manning, C.D., Golub, G.H.: Extrapolation methods for accelerating pagerank computations. In: Proceedings of the 12th Int. Conf. on World Wide Web, pp. 261–270. ACM (2003)
16.
Zurück zum Zitat Kang, U., Tsourakakis, C.E., Faloutsos, C.: Pegasus: a peta-scale graph mining system implementation and observations. In: Proceedings of the 2009 Ninth IEEE International Conference on Data Mining Kang, U., Tsourakakis, C.E., Faloutsos, C.: Pegasus: a peta-scale graph mining system implementation and observations. In: Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
17.
Zurück zum Zitat Kepner, J., Gilbert, J.: Graph algorithms in the language of linear algebra (2011) Kepner, J., Gilbert, J.: Graph algorithms in the language of linear algebra (2011)
18.
Zurück zum Zitat Lehmberg, O., Meusel, R., Bizer, C.: Graph structure in the web: Aggregated by pay-level domain. In: Proceedings of the 2014 ACM Conference on Web Science, WebSci ’14, pp. 119–128, New York, NY, USA, ACM (2014) Lehmberg, O., Meusel, R., Bizer, C.: Graph structure in the web: Aggregated by pay-level domain. In: Proceedings of the 2014 ACM Conference on Web Science, WebSci ’14, pp. 119–128, New York, NY, USA, ACM (2014)
20.
Zurück zum Zitat Mahanti, A., Carlsson, N., Mahanti, A., Arlitt, M., Williamson, C.: A tale of the tails: power-laws in internet measurements. IEEE Netw. 27(1), 59–64 (2013)CrossRef Mahanti, A., Carlsson, N., Mahanti, A., Arlitt, M., Williamson, C.: A tale of the tails: power-laws in internet measurements. IEEE Netw. 27(1), 59–64 (2013)CrossRef
21.
Zurück zum Zitat Ordonez, C.: Optimization of linear recursive queries in SQL. IEEE Trans. Knowl. Data Eng. (TKDE) 22(2), 264–277 (2010)CrossRef Ordonez, C.: Optimization of linear recursive queries in SQL. IEEE Trans. Knowl. Data Eng. (TKDE) 22(2), 264–277 (2010)CrossRef
22.
Zurück zum Zitat Ordonez, C., Cabrera, W., Gurram, A.: Comparing columnar, row and array DBMSs to process recursive queries on graphs. Inf. Syst. 63, 66–79 (2016)CrossRef Ordonez, C., Cabrera, W., Gurram, A.: Comparing columnar, row and array DBMSs to process recursive queries on graphs. Inf. Syst. 63, 66–79 (2016)CrossRef
23.
Zurück zum Zitat Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: bringing order to the web (1999) Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: bringing order to the web (1999)
24.
Zurück zum Zitat Pavlo, A., Paulson, E., Rasin, A., Abadi, D., DeWitt, D., Madden, S., Stonebraker, M.: A comparison of approaches to large-scale data analysis. In: Proc. ACM SIGMOD Conference, pp. 165–178 (2009) Pavlo, A., Paulson, E., Rasin, A., Abadi, D., DeWitt, D., Madden, S., Stonebraker, M.: A comparison of approaches to large-scale data analysis. In: Proc. ACM SIGMOD Conference, pp. 165–178 (2009)
25.
26.
Zurück zum Zitat Rudolf, M., Paradies, M., Bornhövd, C., Lehner, W.: Synopsys: large graph analytics in the SAP HANA database through summarization. In: First International Workshop on Graph Data Management Experiences and Systems, p. 16. ACM (2013) Rudolf, M., Paradies, M., Bornhövd, C., Lehner, W.: Synopsys: large graph analytics in the SAP HANA database through summarization. In: First International Workshop on Graph Data Management Experiences and Systems, p. 16. ACM (2013)
28.
Zurück zum Zitat Soroush, E., Balazinska, M., Wang, D.: ArrayStore: a storage manager for complex parallel array processing. In: Proc. ACM SIGMOD Conference, pp. 253–264 (2011) Soroush, E., Balazinska, M., Wang, D.: ArrayStore: a storage manager for complex parallel array processing. In: Proc. ACM SIGMOD Conference, pp. 253–264 (2011)
29.
Zurück zum Zitat Stonebraker, M., Abadi, D., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E., O’Neil, P., Rasin, A., Tran, N., Zdonik, S.: C-Store: a column-oriented DBMS. In: Proc. VLDB Conference, pp. 553–564 (2005) Stonebraker, M., Abadi, D., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., Lau, E., Lin, A., Madden, S., O’Neil, E., O’Neil, P., Rasin, A., Tran, N., Zdonik, S.: C-Store: a column-oriented DBMS. In: Proc. VLDB Conference, pp. 553–564 (2005)
30.
Zurück zum Zitat Stonebraker, M., Brown, P., Poliakov, A., Raman, S.: The architecture of SciDB. In: Proceedings of SSDBM, SSDBM’11, pp. 1–16. Springer (2011) Stonebraker, M., Brown, P., Poliakov, A., Raman, S.: The architecture of SciDB. In: Proceedings of SSDBM, SSDBM’11, pp. 1–16. Springer (2011)
31.
Zurück zum Zitat Welc, A., Raman, R., Wu, Z., Hong, S., Chafi, H., Banerjee, J.: Graph analysis: do we have to reinvent the wheel? In: First International Workshop on Graph Data Management Experiences and Systems, p. 7. ACM (2013) Welc, A., Raman, R., Wu, Z., Hong, S., Chafi, H., Banerjee, J.: Graph analysis: do we have to reinvent the wheel? In: First International Workshop on Graph Data Management Experiences and Systems, p. 7. ACM (2013)
32.
Zurück zum Zitat Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. HotCloud 10(10–10), 95 (2010) Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. HotCloud 10(10–10), 95 (2010)
Metadaten
Titel
Scalable parallel graph algorithms with matrix–vector multiplication evaluated with queries
verfasst von
Wellington Cabrera
Carlos Ordonez
Publikationsdatum
29.08.2017
Verlag
Springer US
Erschienen in
Distributed and Parallel Databases / Ausgabe 3-4/2017
Print ISSN: 0926-8782
Elektronische ISSN: 1573-7578
DOI
https://doi.org/10.1007/s10619-017-7200-6

Weitere Artikel der Ausgabe 3-4/2017

Distributed and Parallel Databases 3-4/2017 Zur Ausgabe