Skip to main content
Erschienen in: The VLDB Journal 2/2016

01.04.2016 | Regular Paper

NScale: neighborhood-centric large-scale graph analytics in the cloud

verfasst von: Abdul Quamar, Amol Deshpande, Jimmy Lin

Erschienen in: The VLDB Journal | Ausgabe 2/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

There is an increasing interest in executing complex analyses over large graphs, many of which require processing a large number of multi-hop neighborhoods or subgraphs. Examples include ego network analysis, motif counting, finding social circles, personalized recommendations, link prediction, anomaly detection, analyzing influence cascades, and others. These tasks are not well served by existing vertex-centric graph processing frameworks, where user programs are only able to directly access the state of a single vertex at a time, resulting in high communication, scheduling, and memory overheads in executing such tasks. Further, most existing graph processing frameworks ignore the challenges in extracting the relevant portions of the graph that an analysis task is interested in, and loading those onto distributed memory. This paper introduces NScale, a novel end-to-end graph processing framework that enables the distributed execution of complex subgraph-centric analytics over large-scale graphs in the cloud. NScale enables users to write programs at the level of subgraphs rather than at the level of vertices. Unlike most previous graph processing frameworks, which apply the user program to the entire graph, NScale allows users to declaratively specify subgraphs of interest. Our framework includes a novel graph extraction and packing (GEP) module that utilizes a cost-based optimizer to partition and pack the subgraphs of interest into memory on as few machines as possible. The distributed execution engine then takes over and runs the user program in parallel on those subgraphs, restricting the scope of the execution appropriately, and utilizes novel techniques to minimize memory consumption by exploiting overlaps among the subgraphs. We present a comprehensive empirical evaluation comparing against three state-of-the-art systems, namely Giraph, GraphLab, and GraphX, on several real-world datasets and a variety of analysis tasks. Our experimental results show orders-of-magnitude improvements in performance and drastic reductions in the cost of analytics compared to vertex-centric approaches.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
For input graphs represented as an edge list with the vertex attributes available as a separate mapping, we have a minor modification to the first stage that uses a MapReduce job to join the edge and vertex data and produce a distributed adjacency list in the required format.
 
2
We use the terms partitions and bins interchangeably in this paper.
 
3
The higher the value of k, the better the quality of the result. We have chosen \(k = 6\) for our implementation which was determined experimentally to strike a fine balance between the quality of shingle-based similarity and computation time.
 
Literatur
1.
Zurück zum Zitat Akoglu, L., McGlohon, M., Faloutsos, C.: OddBall: spotting anomalies in weighted graphs. In: PAKDD (2010) Akoglu, L., McGlohon, M., Faloutsos, C.: OddBall: spotting anomalies in weighted graphs. In: PAKDD (2010)
3.
Zurück zum Zitat Backstrom, L., Leskovec, J.: Supervised random walks: predicting and recommending links in social networks. In: WSDM (2011) Backstrom, L., Leskovec, J.: Supervised random walks: predicting and recommending links in social networks. In: WSDM (2011)
5.
Zurück zum Zitat Burt, R.S.: Secondhand brokerage: evidence on the importance of local structure for managers, bankers, and analysts. Acad. Manag. J. 50(1), 119–148 (2007)MathSciNetCrossRef Burt, R.S.: Secondhand brokerage: evidence on the importance of local structure for managers, bankers, and analysts. Acad. Manag. J. 50(1), 119–148 (2007)MathSciNetCrossRef
6.
Zurück zum Zitat Burt, R.S.: Structural Holes: The Social Structure of Competition. Harvard University Press, Cambridge (2009) Burt, R.S.: Structural Holes: The Social Structure of Competition. Harvard University Press, Cambridge (2009)
7.
Zurück zum Zitat Cheng, J., Ke, Y., Ng, W., Lu, A.: Fg-index: towards verification-free query processing on graph databases. In: SIGMOD (2007) Cheng, J., Ke, Y., Ng, W., Lu, A.: Fg-index: towards verification-free query processing on graph databases. In: SIGMOD (2007)
8.
Zurück zum Zitat Cheng, R., Hong, J., Kyrola, A., Miao, Y., Weng, X., Wu, M., Yang, F., Zhou, L., Zhao, F., Chen, E.: Kineograph: taking pulse of a fast-changing and connected world. In: EuroSys (2012) Cheng, R., Hong, J., Kyrola, A., Miao, Y., Weng, X., Wu, M., Yang, F., Zhou, L., Zhao, F., Chen, E.: Kineograph: taking pulse of a fast-changing and connected world. In: EuroSys (2012)
9.
Zurück zum Zitat Cordella, L.P., Foggia, P., Sansone, C., Vento, M.: A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans. Pattern Anal. Mach. Intell. 26, 1367–1372 (2004)CrossRef Cordella, L.P., Foggia, P., Sansone, C., Vento, M.: A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans. Pattern Anal. Mach. Intell. 26, 1367–1372 (2004)CrossRef
10.
Zurück zum Zitat Curtiss, M., Becker, I., Bosman, T., Doroshenko, S., Grijincu, L., Jackson, T., Kunnatur, S., Lassen, S., Pronin, P., Sankar, S., Shen, G., Woss, G., Yang, C., Zhang, N.: Unicorn: a system for searching the social graph. In: Proceedings of VLDB Endowment (2013) Curtiss, M., Becker, I., Bosman, T., Doroshenko, S., Grijincu, L., Jackson, T., Kunnatur, S., Lassen, S., Pronin, P., Sankar, S., Shen, G., Woss, G., Yang, C., Zhang, N.: Unicorn: a system for searching the social graph. In: Proceedings of VLDB Endowment (2013)
11.
Zurück zum Zitat Everett, M., Borgatti, S.P.: Ego network betweenness. Soc. Netw. 27(1), 31–38 (2005)CrossRef Everett, M., Borgatti, S.P.: Ego network betweenness. Soc. Netw. 27(1), 31–38 (2005)CrossRef
13.
Zurück zum Zitat Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: GraphX: graph processing in a distributed dataflow framework. In: OSDI (2014) Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: GraphX: graph processing in a distributed dataflow framework. In: OSDI (2014)
14.
Zurück zum Zitat Granovetter, M.S.: The strength of weak ties. Am. J. Sociol. 78, 1360–1380 (1973)CrossRef Granovetter, M.S.: The strength of weak ties. Am. J. Sociol. 78, 1360–1380 (1973)CrossRef
16.
Zurück zum Zitat Gupta, P., Goel, A., Lin, J., Sharma, A., Wang, D., Zadeh, R.: WTF: the who to follow service at twitter. In: WWW (2013) Gupta, P., Goel, A., Lin, J., Sharma, A., Wang, D., Zadeh, R.: WTF: the who to follow service at twitter. In: WWW (2013)
17.
Zurück zum Zitat He H., Singh, A.K.: Graphs-at-a-time: query language and access methods for graph databases. In: SIGMOD (2008) He H., Singh, A.K.: Graphs-at-a-time: query language and access methods for graph databases. In: SIGMOD (2008)
18.
Zurück zum Zitat Hoque, I., Gupta, I.: Lfgraph: simple and fast distributed graph analytics. In: TRIOS (2013) Hoque, I., Gupta, I.: Lfgraph: simple and fast distributed graph analytics. In: TRIOS (2013)
19.
Zurück zum Zitat Huang, J., Abadi, D.J., Ren, K.: Scalable SPARQL querying of large RDF graphs. In: PVLDB (2011) Huang, J., Abadi, D.J., Ren, K.: Scalable SPARQL querying of large RDF graphs. In: PVLDB (2011)
20.
Zurück zum Zitat Izumi, T., Yokomaru, T., Takahashi, A., Kajitani, Y.: Computational complexity analysis of set-bin-packing problem. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 81(5), 842–849 (1998) Izumi, T., Yokomaru, T., Takahashi, A., Kajitani, Y.: Computational complexity analysis of set-bin-packing problem. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 81(5), 842–849 (1998)
21.
Zurück zum Zitat Kashtan, N., Itzkovitz, S., Milo, R., Alon, U.: Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics 20, 1746–1758 (2004)CrossRef Kashtan, N., Itzkovitz, S., Milo, R., Alon, U.: Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics 20, 1746–1758 (2004)CrossRef
22.
Zurück zum Zitat Kolountzakis, M.N., Miller, G.L., Peng, R., Tsourakakis, C.E.: Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Math. 8, 161–185 (2012)MathSciNetCrossRefMATH Kolountzakis, M.N., Miller, G.L., Peng, R., Tsourakakis, C.E.: Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Math. 8, 161–185 (2012)MathSciNetCrossRefMATH
23.
Zurück zum Zitat Lakshman, A., Malik, P.: Cassandra: a decentralized structured storage system. SIGOPS Oper. Syst. Rev. 44, 35–40 (2010)CrossRef Lakshman, A., Malik, P.: Cassandra: a decentralized structured storage system. SIGOPS Oper. Syst. Rev. 44, 35–40 (2010)CrossRef
24.
Zurück zum Zitat Lee, J., Han, W.-S., Kasperovics, R., Lee, J.-H.: An in-depth comparison of subgraph isomorphism algorithms in graph databases. In: PVLDB (2013) Lee, J., Han, W.-S., Kasperovics, R., Lee, J.-H.: An in-depth comparison of subgraph isomorphism algorithms in graph databases. In: PVLDB (2013)
25.
Zurück zum Zitat Leskovec, J., Faloutsos, C.: Sampling from large graphs. In: SIGKDD (2006) Leskovec, J., Faloutsos, C.: Sampling from large graphs. In: SIGKDD (2006)
26.
Zurück zum Zitat Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: Distributed GraphLab: a framework for machine learning in the cloud. In: PVLDB (2012) Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: Distributed GraphLab: a framework for machine learning in the cloud. In: PVLDB (2012)
27.
Zurück zum Zitat Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: SIGMOD (2010) Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: SIGMOD (2010)
28.
Zurück zum Zitat McAuley, J., Leskovec, J.: Learning to discover social circles in ego networks. In: NIPS (2012) McAuley, J., Leskovec, J.: Learning to discover social circles in ego networks. In: NIPS (2012)
30.
Zurück zum Zitat Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., Alon, U.: Network motifs: simple building blocks of complex networks. Science 298, 824–827 (2002) Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., Alon, U.: Network motifs: simple building blocks of complex networks. Science 298, 824–827 (2002)
31.
Zurück zum Zitat Mongiov, M., Natale, R.D., Giugno, R., Pulvirenti, A., Ferro, A., Sharan, R.: Sigma: a set-cover-based inexact graph matching algorithm. J. Bioinform. Comput. Biol. 8, 199–218 (2010)CrossRef Mongiov, M., Natale, R.D., Giugno, R., Pulvirenti, A., Ferro, A., Sharan, R.: Sigma: a set-cover-based inexact graph matching algorithm. J. Bioinform. Comput. Biol. 8, 199–218 (2010)CrossRef
32.
Zurück zum Zitat Moustafa, W.E., Namata, G., Deshpande, A., Getoor, L.: Declarative analysis of noisy information networks. In: ICDE Workshops (2011) Moustafa, W.E., Namata, G., Deshpande, A., Getoor, L.: Declarative analysis of noisy information networks. In: ICDE Workshops (2011)
33.
Zurück zum Zitat Nguyen, D., Lenharth, A., Pingali, K.: A lightweight infrastructure for graph analytics. In: SOSP (2013) Nguyen, D., Lenharth, A., Pingali, K.: A lightweight infrastructure for graph analytics. In: SOSP (2013)
34.
Zurück zum Zitat Popescu, A.D., Balmin, A., Ercegovac, V., Ailamaki, A.: PREDIcT: towards predicting the runtime of large scale iterative analytics. In: Proceedings of VLDB Endowment (2013) Popescu, A.D., Balmin, A., Ercegovac, V., Ailamaki, A.: PREDIcT: towards predicting the runtime of large scale iterative analytics. In: Proceedings of VLDB Endowment (2013)
35.
Zurück zum Zitat Pujol, J.M., Erramilli, V., Siganos, G., Xiaoyuan, Y., Laoutaris, N., Chhabra, P., Rodriguez, P.: The little engine(s) that could: scaling online social networks. In: SIGCOMM (2010) Pujol, J.M., Erramilli, V., Siganos, G., Xiaoyuan, Y., Laoutaris, N., Chhabra, P., Rodriguez, P.: The little engine(s) that could: scaling online social networks. In: SIGCOMM (2010)
36.
Zurück zum Zitat Rajaraman, A., Ullman, J.D.: Mining of Massive Datasets. Cambridge University Press, Cambridge (2011)CrossRef Rajaraman, A., Ullman, J.D.: Mining of Massive Datasets. Cambridge University Press, Cambridge (2011)CrossRef
38.
Zurück zum Zitat Roy, A., Mihailovic, I., Zwaenepoel, W.: X-stream: edge-centric graph processing using streaming partitions. In: SOSP (2013) Roy, A., Mihailovic, I., Zwaenepoel, W.: X-stream: edge-centric graph processing using streaming partitions. In: SOSP (2013)
39.
Zurück zum Zitat Salihoglu, S., Widom, J.: GPS: a graph processing system. In: SSDBM (2013) Salihoglu, S., Widom, J.: GPS: a graph processing system. In: SSDBM (2013)
40.
Zurück zum Zitat Seo, J., Guo, S., Lam, M.S.: Socialite: datalog extensions for efficient social network analysis. In: ICDE (2013) Seo, J., Guo, S., Lam, M.S.: Socialite: datalog extensions for efficient social network analysis. In: ICDE (2013)
41.
Zurück zum Zitat Seo, J., Park, J., Shin, J., Lam, M.S.: Distributed socialite: a datalog-based language for large-scale graph analysis. In: PVLDB (2013) Seo, J., Park, J., Shin, J., Lam, M.S.: Distributed socialite: a datalog-based language for large-scale graph analysis. In: PVLDB (2013)
42.
Zurück zum Zitat Shang, H., Zhang, Y., Lin, X., Yu, J.X.: Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. In: VLDB (2008) Shang, H., Zhang, Y., Lin, X., Yu, J.X.: Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. In: VLDB (2008)
43.
Zurück zum Zitat Shasha, D., Wang, J.T.L., Giugno, R.: Algorithmics and applications of tree and graph searching. In: PODS (2002) Shasha, D., Wang, J.T.L., Giugno, R.: Algorithmics and applications of tree and graph searching. In: PODS (2002)
44.
Zurück zum Zitat Simmhan, Y., Kumbhare, A.G., Wickramaarachchi, C., Nagarkar, S., Ravi, S., Raghavendra, C.S., Prasanna, V.K.: Goffish: a sub-graph centric framework for large-scale graph analytics. In: CoRR (2013) Simmhan, Y., Kumbhare, A.G., Wickramaarachchi, C., Nagarkar, S., Ravi, S., Raghavendra, C.S., Prasanna, V.K.: Goffish: a sub-graph centric framework for large-scale graph analytics. In: CoRR (2013)
46.
Zurück zum Zitat Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From “Think Like a Vertex” to “Think Like a Graph”. In: PVLDB (2013) Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From “Think Like a Vertex” to “Think Like a Graph”. In: PVLDB (2013)
47.
Zurück zum Zitat Tian, Y., Patel, J.M.: TALE: a tool for approximate large graph matching. In: ICDE (2008) Tian, Y., Patel, J.M.: TALE: a tool for approximate large graph matching. In: ICDE (2008)
49.
Zurück zum Zitat Wang, G., Xie, W., Demers, A.J., Gehrke, J.: Asynchronous large-scale graph processing made easy. In: CIDR (2013) Wang, G., Xie, W., Demers, A.J., Gehrke, J.: Asynchronous large-scale graph processing made easy. In: CIDR (2013)
50.
Zurück zum Zitat Yan, X., Yu, P.S., Han, J.: Graph indexing: a frequent structure-based approach. In: SIGMOD (2004) Yan, X., Yu, P.S., Han, J.: Graph indexing: a frequent structure-based approach. In: SIGMOD (2004)
51.
Zurück zum Zitat Zhao, P., Yu, J.X., Yu, P.S.: Graph indexing: tree + delta less than equal to graph. In: VLDB (2007) Zhao, P., Yu, J.X., Yu, P.S.: Graph indexing: tree + delta less than equal to graph. In: VLDB (2007)
52.
Zurück zum Zitat Zou, L., Chen, L., Yu, J.X., Lu, Y.: A novel spectral coding in a large graph database. In: EDBT (2008) Zou, L., Chen, L., Yu, J.X., Lu, Y.: A novel spectral coding in a large graph database. In: EDBT (2008)
Metadaten
Titel
NScale: neighborhood-centric large-scale graph analytics in the cloud
verfasst von
Abdul Quamar
Amol Deshpande
Jimmy Lin
Publikationsdatum
01.04.2016
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 2/2016
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-015-0405-2

Weitere Artikel der Ausgabe 2/2016

The VLDB Journal 2/2016 Zur Ausgabe