Skip to main content
Erschienen in: Distributed and Parallel Databases 1/2017

27.01.2017

Distributed block formation and layout for disk-based management of large-scale graphs

verfasst von: Abdurrahman Yaşar, Buğra Gedik, Hakan Ferhatosmanoğlu

Erschienen in: Distributed and Parallel Databases | Ausgabe 1/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We are witnessing an enormous growth in social networks as well as in the volume of data generated by them. An important portion of this data is in the form of graphs. In recent years, several graph processing and management systems emerged to handle large-scale graphs. The primary goal of these systems is to run graph algorithms and queries in an efficient and scalable manner. Unlike relational data, graphs are semi-structured in nature. Thus, storing and accessing graph data using secondary storage requires new solutions that can provide locality of access for graph processing workloads. In this work, we propose a scalable block formation and layout technique for graphs, which aims at reducing the I/O cost of disk-based graph processing algorithms. To achieve this, we designed a scalable MapReduce-style method called ICBL, which can divide the graph into a series of disk blocks that contain sub-graphs with high locality. Furthermore, ICBL can order the resulting blocks on disk to further reduce non-local accesses. We experimentally evaluated ICBL to showcase its scalability, layout quality, as well as the effectiveness of automatic parameter tuning for ICBL. We deployed the graph layouts generated by ICBL on the Neo4j open source graph database, http://​www.​neo4j.​org/​ (2015) graph database management system. Our results show that the layout generated by ICBL reduces the query running times over Neo4j more than \(2\times \) compared to the default layout.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
Acronym is formed by the initial letters of the four solution stages.
 
Literatur
1.
Zurück zum Zitat Aggarwal, C., Wang, H.: Graph data management and mining. In: Aggarwal, C. (ed.) A Survey of Algorithms and Applications. Springer, Berlin (2010) Aggarwal, C., Wang, H.: Graph data management and mining. In: Aggarwal, C. (ed.) A Survey of Algorithms and Applications. Springer, Berlin (2010)
3.
Zurück zum Zitat Bhadkamkar, M., Guerra, J., Useche, L., Burnett, S., Liptak, J., Rangaswami, R., Hristidis, V.: BORG: block-reorganization for self-optimizing storage systems. In: Proceedings of the 7th Conference on File and Storage Technologies, pp. 183–196 (2009) Bhadkamkar, M., Guerra, J., Useche, L., Burnett, S., Liptak, J., Rangaswami, R., Hristidis, V.: BORG: block-reorganization for self-optimizing storage systems. In: Proceedings of the 7th Conference on File and Storage Technologies, pp. 183–196 (2009)
4.
Zurück zum Zitat Boldi, P., Vigna, S.: The WebGraph framework I: compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004), pp. 595–601 (2004) Boldi, P., Vigna, S.: The WebGraph framework I: compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004), pp. 595–601 (2004)
5.
Zurück zum Zitat Boldi, P., Rosa, M., Santini, M., Vigna, S.: Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks. In: Proceedings of the 20th International Conference on World Wide Web (2011) Boldi, P., Rosa, M., Santini, M., Vigna, S.: Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks. In: Proceedings of the 20th International Conference on World Wide Web (2011)
6.
Zurück zum Zitat Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-MAT: a recursive model for graph mining. In: Fourth SIAM International Conference on Data Mining (2004) Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-MAT: a recursive model for graph mining. In: Fourth SIAM International Conference on Data Mining (2004)
7.
Zurück zum Zitat Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Symposium on Operating System Design and Implementation (OSDI), pp. 137–150 (2004) Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Symposium on Operating System Design and Implementation (OSDI), pp. 137–150 (2004)
8.
Zurück zum Zitat Dominguez-Sal, D., Martinez-Bazan, N., Muntes-Mulero, V., Baleta, P., Larriba-Pey, J.: A discussion on the design of graph database benchmarks. In: Nambiar, R., Poess, M. (eds.) Performance Evaluation, Measurement and Characterization of Complex Systems. Springer, Berlin (2011) Dominguez-Sal, D., Martinez-Bazan, N., Muntes-Mulero, V., Baleta, P., Larriba-Pey, J.: A discussion on the design of graph database benchmarks. In: Nambiar, R., Poess, M. (eds.) Performance Evaluation, Measurement and Characterization of Complex Systems. Springer, Berlin (2011)
9.
Zurück zum Zitat Fortunato, S.: Community detection in graphs. Phys. Rep. 483(3–5), 75–174 (2009)MathSciNet Fortunato, S.: Community detection in graphs. Phys. Rep. 483(3–5), 75–174 (2009)MathSciNet
10.
Zurück zum Zitat Gedik, B., Bordawekar, R.: Disk-based management of interaction graphs. IEEE Trans. Knowl. Data Eng. 26(11), 2689–2702 (2014)CrossRef Gedik, B., Bordawekar, R.: Disk-based management of interaction graphs. IEEE Trans. Knowl. Data Eng. 26(11), 2689–2702 (2014)CrossRef
12.
Zurück zum Zitat Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: PowerGraph: distributed graph-parallel computation on natural graphs. In: Symposium on Operating System Design and Implementation (OSDI), pp. 17–30 (2012) Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: PowerGraph: distributed graph-parallel computation on natural graphs. In: Symposium on Operating System Design and Implementation (OSDI), pp. 17–30 (2012)
13.
Zurück zum Zitat Han, W.S., Lee, S., Park, K., Lee, J.H., Kim, M.S., Kim, J., Yu, H.: TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 77–85 (2013) Han, W.S., Lee, S., Park, K., Lee, J.H., Kim, M.S., Kim, J., Yu, H.: TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 77–85 (2013)
14.
Zurück zum Zitat Hoque, I., Gupta, I.: Disk layout techniques for online social network data. IEEE Comput. 16(3), 24–36 (2012)CrossRef Hoque, I., Gupta, I.: Disk layout techniques for online social network data. IEEE Comput. 16(3), 24–36 (2012)CrossRef
15.
Zurück zum Zitat Kang, U., Tong, H., Sun, J., Lin, C.Y., Faloutsos, C.: GBASE: a scalable and general graph management system. In: ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 1091–1099 (2011) Kang, U., Tong, H., Sun, J., Lin, C.Y., Faloutsos, C.: GBASE: a scalable and general graph management system. In: ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 1091–1099 (2011)
16.
Zurück zum Zitat Karypis, G., Kumar, V.: Multilevel graph partitioning schemes. In: International Conference on Parallel Processing (ICPP), pp. 113–122 (1995) Karypis, G., Kumar, V.: Multilevel graph partitioning schemes. In: International Conference on Parallel Processing (ICPP), pp. 113–122 (1995)
17.
Zurück zum Zitat Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: WWW’10: Proceedings of the 19th International Conference on World Wide Web, pp. 591–600 (2010) Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: WWW’10: Proceedings of the 19th International Conference on World Wide Web, pp. 591–600 (2010)
18.
Zurück zum Zitat Kyrola, A., Blelloch, G., Guestrin, C.: GraphChi: large-scale graph computation on just a PC. In: Symposium on Operating System Design and Implementation (OSDI), pp. 31–46 (2012) Kyrola, A., Blelloch, G., Guestrin, C.: GraphChi: large-scale graph computation on just a PC. In: Symposium on Operating System Design and Implementation (OSDI), pp. 31–46 (2012)
19.
Zurück zum Zitat Lasalle, D., Karypis, G.: Multi-threaded graph partitioning. In: Proceedings of the IEEE International Symposium on Parallel and Distributed Processing (IPDPS), pp. 225–236 (2013) Lasalle, D., Karypis, G.: Multi-threaded graph partitioning. In: Proceedings of the IEEE International Symposium on Parallel and Distributed Processing (IPDPS), pp. 225–236 (2013)
21.
Zurück zum Zitat Low, Y., Bickson, D., Gonzalez, J., Guestrin, C., Kyrola, A., Hellerstein, J.M.: Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proc. VLDB Endow. 5(8), 716–727 (2012). doi:10.14778/2212351.2212354 CrossRef Low, Y., Bickson, D., Gonzalez, J., Guestrin, C., Kyrola, A., Hellerstein, J.M.: Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proc. VLDB Endow. 5(8), 716–727 (2012). doi:10.​14778/​2212351.​2212354 CrossRef
22.
Zurück zum Zitat MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pp. 281–297 (1967) MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pp. 281–297 (1967)
23.
Zurück zum Zitat Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: ACM International Conference on Management of Data (SIGMOD), pp. 135–146 (2010) Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: ACM International Conference on Management of Data (SIGMOD), pp. 135–146 (2010)
24.
Zurück zum Zitat Mondal, J., Deshpande, A.: Managing large dynamic graphs efficiently. In: ACM International Conference on Management of Data (SIGMOD), pp. 145–156 (2012) Mondal, J., Deshpande, A.: Managing large dynamic graphs efficiently. In: ACM International Conference on Management of Data (SIGMOD), pp. 145–156 (2012)
25.
Zurück zum Zitat Nanavati, A.A., Siva, G., Das, G., Chakraborty, D., Dasgupta, K., Mukherjea, S., Joshi, A.: On the structural properties of massive telecom call graphs: findings and implications. In: ACM International Conference on Information and Knowledge Management (CIKM), pp. 435–444 (2006) Nanavati, A.A., Siva, G., Das, G., Chakraborty, D., Dasgupta, K., Mukherjea, S., Joshi, A.: On the structural properties of massive telecom call graphs: findings and implications. In: ACM International Conference on Information and Knowledge Management (CIKM), pp. 435–444 (2006)
28.
29.
Zurück zum Zitat Prabhakaran, V., Wu, M., Weng, X., McSherry, F., Zhou, L., Haridasan, M.: Managing large graphs on multi-cores with graph awareness. In: Proceedings of the 2012 USENIX Conference on Annual Technical Conference, pp. 4–4 (2012) Prabhakaran, V., Wu, M., Weng, X., McSherry, F., Zhou, L., Haridasan, M.: Managing large graphs on multi-cores with graph awareness. In: Proceedings of the 2012 USENIX Conference on Annual Technical Conference, pp. 4–4 (2012)
30.
Zurück zum Zitat Rajaraman, A., Ullman, J.D.: Data mining. In: Mining of Massive Datasets, pp. 1–17. Cambridge University Press, Cambridge (2011) Rajaraman, A., Ullman, J.D.: Data mining. In: Mining of Massive Datasets, pp. 1–17. Cambridge University Press, Cambridge (2011)
31.
Zurück zum Zitat Shao, B., Wang, H., Li, Y.: Trinity: a distributed graph engine on a memory cloud. In: ACM International Conference on Management of Data (SIGMOD) (2013) Shao, B., Wang, H., Li, Y.: Trinity: a distributed graph engine on a memory cloud. In: ACM International Conference on Management of Data (SIGMOD) (2013)
32.
Zurück zum Zitat Siek, J.G., Lee, L.Q., Lumsdaine, A.: Boost Graph Library. The User Guide and Reference Manual. Addison-Wesley, Boston (2002) Siek, J.G., Lee, L.Q., Lumsdaine, A.: Boost Graph Library. The User Guide and Reference Manual. Addison-Wesley, Boston (2002)
33.
Zurück zum Zitat Simmhan, Y., Kumbhare, A., Wickramaarachchi, C., et al.: Goffish: a sub-graph centric framework for large-scale graph analytics. In: European Conference on Parallel Processing (Euro-Par), pp. 451–462 (2015) Simmhan, Y., Kumbhare, A., Wickramaarachchi, C., et al.: Goffish: a sub-graph centric framework for large-scale graph analytics. In: European Conference on Parallel Processing (Euro-Par), pp. 451–462 (2015)
34.
Zurück zum Zitat Steinhaus, R.: G-Store: a storage manager for graph data. Master’s Thesis, University of Oxford (2011) Steinhaus, R.: G-Store: a storage manager for graph data. Master’s Thesis, University of Oxford (2011)
35.
Zurück zum Zitat Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From think like a vertex to think like a graph. Proc. Very Large Databases Conf. 7(3), 193–204 (2013) Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From think like a vertex to think like a graph. Proc. Very Large Databases Conf. 7(3), 193–204 (2013)
36.
Zurück zum Zitat Watts, D.J., Strogatz, S.H.: Collective dynamics of ‘small-world’ networks. Nature 393(6684), 409–410 (1998)CrossRef Watts, D.J., Strogatz, S.H.: Collective dynamics of ‘small-world’ networks. Nature 393(6684), 409–410 (1998)CrossRef
37.
Zurück zum Zitat Xie, W., Wang, G., Bindel, D., Demers, A., Gehrke, J.: Fast iterative graph computation with block updates. Proc. Very Large Databases Conf. 6(14), 2014–2025 (2013). doi:10.14778/2556549.2556581 Xie, W., Wang, G., Bindel, D., Demers, A., Gehrke, J.: Fast iterative graph computation with block updates. Proc. Very Large Databases Conf. 6(14), 2014–2025 (2013). doi:10.​14778/​2556549.​2556581
38.
Zurück zum Zitat Xin, R.S., Gonzalez, J.E., Franklin, M.J., Stoica, I.: Graphx: a resilient distributed graph system on spark. In: First International Workshop on Graph Data Management Experiences and Systems, pp. 2:1–2:6 (2013) Xin, R.S., Gonzalez, J.E., Franklin, M.J., Stoica, I.: Graphx: a resilient distributed graph system on spark. In: First International Workshop on Graph Data Management Experiences and Systems, pp. 2:1–2:6 (2013)
39.
Zurück zum Zitat Yan, D., Cheng, J., Lu, Y., Ng, W.: Blogel: a block-centric framework for distributed computation on real-world graphs. Proc. Very Large Databases Conf. 7(14), 1981–1992 (2014) Yan, D., Cheng, J., Lu, Y., Ng, W.: Blogel: a block-centric framework for distributed computation on real-world graphs. Proc. Very Large Databases Conf. 7(14), 1981–1992 (2014)
Metadaten
Titel
Distributed block formation and layout for disk-based management of large-scale graphs
verfasst von
Abdurrahman Yaşar
Buğra Gedik
Hakan Ferhatosmanoğlu
Publikationsdatum
27.01.2017
Verlag
Springer US
Erschienen in
Distributed and Parallel Databases / Ausgabe 1/2017
Print ISSN: 0926-8782
Elektronische ISSN: 1573-7578
DOI
https://doi.org/10.1007/s10619-017-7191-3

Premium Partner