Skip to main content
Top
Published in: Distributed and Parallel Databases 1/2017

27-01-2017

Distributed block formation and layout for disk-based management of large-scale graphs

Authors: Abdurrahman Yaşar, Buğra Gedik, Hakan Ferhatosmanoğlu

Published in: Distributed and Parallel Databases | Issue 1/2017

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

We are witnessing an enormous growth in social networks as well as in the volume of data generated by them. An important portion of this data is in the form of graphs. In recent years, several graph processing and management systems emerged to handle large-scale graphs. The primary goal of these systems is to run graph algorithms and queries in an efficient and scalable manner. Unlike relational data, graphs are semi-structured in nature. Thus, storing and accessing graph data using secondary storage requires new solutions that can provide locality of access for graph processing workloads. In this work, we propose a scalable block formation and layout technique for graphs, which aims at reducing the I/O cost of disk-based graph processing algorithms. To achieve this, we designed a scalable MapReduce-style method called ICBL, which can divide the graph into a series of disk blocks that contain sub-graphs with high locality. Furthermore, ICBL can order the resulting blocks on disk to further reduce non-local accesses. We experimentally evaluated ICBL to showcase its scalability, layout quality, as well as the effectiveness of automatic parameter tuning for ICBL. We deployed the graph layouts generated by ICBL on the Neo4j open source graph database, http://​www.​neo4j.​org/​ (2015) graph database management system. Our results show that the layout generated by ICBL reduces the query running times over Neo4j more than \(2\times \) compared to the default layout.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Footnotes
1
Acronym is formed by the initial letters of the four solution stages.
 
Literature
1.
go back to reference Aggarwal, C., Wang, H.: Graph data management and mining. In: Aggarwal, C. (ed.) A Survey of Algorithms and Applications. Springer, Berlin (2010) Aggarwal, C., Wang, H.: Graph data management and mining. In: Aggarwal, C. (ed.) A Survey of Algorithms and Applications. Springer, Berlin (2010)
3.
go back to reference Bhadkamkar, M., Guerra, J., Useche, L., Burnett, S., Liptak, J., Rangaswami, R., Hristidis, V.: BORG: block-reorganization for self-optimizing storage systems. In: Proceedings of the 7th Conference on File and Storage Technologies, pp. 183–196 (2009) Bhadkamkar, M., Guerra, J., Useche, L., Burnett, S., Liptak, J., Rangaswami, R., Hristidis, V.: BORG: block-reorganization for self-optimizing storage systems. In: Proceedings of the 7th Conference on File and Storage Technologies, pp. 183–196 (2009)
4.
go back to reference Boldi, P., Vigna, S.: The WebGraph framework I: compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004), pp. 595–601 (2004) Boldi, P., Vigna, S.: The WebGraph framework I: compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004), pp. 595–601 (2004)
5.
go back to reference Boldi, P., Rosa, M., Santini, M., Vigna, S.: Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks. In: Proceedings of the 20th International Conference on World Wide Web (2011) Boldi, P., Rosa, M., Santini, M., Vigna, S.: Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks. In: Proceedings of the 20th International Conference on World Wide Web (2011)
6.
go back to reference Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-MAT: a recursive model for graph mining. In: Fourth SIAM International Conference on Data Mining (2004) Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-MAT: a recursive model for graph mining. In: Fourth SIAM International Conference on Data Mining (2004)
7.
go back to reference Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Symposium on Operating System Design and Implementation (OSDI), pp. 137–150 (2004) Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. In: Symposium on Operating System Design and Implementation (OSDI), pp. 137–150 (2004)
8.
go back to reference Dominguez-Sal, D., Martinez-Bazan, N., Muntes-Mulero, V., Baleta, P., Larriba-Pey, J.: A discussion on the design of graph database benchmarks. In: Nambiar, R., Poess, M. (eds.) Performance Evaluation, Measurement and Characterization of Complex Systems. Springer, Berlin (2011) Dominguez-Sal, D., Martinez-Bazan, N., Muntes-Mulero, V., Baleta, P., Larriba-Pey, J.: A discussion on the design of graph database benchmarks. In: Nambiar, R., Poess, M. (eds.) Performance Evaluation, Measurement and Characterization of Complex Systems. Springer, Berlin (2011)
9.
go back to reference Fortunato, S.: Community detection in graphs. Phys. Rep. 483(3–5), 75–174 (2009)MathSciNet Fortunato, S.: Community detection in graphs. Phys. Rep. 483(3–5), 75–174 (2009)MathSciNet
10.
go back to reference Gedik, B., Bordawekar, R.: Disk-based management of interaction graphs. IEEE Trans. Knowl. Data Eng. 26(11), 2689–2702 (2014)CrossRef Gedik, B., Bordawekar, R.: Disk-based management of interaction graphs. IEEE Trans. Knowl. Data Eng. 26(11), 2689–2702 (2014)CrossRef
12.
go back to reference Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: PowerGraph: distributed graph-parallel computation on natural graphs. In: Symposium on Operating System Design and Implementation (OSDI), pp. 17–30 (2012) Gonzalez, J.E., Low, Y., Gu, H., Bickson, D., Guestrin, C.: PowerGraph: distributed graph-parallel computation on natural graphs. In: Symposium on Operating System Design and Implementation (OSDI), pp. 17–30 (2012)
13.
go back to reference Han, W.S., Lee, S., Park, K., Lee, J.H., Kim, M.S., Kim, J., Yu, H.: TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 77–85 (2013) Han, W.S., Lee, S., Park, K., Lee, J.H., Kim, M.S., Kim, J., Yu, H.: TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 77–85 (2013)
14.
go back to reference Hoque, I., Gupta, I.: Disk layout techniques for online social network data. IEEE Comput. 16(3), 24–36 (2012)CrossRef Hoque, I., Gupta, I.: Disk layout techniques for online social network data. IEEE Comput. 16(3), 24–36 (2012)CrossRef
15.
go back to reference Kang, U., Tong, H., Sun, J., Lin, C.Y., Faloutsos, C.: GBASE: a scalable and general graph management system. In: ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 1091–1099 (2011) Kang, U., Tong, H., Sun, J., Lin, C.Y., Faloutsos, C.: GBASE: a scalable and general graph management system. In: ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 1091–1099 (2011)
16.
go back to reference Karypis, G., Kumar, V.: Multilevel graph partitioning schemes. In: International Conference on Parallel Processing (ICPP), pp. 113–122 (1995) Karypis, G., Kumar, V.: Multilevel graph partitioning schemes. In: International Conference on Parallel Processing (ICPP), pp. 113–122 (1995)
17.
go back to reference Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: WWW’10: Proceedings of the 19th International Conference on World Wide Web, pp. 591–600 (2010) Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: WWW’10: Proceedings of the 19th International Conference on World Wide Web, pp. 591–600 (2010)
18.
go back to reference Kyrola, A., Blelloch, G., Guestrin, C.: GraphChi: large-scale graph computation on just a PC. In: Symposium on Operating System Design and Implementation (OSDI), pp. 31–46 (2012) Kyrola, A., Blelloch, G., Guestrin, C.: GraphChi: large-scale graph computation on just a PC. In: Symposium on Operating System Design and Implementation (OSDI), pp. 31–46 (2012)
19.
go back to reference Lasalle, D., Karypis, G.: Multi-threaded graph partitioning. In: Proceedings of the IEEE International Symposium on Parallel and Distributed Processing (IPDPS), pp. 225–236 (2013) Lasalle, D., Karypis, G.: Multi-threaded graph partitioning. In: Proceedings of the IEEE International Symposium on Parallel and Distributed Processing (IPDPS), pp. 225–236 (2013)
21.
go back to reference Low, Y., Bickson, D., Gonzalez, J., Guestrin, C., Kyrola, A., Hellerstein, J.M.: Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proc. VLDB Endow. 5(8), 716–727 (2012). doi:10.14778/2212351.2212354 CrossRef Low, Y., Bickson, D., Gonzalez, J., Guestrin, C., Kyrola, A., Hellerstein, J.M.: Distributed GraphLab: a framework for machine learning and data mining in the cloud. Proc. VLDB Endow. 5(8), 716–727 (2012). doi:10.​14778/​2212351.​2212354 CrossRef
22.
go back to reference MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pp. 281–297 (1967) MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pp. 281–297 (1967)
23.
go back to reference Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: ACM International Conference on Management of Data (SIGMOD), pp. 135–146 (2010) Malewicz, G., Austern, M.H., Bik, A.J., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: ACM International Conference on Management of Data (SIGMOD), pp. 135–146 (2010)
24.
go back to reference Mondal, J., Deshpande, A.: Managing large dynamic graphs efficiently. In: ACM International Conference on Management of Data (SIGMOD), pp. 145–156 (2012) Mondal, J., Deshpande, A.: Managing large dynamic graphs efficiently. In: ACM International Conference on Management of Data (SIGMOD), pp. 145–156 (2012)
25.
go back to reference Nanavati, A.A., Siva, G., Das, G., Chakraborty, D., Dasgupta, K., Mukherjea, S., Joshi, A.: On the structural properties of massive telecom call graphs: findings and implications. In: ACM International Conference on Information and Knowledge Management (CIKM), pp. 435–444 (2006) Nanavati, A.A., Siva, G., Das, G., Chakraborty, D., Dasgupta, K., Mukherjea, S., Joshi, A.: On the structural properties of massive telecom call graphs: findings and implications. In: ACM International Conference on Information and Knowledge Management (CIKM), pp. 435–444 (2006)
28.
29.
go back to reference Prabhakaran, V., Wu, M., Weng, X., McSherry, F., Zhou, L., Haridasan, M.: Managing large graphs on multi-cores with graph awareness. In: Proceedings of the 2012 USENIX Conference on Annual Technical Conference, pp. 4–4 (2012) Prabhakaran, V., Wu, M., Weng, X., McSherry, F., Zhou, L., Haridasan, M.: Managing large graphs on multi-cores with graph awareness. In: Proceedings of the 2012 USENIX Conference on Annual Technical Conference, pp. 4–4 (2012)
30.
go back to reference Rajaraman, A., Ullman, J.D.: Data mining. In: Mining of Massive Datasets, pp. 1–17. Cambridge University Press, Cambridge (2011) Rajaraman, A., Ullman, J.D.: Data mining. In: Mining of Massive Datasets, pp. 1–17. Cambridge University Press, Cambridge (2011)
31.
go back to reference Shao, B., Wang, H., Li, Y.: Trinity: a distributed graph engine on a memory cloud. In: ACM International Conference on Management of Data (SIGMOD) (2013) Shao, B., Wang, H., Li, Y.: Trinity: a distributed graph engine on a memory cloud. In: ACM International Conference on Management of Data (SIGMOD) (2013)
32.
go back to reference Siek, J.G., Lee, L.Q., Lumsdaine, A.: Boost Graph Library. The User Guide and Reference Manual. Addison-Wesley, Boston (2002) Siek, J.G., Lee, L.Q., Lumsdaine, A.: Boost Graph Library. The User Guide and Reference Manual. Addison-Wesley, Boston (2002)
33.
go back to reference Simmhan, Y., Kumbhare, A., Wickramaarachchi, C., et al.: Goffish: a sub-graph centric framework for large-scale graph analytics. In: European Conference on Parallel Processing (Euro-Par), pp. 451–462 (2015) Simmhan, Y., Kumbhare, A., Wickramaarachchi, C., et al.: Goffish: a sub-graph centric framework for large-scale graph analytics. In: European Conference on Parallel Processing (Euro-Par), pp. 451–462 (2015)
34.
go back to reference Steinhaus, R.: G-Store: a storage manager for graph data. Master’s Thesis, University of Oxford (2011) Steinhaus, R.: G-Store: a storage manager for graph data. Master’s Thesis, University of Oxford (2011)
35.
go back to reference Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From think like a vertex to think like a graph. Proc. Very Large Databases Conf. 7(3), 193–204 (2013) Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From think like a vertex to think like a graph. Proc. Very Large Databases Conf. 7(3), 193–204 (2013)
36.
go back to reference Watts, D.J., Strogatz, S.H.: Collective dynamics of ‘small-world’ networks. Nature 393(6684), 409–410 (1998)CrossRef Watts, D.J., Strogatz, S.H.: Collective dynamics of ‘small-world’ networks. Nature 393(6684), 409–410 (1998)CrossRef
37.
go back to reference Xie, W., Wang, G., Bindel, D., Demers, A., Gehrke, J.: Fast iterative graph computation with block updates. Proc. Very Large Databases Conf. 6(14), 2014–2025 (2013). doi:10.14778/2556549.2556581 Xie, W., Wang, G., Bindel, D., Demers, A., Gehrke, J.: Fast iterative graph computation with block updates. Proc. Very Large Databases Conf. 6(14), 2014–2025 (2013). doi:10.​14778/​2556549.​2556581
38.
go back to reference Xin, R.S., Gonzalez, J.E., Franklin, M.J., Stoica, I.: Graphx: a resilient distributed graph system on spark. In: First International Workshop on Graph Data Management Experiences and Systems, pp. 2:1–2:6 (2013) Xin, R.S., Gonzalez, J.E., Franklin, M.J., Stoica, I.: Graphx: a resilient distributed graph system on spark. In: First International Workshop on Graph Data Management Experiences and Systems, pp. 2:1–2:6 (2013)
39.
go back to reference Yan, D., Cheng, J., Lu, Y., Ng, W.: Blogel: a block-centric framework for distributed computation on real-world graphs. Proc. Very Large Databases Conf. 7(14), 1981–1992 (2014) Yan, D., Cheng, J., Lu, Y., Ng, W.: Blogel: a block-centric framework for distributed computation on real-world graphs. Proc. Very Large Databases Conf. 7(14), 1981–1992 (2014)
Metadata
Title
Distributed block formation and layout for disk-based management of large-scale graphs
Authors
Abdurrahman Yaşar
Buğra Gedik
Hakan Ferhatosmanoğlu
Publication date
27-01-2017
Publisher
Springer US
Published in
Distributed and Parallel Databases / Issue 1/2017
Print ISSN: 0926-8782
Electronic ISSN: 1573-7578
DOI
https://doi.org/10.1007/s10619-017-7191-3

Premium Partner