Skip to main content
Top
Published in:

01-12-2016 | Original Article

An empirical comparison of Big Graph frameworks in the context of network analysis

Authors: Jannis Koch, Christian L. Staudt, Maximilian Vogel, Henning Meyerhenke

Published in: Social Network Analysis and Mining | Issue 1/2016

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Complex networks are heterogeneous relational data sets with nontrivial substructures and statistical properties. They are typically represented as graphs consisting of vertices and edges. The analysis of their intricate structure is relevant to many areas of science and commerce, and data sets may reach sizes that require distributed storage and processing. We describe and compare programming models for distributed computing with a focus on graph algorithms for large-scale complex network analysis. Four frameworks-GraphLab, Apache Giraph, Giraph++ and Apache Flink—are used to implement algorithms for the representative problems connected components, community detection, PageRank and clustering coefficients. The implementations are executed on a computer cluster to evaluate the frameworks’ suitability in practice and to compare their performance to that of the single-machine, shared-memory parallel network analysis package NetworKit. Out of the distributed frameworks, GraphLab and Apache Giraph generally show the best performance. In our experiments a cluster of eight computers running Apache Giraph enables the analysis of a network with ca. 2 billion edges, which is too large for a single machine of the same type. However, for networks that fit into memory of one machine, the performance of the shared-memory parallel implementation is usually far better than the distributed ones. The study provides experimental evidence for selecting the appropriate framework depending on the task and data volume.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literature
go back to reference Battré D, Ewen S, Hueske F, Kao O, Markl V, Warneke D (2010) Nephele/pacts: a programming model and execution framework for web-scale analytical processing. In: Proceedings of 1st ACM symposium on cloud computing, SoCC ’10. ACM, New York, pp 119–130 Battré D, Ewen S, Hueske F, Kao O, Markl V, Warneke D (2010) Nephele/pacts: a programming model and execution framework for web-scale analytical processing. In: Proceedings of 1st ACM symposium on cloud computing, SoCC ’10. ACM, New York, pp 119–130
go back to reference Boldi P, Vigna S (2004) The WebGraph framework I: compression techniques. In: Proceedings of the thirteenth international World Wide Web Conference (WWW 2004). ACM Press, Manhattan, USA, pp 595–601 Boldi P, Vigna S (2004) The WebGraph framework I: compression techniques. In: Proceedings of the thirteenth international World Wide Web Conference (WWW 2004). ACM Press, Manhattan, USA, pp 595–601
go back to reference Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. In: Computer networks and ISDN systems. Elsevier Science Publishers B. V, Amsterdam, pp 107–117 Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. In: Computer networks and ISDN systems. Elsevier Science Publishers B. V, Amsterdam, pp 107–117
go back to reference Cha M, Haddadi H, Benevenuto F, Gummadi KP (2010) Measuring user influence in Twitter: the million follower fallacy. In: Proceedings of the 4th international AAAI conference on Weblogs and Social Media (ICWSM) Cha M, Haddadi H, Benevenuto F, Gummadi KP (2010) Measuring user influence in Twitter: the million follower fallacy. In: Proceedings of the 4th international AAAI conference on Weblogs and Social Media (ICWSM)
go back to reference Costa LdF, Oliveira ON, Travieso G, Rodrigues FA, Villas Boas PR, Antiqueira L, Viana MP, Correa Rocha LE (2011) Analyzing and modeling real-world phenomena with complex networks: a survey of applications. Adv Phys 60(3):329–412CrossRef Costa LdF, Oliveira ON, Travieso G, Rodrigues FA, Villas Boas PR, Antiqueira L, Viana MP, Correa Rocha LE (2011) Analyzing and modeling real-world phenomena with complex networks: a survey of applications. Adv Phys 60(3):329–412CrossRef
go back to reference Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113CrossRef Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113CrossRef
go back to reference Gonzalez JE, Low Y, Gu H, Bickson D, Guestrin C (2012) Powergraph: Distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX conference on operating systems design and implementation, OSDI’12. USENIX Association, Berkeley, CA, USA, pp 17–30 Gonzalez JE, Low Y, Gu H, Bickson D, Guestrin C (2012) Powergraph: Distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX conference on operating systems design and implementation, OSDI’12. USENIX Association, Berkeley, CA, USA, pp 17–30
go back to reference Karloff H, Suri S, Vassilvitskii S (2010) A model of computation for mapreduce. In: Proceedings of the twenty-first annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, pp 938–948 Karloff H, Suri S, Vassilvitskii S (2010) A model of computation for mapreduce. In: Proceedings of the twenty-first annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, pp 938–948
go back to reference Koch J, Staudt CL, Vogel M, Meyerhenke H (2015) Complex network analysis on distributed systems: an empirical comparison. In: Pei J, Silvestri F, Tang J (eds) Proceedings of 2015 IEEE/ACM international conference on advances in social networks analysis and mining, ASONAM 2015. ACM, pp 1169–1176 Koch J, Staudt CL, Vogel M, Meyerhenke H (2015) Complex network analysis on distributed systems: an empirical comparison. In: Pei J, Silvestri F, Tang J (eds) Proceedings of 2015 IEEE/ACM international conference on advances in social networks analysis and mining, ASONAM 2015. ACM, pp 1169–1176
go back to reference Kunegis J (2013) Konect: the koblenz network collection. In: Proceedings of 22nd international conference on World Wide Web companion. International World Wide Web Conferences Steering Committee, pp 1343–1350 Kunegis J (2013) Konect: the koblenz network collection. In: Proceedings of 22nd international conference on World Wide Web companion. International World Wide Web Conferences Steering Committee, pp 1343–1350
go back to reference Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: WWW ’10: Proceedings of the 19th international conference on World wide web. ACM, New York, NY, USA, pp 591–600 Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: WWW ’10: Proceedings of the 19th international conference on World wide web. ACM, New York, NY, USA, pp 591–600
go back to reference Lin J, Dyer C (2010) Data-intensive text processing with MapReduce. G-Reference, Information and Interdisciplinary Subjects Series. Morgan & Claypool Lin J, Dyer C (2010) Data-intensive text processing with MapReduce. G-Reference, Information and Interdisciplinary Subjects Series. Morgan & Claypool
go back to reference Lin J, Schatz M (2010) Design patterns for efficient graph algorithms in mapreduce. In: Proceedings of the eighth workshop on mining and learning with graphs, MLG ’10. ACM, New York, NY, USA, pp 78–85 Lin J, Schatz M (2010) Design patterns for efficient graph algorithms in mapreduce. In: Proceedings of the eighth workshop on mining and learning with graphs, MLG ’10. ACM, New York, NY, USA, pp 78–85
go back to reference Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein JM (2012) Distributed GraphLab: a framework for machine learning in the cloud. CoRR, abs/1204.6078 Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein JM (2012) Distributed GraphLab: a framework for machine learning in the cloud. CoRR, abs/1204.6078
go back to reference Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data. ACM, pp 135–146 Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD international conference on management of data. ACM, pp 135–146
go back to reference McColl RC, Ediger D, Poovey J, Campbell D, Bader DA (2014) A performance evaluation of open source graph databases. In: Proceedings of 1st workshop on parallel programming for analytics applications, PPAA ’14. ACM, New York, NY, USA, pp 11–18 McColl RC, Ediger D, Poovey J, Campbell D, Bader DA (2014) A performance evaluation of open source graph databases. In: Proceedings of 1st workshop on parallel programming for analytics applications, PPAA ’14. ACM, New York, NY, USA, pp 11–18
go back to reference Meyerhenke H, Sanders P, Schulz C (2014) Partitioning complex networks via size-constrained clustering. In: Proceedings of 13th international symposium on experimental algorithms (SEA 2014), vol 8504 of LNCS. Springer, Berlin, pp 351–363 Meyerhenke H, Sanders P, Schulz C (2014) Partitioning complex networks via size-constrained clustering. In: Proceedings of 13th international symposium on experimental algorithms (SEA 2014), vol 8504 of LNCS. Springer, Berlin, pp 351–363
go back to reference Raghavan UN, Albert R, Kumara S (2007) Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76(3):036106CrossRef Raghavan UN, Albert R, Kumara S (2007) Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76(3):036106CrossRef
go back to reference Satish N, Sundaram N, Patwary MMA, Seo J, Park J, Hassaan MA, Sengupta S, Yin Z, Dubey P (2014). Navigating the maze of graph analytics frameworks using massive graph datasets. In: Proceedings 2014 ACM SIGMOD international conference on management of data, SIGMOD ’14. ACM, New York, NY, USA, pp 979–990 Satish N, Sundaram N, Patwary MMA, Seo J, Park J, Hassaan MA, Sengupta S, Yin Z, Dubey P (2014). Navigating the maze of graph analytics frameworks using massive graph datasets. In: Proceedings 2014 ACM SIGMOD international conference on management of data, SIGMOD ’14. ACM, New York, NY, USA, pp 979–990
go back to reference Slota GM, Madduri K, Rajamanickam S (2014) Pulp: scalable multi-objective multi-constraint partitioning for small-world networks. In: Lin J, Pei J, Hu X, Chang W, Nambiar R, Aggarwal C, Cercone N, Honavar V, Huan J, Mobasher B, Pyne S (eds) 2014 IEEE international conference on big data, Big Data 2014, pp 481–490 Slota GM, Madduri K, Rajamanickam S (2014) Pulp: scalable multi-objective multi-constraint partitioning for small-world networks. In: Lin J, Pei J, Hu X, Chang W, Nambiar R, Aggarwal C, Cercone N, Honavar V, Huan J, Mobasher B, Pyne S (eds) 2014 IEEE international conference on big data, Big Data 2014, pp 481–490
go back to reference Staudt CL, Sazonovs A, Meyerhenke H (2016) NetworKit: a tool suite for large-scale complex network analysis. Netw Sci, To Appear Staudt CL, Sazonovs A, Meyerhenke H (2016) NetworKit: a tool suite for large-scale complex network analysis. Netw Sci, To Appear
go back to reference Tian Y, Balmin A, Corsten SA, Tatikonda S, McPherson J (2013) From “think like a vertex” to “think like a graph”. PVLDB 7(3):193–204 Tian Y, Balmin A, Corsten SA, Tatikonda S, McPherson J (2013) From “think like a vertex” to “think like a graph”. PVLDB 7(3):193–204
go back to reference Turi (2016). Website of the company distributing GraphLab Turi (2016). Website of the company distributing GraphLab
go back to reference Valiant LG (1990) A bridging model for parallel computation. Commun ACM 33(8):103–111CrossRef Valiant LG (1990) A bridging model for parallel computation. Commun ACM 33(8):103–111CrossRef
go back to reference Zhang Y, Gao Q, Gao L, Wang C (2012). Accelerate large-scale iterative computation through asynchronous accumulative updates. In: Proceedings of the 3rd workshop on scientific cloud computing date, ACM, pp 13–22 Zhang Y, Gao Q, Gao L, Wang C (2012). Accelerate large-scale iterative computation through asynchronous accumulative updates. In: Proceedings of the 3rd workshop on scientific cloud computing date, ACM, pp 13–22
Metadata
Title
An empirical comparison of Big Graph frameworks in the context of network analysis
Authors
Jannis Koch
Christian L. Staudt
Maximilian Vogel
Henning Meyerhenke
Publication date
01-12-2016
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 1/2016
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-016-0394-1

Premium Partner