Skip to main content

2016 | OriginalPaper | Buchkapitel

Online Graph Partitioning with an Affine Message Combining Cost Function

verfasst von : Xiang Chen, Jun Huan

Erschienen in: Big Data Analytics

Verlag: Springer India

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Graph partitioning is a key step in developing scalable data mining algorithms on massive graph data such as web graphs and social networks. Graph partitioning is often formalized as an optimization problem where we assign graph vertices to computing nodes with the objection to both minimize the communication cost between computing nodes and to balance the load of computing nodes. Such optimization was specified using a cost function to measure the quality of graph partition. Current graph systems such as Pregel, Graphlab take graph cut, i.e. counting the number of edges that cross different partitions, as the cost function of graph partition. We argue that graph cut ignores many characteristics of modern computing cluster and to develop better graph partitioning algorithm we should revise the cost function. In particular we believe that message combing, a new technique that was recently developed in order to minimize communication of computing nodes, should be considered in designing new cost functions for graph partitioning. In this paper, we propose a new cost function for graph partitioning which considers message combining. In this new cost function, we consider communication cost from three different sources: (1) two computing nodes establish a message channel between them; (2) a process creates a message utilize the channel and (3) the length of the message. Based on this cost function, we develop several heuristics for large graph partitioning. We have performed comprehensive experiments using real-world graphs. Our results demonstrate that our algorithms yield significant performance improvements over state-of-the-art approaches. The new cost function developed in this paper should help design new graph partition algorithms for better graph system performance.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N et al (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International conference on management of data, 2010, pp 135–146 Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N et al (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International conference on management of data, 2010, pp 135–146
2.
Zurück zum Zitat Avery C (2011) Giraph: Large-scale graph processing infrastruction on Hadoop. In: Proceedings of Hadoop Summit. Santa Clara, USA Avery C (2011) Giraph: Large-scale graph processing infrastruction on Hadoop. In: Proceedings of Hadoop Summit. Santa Clara, USA
3.
Zurück zum Zitat Zaharia M, Chowdhury M, Franklin MJ, Shenker S, Stoica I (2010) Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX conference on Hot topics in cloud computing,pp. 10–10 Zaharia M, Chowdhury M, Franklin MJ, Shenker S, Stoica I (2010) Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX conference on Hot topics in cloud computing,pp. 10–10
4.
Zurück zum Zitat Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein JM (2010) Graphlab: a new framework for parallel machine learning. arXiv:1006.4990 Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein JM (2010) Graphlab: a new framework for parallel machine learning. arXiv:​1006.​4990
5.
Zurück zum Zitat Shao B, Wang H, Li Y (2013) Trinity: a distributed graph engine on a memory cloud. In: Proceedings of the 2013 international conference on Management of data, 2013, pp 505–516 Shao B, Wang H, Li Y (2013) Trinity: a distributed graph engine on a memory cloud. In: Proceedings of the 2013 international conference on Management of data, 2013, pp 505–516
6.
Zurück zum Zitat Ke Q, Prabhakaran V, Xie Y, Yu Y, Wu J, Yang J (2011) Optimizing data partitioning for data-parallel computing. HotOS XIII Ke Q, Prabhakaran V, Xie Y, Yu Y, Wu J, Yang J (2011) Optimizing data partitioning for data-parallel computing. HotOS XIII
7.
Zurück zum Zitat Kernighan BW, Lin S (1970) An efficient heuristic procedure for partitioning graphs. Bell Syst Tech J 49:291–307CrossRefMATH Kernighan BW, Lin S (1970) An efficient heuristic procedure for partitioning graphs. Bell Syst Tech J 49:291–307CrossRefMATH
8.
Zurück zum Zitat Fiduccia CM, Mattheyses RM (1982) A linear-time heuristic for improving network partitions. In: 19th Conference on Design Automation, 1982, pp 175–181 Fiduccia CM, Mattheyses RM (1982) A linear-time heuristic for improving network partitions. In: 19th Conference on Design Automation, 1982, pp 175–181
9.
10.
Zurück zum Zitat Ng AY (2002) On spectral clustering: analysis and an algorithm Ng AY (2002) On spectral clustering: analysis and an algorithm
11.
Zurück zum Zitat Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20:359–392MathSciNetCrossRefMATH Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20:359–392MathSciNetCrossRefMATH
12.
Zurück zum Zitat LÜcking T, Monien B, Elsässer R (2001) New spectral bounds on k-partitioning of graphs. In: Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures, pp 255–262 LÜcking T, Monien B, Elsässer R (2001) New spectral bounds on k-partitioning of graphs. In: Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures, pp 255–262
13.
Zurück zum Zitat Abou-Rjeili A, Karypis G (2006) Multilevel algorithms for partitioning power-law graphs. In: 20th International parallel and distributed processing symposium, 2006. IPDPS 2006, 10 pp Abou-Rjeili A, Karypis G (2006) Multilevel algorithms for partitioning power-law graphs. In: 20th International parallel and distributed processing symposium, 2006. IPDPS 2006, 10 pp
14.
Zurück zum Zitat Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51:107–113CrossRef Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51:107–113CrossRef
15.
Zurück zum Zitat Kang U, Tsourakakis CE, Faloutsos C (2009) Pegasus: a peta-scale graph mining system implementation and observations. In: Ninth IEEE International conference on data mining, 2009. ICDM’09, 2009, pp 229–238 Kang U, Tsourakakis CE, Faloutsos C (2009) Pegasus: a peta-scale graph mining system implementation and observations. In: Ninth IEEE International conference on data mining, 2009. ICDM’09, 2009, pp 229–238
16.
Zurück zum Zitat Gonzalez JE, Low Y, Gu H, Bickson D, Guestrin C (2012) Powergraph: Distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX symposium on operating systems design and implementation (OSDI), 2012, pp 17–30 Gonzalez JE, Low Y, Gu H, Bickson D, Guestrin C (2012) Powergraph: Distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX symposium on operating systems design and implementation (OSDI), 2012, pp 17–30
17.
Zurück zum Zitat Bourse F, Lelarge M, Vojnovic M (2014) Balanced graph edge partition Bourse F, Lelarge M, Vojnovic M (2014) Balanced graph edge partition
18.
Zurück zum Zitat Stanton I, Kliot G (2012) Streaming graph partitioning for large distributed graphs. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, 2012, pp 1222–1230 Stanton I, Kliot G (2012) Streaming graph partitioning for large distributed graphs. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, 2012, pp 1222–1230
19.
Zurück zum Zitat Tsourakakis C, Gkantsidis C, Radunovic B, Vojnovic M (2014) Fennel: Streaming graph partitioning for massive scale graphs. In: Proceedings of the 7th ACM international conference on Web search and data mining, 2014, pp 333–342 Tsourakakis C, Gkantsidis C, Radunovic B, Vojnovic M (2014) Fennel: Streaming graph partitioning for massive scale graphs. In: Proceedings of the 7th ACM international conference on Web search and data mining, 2014, pp 333–342
20.
Zurück zum Zitat Leskovec J, Krevl A (2014) SNAP Datasets: Stanford Large Network Dataset Collection Leskovec J, Krevl A (2014) SNAP Datasets: Stanford Large Network Dataset Collection
Metadaten
Titel
Online Graph Partitioning with an Affine Message Combining Cost Function
verfasst von
Xiang Chen
Jun Huan
Copyright-Jahr
2016
Verlag
Springer India
DOI
https://doi.org/10.1007/978-81-322-3628-3_6