Skip to main content
Top
Published in: International Journal of Parallel Programming 3/2016

01-06-2016

Parallel Tree Accumulations on MapReduce

Authors: Kiminori Matsuzaki, Reina Miyazaki

Published in: International Journal of Parallel Programming | Issue 3/2016

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

MapReduce is a remarkable parallel programming model as well as a parallel processing infrastructure for large-scale data processing. Since it is now widely available on cloud environments, developing methodology or patterns for MapReduce programming is important. In particular, XML is the de facto standard for representing data, and processing semi-structured data is involved in many applications. The target computational patterns in this paper are tree accumulations. Tree accumulations are shape-preserving computations over a tree in which values are updated through flows over the tree. We develop BSP algorithms for two tree accumulations as extensions of the BSP algorithm for tree reduction by Kakehi et al. (Tech. Rep. METR 2006-64, Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, 2006). We also implemented the two-superstep algorithms with a single MapReduce execution. Experimental results on a 16-node PC cluster show good speedups of a factor of 10.9–12.7.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Footnotes
1
In Haskell, we need sectioning to handle operators as functions, e.g., \((\oplus )\). In this paper, we simply write operators without sectioning when used as function parameters.
 
2
Since it is allowed for \(f_\mathrm{reduce}\) to output no result, the result type of \(f_\mathrm{reduce}\) is specified by a maybe type.
 
3
In the functional programming community, the tree of the type \(\textit{Tree}~\alpha \) is called a rose tree [20].
 
4
If some other properties hold on the parameter operators \(\oplus \) and \(\otimes \), we may be able to extract more parallelism. We can find discussions of such a property in [15, 19]. Note that we only require the associativity of \(\otimes \) in this paper.
 
5
This definition is slightly different from that in the original paper [15]. Our \(\textit{as}\) corresponds to their \(\textit{cs}\), and our \(\textit{ds}\) corresponds to two lists \(\textit{as}\) and \(\textit{bs}\) in their definition.
 
Literature
1.
go back to reference Abrahamson, K.R., Dadoun, N., Kirkpatrick, D.G., Przytycka, T.M.: A simple parallel tree contraction algorithm. J. Algorithms 10(2), 287–302 (1989)MathSciNetCrossRefMATH Abrahamson, K.R., Dadoun, N., Kirkpatrick, D.G., Przytycka, T.M.: A simple parallel tree contraction algorithm. J. Algorithms 10(2), 287–302 (1989)MathSciNetCrossRefMATH
2.
go back to reference Bird, R.: Introduction to Functional Programming Using Haskell. Prentice-Hall, New York (1998) Bird, R.: Introduction to Functional Programming Using Haskell. Prentice-Hall, New York (1998)
3.
go back to reference Blelloch, G.E.: Scans as primitive parallel operations. IEEE Trans. Comput. 38(11), 1526–1538 (1989)CrossRef Blelloch, G.E.: Scans as primitive parallel operations. IEEE Trans. Comput. 38(11), 1526–1538 (1989)CrossRef
4.
go back to reference Choi, H., Lee, K.H., Kim, S.H., Lee, Y.J., Moon, B.: HadoopXML: a suite for parallel processing of massive XML data with multiple twig pattern queries. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM’12), pp. 2737–2739. ACM (2012) Choi, H., Lee, K.H., Kim, S.H., Lee, Y.J., Moon, B.: HadoopXML: a suite for parallel processing of massive XML data with multiple twig pattern queries. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM’12), pp. 2737–2739. ACM (2012)
5.
go back to reference Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI2004), pp. 137–150 (2004) Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI2004), pp. 137–150 (2004)
6.
go back to reference Dehne, F.K.H.A., Ferreira, A., Cáceres, E., Song, S.W., Roncato, A.: Efficient parallel graph algorithms for coarse-grained multicomputers and BSP. Algorithmica 33(2), 183–200 (2002)MathSciNetCrossRefMATH Dehne, F.K.H.A., Ferreira, A., Cáceres, E., Song, S.W., Roncato, A.: Efficient parallel graph algorithms for coarse-grained multicomputers and BSP. Algorithmica 33(2), 183–200 (2002)MathSciNetCrossRefMATH
7.
go back to reference Emoto, K., Imachi, H.: Parallel tree reduction on MapReduce. In: Proceedings of the International Conference on Computational Science (ICCS 2012), Procedia Computer Science, vol. 9, pp. 1827–1836. Elsevier, Amsterdam (2012) Emoto, K., Imachi, H.: Parallel tree reduction on MapReduce. In: Proceedings of the International Conference on Computational Science (ICCS 2012), Procedia Computer Science, vol. 9, pp. 1827–1836. Elsevier, Amsterdam (2012)
8.
go back to reference Gazit, H., Miller, G.L., Teng, S.H.: Optimal tree contraction in EREW model. In: Proceedings of the Princeton Workshop on Algorithms, Architectures, and Technical Issues for Models of Concurrent Computation, pp. 139–156 (1987) Gazit, H., Miller, G.L., Teng, S.H.: Optimal tree contraction in EREW model. In: Proceedings of the Princeton Workshop on Algorithms, Architectures, and Technical Issues for Models of Concurrent Computation, pp. 139–156 (1987)
9.
go back to reference Gibbons, J.: Algebras for tree algorithms. Ph.D. thesis, Programming Research Group, University of Oxford (1991) Gibbons, J.: Algebras for tree algorithms. Ph.D. thesis, Programming Research Group, University of Oxford (1991)
12.
go back to reference Gibbons, J., Cai, W., Skillicorn, D.B.: Efficient parallel algorithms for tree accumulations. Sci. Comput. Progr. 23(1), 1–18 (1994)MathSciNetCrossRefMATH Gibbons, J., Cai, W., Skillicorn, D.B.: Efficient parallel algorithms for tree accumulations. Sci. Comput. Progr. 23(1), 1–18 (1994)MathSciNetCrossRefMATH
13.
go back to reference Hu, Z., Iwasaki, H., Takeichi, M.: Calculating accumulations. New Gener. Comput. 17, 153–173 (1999)CrossRef Hu, Z., Iwasaki, H., Takeichi, M.: Calculating accumulations. New Gener. Comput. 17, 153–173 (1999)CrossRef
14.
go back to reference Kakehi, K., Matsuzaki, K., Emoto, K.: Efficient parallel tree reductions on distributed memory environments. In: Proceedings of the 7th International Conference on Computational Science (ICCS 2007), Part II, Lecture Notes in Computer Science, vol. 4488, pp. 601–608. Springer, Berlin (2007) Kakehi, K., Matsuzaki, K., Emoto, K.: Efficient parallel tree reductions on distributed memory environments. In: Proceedings of the 7th International Conference on Computational Science (ICCS 2007), Part II, Lecture Notes in Computer Science, vol. 4488, pp. 601–608. Springer, Berlin (2007)
15.
go back to reference Kakehi, K., Matsuzaki, K., Emoto, K., Hu, Z.: A practicable framework for tree reductions under distributed memory environments. Tech. Rep. METR 2006-64, Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo (2006) Kakehi, K., Matsuzaki, K., Emoto, K., Hu, Z.: A practicable framework for tree reductions under distributed memory environments. Tech. Rep. METR 2006-64, Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo (2006)
16.
go back to reference Lämmel, R.: Google’s MapReduce programming model—revisited. Sci. Comput. Progr. 70(1), 1–30 (2008)CrossRefMATH Lämmel, R.: Google’s MapReduce programming model—revisited. Sci. Comput. Progr. 70(1), 1–30 (2008)CrossRefMATH
17.
go back to reference Liu, Y., Emoto, K., Matsuzaki, K., Hu, Z.: Accumulative computation on MapReduce. IPSJ Trans. Progr. 7(1), 18–27 (2014) Liu, Y., Emoto, K., Matsuzaki, K., Hu, Z.: Accumulative computation on MapReduce. IPSJ Trans. Progr. 7(1), 18–27 (2014)
18.
go back to reference Matsuzaki, K.: Efficient implementation of tree accumulations on distributed-memory parallel computers. In: Proceedings of the 7th International Conference on Computational Science (ICCS 2007), Part II, Lecture Notes in Computer Science, vol. 4488, pp. 609–616. Springer, Berlin (2007) Matsuzaki, K.: Efficient implementation of tree accumulations on distributed-memory parallel computers. In: Proceedings of the 7th International Conference on Computational Science (ICCS 2007), Part II, Lecture Notes in Computer Science, vol. 4488, pp. 609–616. Springer, Berlin (2007)
19.
go back to reference Matsuzaki, K., Hu, Z., Takeichi, M.: Parallel skeletons for manipulating general trees. Parallel Comput. 32(7–8), 590–603 (2006)CrossRef Matsuzaki, K., Hu, Z., Takeichi, M.: Parallel skeletons for manipulating general trees. Parallel Comput. 32(7–8), 590–603 (2006)CrossRef
20.
go back to reference Meertens, L.: First Steps Towards the Theory of Rose Trees. CWI, Amsterdam; IFIP Working Group 2.1 Working Paper 592 ROM-25 (1988) Meertens, L.: First Steps Towards the Theory of Rose Trees. CWI, Amsterdam; IFIP Working Group 2.1 Working Paper 592 ROM-25 (1988)
21.
go back to reference Mignet, L., Barbosa, D., Veltri, P.: The XML web: A first study. In: Proceedings of the 12th International Conference on World Wide Web (WWW’03), pp. 500–510. ACM, New York (2003) Mignet, L., Barbosa, D., Veltri, P.: The XML web: A first study. In: Proceedings of the 12th International Conference on World Wide Web (WWW’03), pp. 500–510. ACM, New York (2003)
22.
go back to reference Miller, G.L., Reif, J.H.: Parallel tree contraction and its application. In: 26th Annual Symposium on Foundations of Computer Science, pp. 478–489. IEEE Computer Society (1985) Miller, G.L., Reif, J.H.: Parallel tree contraction and its application. In: 26th Annual Symposium on Foundations of Computer Science, pp. 478–489. IEEE Computer Society (1985)
23.
go back to reference Nomura, Y., Emoto, K., Matsuzaki, K., Hu, Z., Takeichi, M.: Parallelization of XPath queries with tree skeletons. Comput. Softw. 24(3), 51–62 (2007). (In Japanese) Nomura, Y., Emoto, K., Matsuzaki, K., Hu, Z., Takeichi, M.: Parallelization of XPath queries with tree skeletons. Comput. Softw. 24(3), 51–62 (2007). (In Japanese)
24.
go back to reference Pardo, A.: Generic accumulations. In: Proceedings of the IFIP TC2/WG2.1 Working Conference on Generic Programming, pp. 49–78 (2003) Pardo, A.: Generic accumulations. In: Proceedings of the IFIP TC2/WG2.1 Working Conference on Generic Programming, pp. 49–78 (2003)
25.
go back to reference Reif, J.H. (ed.): Synthesis of Parallel Algorithms. Morgan Kaufmann Publishers, Burlington, MA (1993) Reif, J.H. (ed.): Synthesis of Parallel Algorithms. Morgan Kaufmann Publishers, Burlington, MA (1993)
26.
go back to reference Sevilgen, F.E., Aluru, S., Futamura, N.: Parallel algorithms for tree accumulations. J. Parallel Distrib. Comput. 65(1), 85–93 (2005)CrossRefMATH Sevilgen, F.E., Aluru, S., Futamura, N.: Parallel algorithms for tree accumulations. J. Parallel Distrib. Comput. 65(1), 85–93 (2005)CrossRefMATH
27.
go back to reference Skillicorn, D.B.: Foundations of Parallel Programming. Cambridge University Press, Cambridge (1994)CrossRefMATH Skillicorn, D.B.: Foundations of Parallel Programming. Cambridge University Press, Cambridge (1994)CrossRefMATH
28.
go back to reference Skillicorn, D.B.: Parallel implementation of tree skeletons. J. Parallel Distrib. Comput. 39(2), 115–125 (1996)CrossRefMATH Skillicorn, D.B.: Parallel implementation of tree skeletons. J. Parallel Distrib. Comput. 39(2), 115–125 (1996)CrossRefMATH
29.
go back to reference Skillicorn, D.B.: Structured parallel computation in structured documents. J. Univers. Comput. Sci. 3(1), 42–68 (1997)MATH Skillicorn, D.B.: Structured parallel computation in structured documents. J. Univers. Comput. Sci. 3(1), 42–68 (1997)MATH
30.
go back to reference Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)CrossRef Valiant, L.G.: A bridging model for parallel computation. Commun. ACM 33(8), 103–111 (1990)CrossRef
31.
go back to reference White, T.: Hadoop: The Definitive Guide. O’Reilly Media / Yahoo Press, Sebastopol, CA (2012) White, T.: Hadoop: The Definitive Guide. O’Reilly Media / Yahoo Press, Sebastopol, CA (2012)
Metadata
Title
Parallel Tree Accumulations on MapReduce
Authors
Kiminori Matsuzaki
Reina Miyazaki
Publication date
01-06-2016
Publisher
Springer US
Published in
International Journal of Parallel Programming / Issue 3/2016
Print ISSN: 0885-7458
Electronic ISSN: 1573-7640
DOI
https://doi.org/10.1007/s10766-015-0355-8

Other articles of this Issue 3/2016

International Journal of Parallel Programming 3/2016 Go to the issue

Premium Partner