2014 | OriginalPaper | Book chapter

9. Big Data Processing Systems

Written by: Liang Zhao, Sherif Sakr, Anna Liu, Athman Bouguettaya

Published in: Cloud Data Management

Publisher: Springer International Publishing

Abstract

In the last two decades, the continuous increase in computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architecture and in large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling, and fault tolerance. However, the original implementation of the MapReduce framework had some limitations, which many follow-up research efforts have tackled since its introduction. This chapter provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both research and industrial communities. We also cover a set of systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that adopt some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions.
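
As an illustration of the programming model the abstract describes, the following is a minimal single-process sketch in Python. It is not taken from the chapter: the names `run_mapreduce`, `map_words`, and `reduce_counts` are hypothetical, and the in-memory grouping step only stands in for the data distribution, scheduling, and fault tolerance that a real framework would handle across a cluster.

```python
from collections import defaultdict


def map_words(_key, line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum all counts emitted for a single word."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory driver (an assumption for illustration only).

    A real framework would partition `records` across machines; here the
    'shuffle' is just a dictionary that groups mapper output by key.
    """
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Apply the reducer once per distinct key, in sorted key order.
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


lines = ["big data big clusters", "data on clusters"]
result = run_mapreduce(enumerate(lines), map_words, reduce_counts)
print(result)
```

The application code consists only of the two small functions `map_words` and `reduce_counts`; everything about how the records are grouped and dispatched lives in the driver, which is the separation of concerns the abstract attributes to MapReduce.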

Metadata
Title
Big Data Processing Systems
Written by
Liang Zhao
Sherif Sakr
Anna Liu
Athman Bouguettaya
Copyright year
2014
DOI
https://doi.org/10.1007/978-3-319-04765-2_9