
2016 | OriginalPaper | Chapter

A Comparative Investigation of Sample Versus Normal Map for Effective BigData Processing

Authors: Shyavappa Yalawar, V. Suma, Jawahar Rao

Published in: Information Systems Design and Intelligent Applications

Publisher: Springer India


Abstract

MapReduce is an effective tool for the parallel processing of data. A major problem in practical MapReduce applications is data skew: an imbalance in the amount of data assigned to each task. Skew causes some tasks to run much longer than others and can greatly degrade performance. This work examines a lightweight strategy for mitigating data skew on the reducer side of MapReduce applications. In contrast to previous work, the strategy neither requires a scan of the input data before the job runs nor prevents the map and reduce phases from overlapping. The system uses an innovative sampling method that produces an accurate approximation of the distribution of the intermediate data by scanning only a small portion of that data during normal map processing. It allows the reduce tasks to start copying as soon as the selected sample map tasks (only a small fraction of the map tasks, the ones issued first) have completed. It supports splitting large clusters when the application's semantics permit, and it supports a total ordering of the output data. The system is implemented in Hadoop, and our experiments show that it adds negligible overhead and can speed up the execution of some popular applications by a factor of 4.
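The sampling idea described in the abstract can be illustrated with a small sketch. This is not the authors' Hadoop implementation: the function names, the default sampling fraction, and the in-memory representation of map outputs as lists of keys are all assumptions made for illustration. The sketch estimates the key distribution from a few sampled map tasks and then chooses range boundaries so that each reducer receives roughly the same estimated load.

```python
import random
from bisect import bisect_left
from collections import Counter


def estimate_key_distribution(map_outputs, sample_fraction=0.2, seed=0):
    """Estimate key frequencies from a small sample of map task outputs.

    map_outputs is a list with one entry per map task; each entry is the
    list of intermediate keys that task emitted (hypothetical layout).
    """
    rng = random.Random(seed)
    n_sample = max(1, int(len(map_outputs) * sample_fraction))
    counts = Counter()
    for output in rng.sample(map_outputs, n_sample):
        counts.update(output)
    return counts


def range_boundaries(counts, num_reducers):
    """Choose inclusive upper-bound keys so the estimated loads are similar."""
    target = sum(counts.values()) / num_reducers
    boundaries = []
    running = 0
    for key in sorted(counts):
        if len(boundaries) == num_reducers - 1:
            break  # the last reducer takes every remaining key
        running += counts[key]
        if running >= target * (len(boundaries) + 1):
            boundaries.append(key)
    return boundaries


def assign_reducer(key, boundaries):
    """Map a key to a reducer index using the range boundaries."""
    return bisect_left(boundaries, key)


def reducer_loads(map_outputs, boundaries, num_reducers):
    """Count how many intermediate records each reducer would receive."""
    loads = [0] * num_reducers
    for output in map_outputs:
        for key in output:
            loads[assign_reducer(key, boundaries)] += 1
    return loads
```

Because the boundaries depend only on the sampled tasks, partitioning can be decided before the remaining map tasks finish. Range boundaries alone cannot balance a workload in which a single key dominates; that is the situation the splitting of large clusters mentioned in the abstract is meant to address.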


Metadata
Title
A Comparative Investigation of Sample Versus Normal Map for Effective BigData Processing
Authors
Shyavappa Yalawar
V. Suma
Jawahar Rao
Copyright Year
2016
Publisher
Springer India
DOI
https://doi.org/10.1007/978-81-322-2752-6_74
