2016 | OriginalPaper | Book chapter

A Comparative Investigation of Sample Versus Normal Map for Effective BigData Processing

Written by: Shyavappa Yalawar, V. Suma, Jawahar Rao

Published in: Information Systems Design and Intelligent Applications

Publisher: Springer India

Abstract

MapReduce is an effective tool for processing data in parallel. A major problem in practice is data skew: an imbalance in the amount of data assigned to each task. Skew causes some tasks to run much longer than others and can greatly degrade performance. This chapter presents a lightweight strategy for solving the data skew problem on the reducer side of MapReduce applications. In contrast to previous work, it does not need to scan the input data in a separate pass beforehand, nor does it prevent the overlap between the map and reduce phases. The system uses a sampling approach that produces an accurate approximation of the distribution of the intermediate data by scanning only a small portion of that data during normal map processing. Reduce tasks can start copying as soon as the selected sample map tasks have completed (only a small fraction of the map tasks, namely those issued first). The system supports splitting large clusters when the application semantics permit, and it preserves the total order of the output data. The system is implemented in Hadoop, and the experiments show that its overhead is negligible and that it can speed up the execution of some popular applications by up to a factor of 4.
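
The sampling idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Hadoop implementation; it only assumes that key frequencies are estimated from a small subset of map tasks and that contiguous key ranges are then cut so each reducer receives a similar estimated load. The names `sample_key_counts`, `assign_ranges`, and `reducer_for` are hypothetical, and the in-memory lists stand in for intermediate map output.

```python
import bisect
import random
from collections import Counter


def sample_key_counts(map_outputs, sample_fraction, seed=0):
    """Estimate per-key record counts from a random subset of map tasks.

    map_outputs: non-empty list with one entry per map task, each a list
    of keys (hypothetical stand-in for intermediate map output).
    """
    rng = random.Random(seed)
    n = max(1, int(len(map_outputs) * sample_fraction))
    sampled_tasks = rng.sample(map_outputs, n)
    counts = Counter()
    for task_keys in sampled_tasks:
        counts.update(task_keys)
    # Scale the sampled counts up to the full number of map tasks.
    scale = len(map_outputs) / n
    return {key: count * scale for key, count in counts.items()}


def assign_ranges(est_counts, num_reducers):
    """Cut sorted keys into contiguous ranges of similar estimated load.

    Returns the upper boundary key of every range except the last one.
    """
    target = sum(est_counts.values()) / num_reducers
    boundaries = []
    load = 0.0
    for key in sorted(est_counts):
        load += est_counts[key]
        # Close the current range once it has reached the per-reducer target.
        if load >= target and len(boundaries) < num_reducers - 1:
            boundaries.append(key)
            load = 0.0
    return boundaries


def reducer_for(key, boundaries):
    """Map a key to the index of the range (reducer) that contains it."""
    return bisect.bisect_left(boundaries, key)


if __name__ == "__main__":
    # Key 1 dominates the data, so it is a skewed key cluster.
    tasks = [[1, 1, 1, 1, 2], [1, 1, 1, 1, 3], [1, 1, 4, 5, 6], [1, 1, 7, 2, 3]]
    est = sample_key_counts(tasks, sample_fraction=0.5)
    bounds = assign_ranges(est, num_reducers=2)
    print(bounds, [reducer_for(k, bounds) for k in sorted(est)])
```

Because every reducer receives one contiguous key range, concatenating the reducer outputs in index order keeps the keys globally sorted. The sketch never splits a single key cluster across reducers; that step depends on application semantics and is left out here.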

Metadata
Title
A Comparative Investigation of Sample Versus Normal Map for Effective BigData Processing
Written by
Shyavappa Yalawar
V. Suma
Jawahar Rao
Copyright year
2016
Publisher
Springer India
DOI
https://doi.org/10.1007/978-81-322-2752-6_74