2016 | OriginalPaper | Book chapter

A Comparative Investigation of Sample Versus Normal Map for Effective BigData Processing

Authors: Shyavappa Yalawar, V. Suma, Jawahar Rao

Published in: Information Systems Design and Intelligent Applications

Publisher: Springer India

Abstract

MapReduce is an effective tool for the parallel processing of data. A major problem in practical MapReduce applications is data skew: an imbalance in the amount of data assigned to each task. Skew causes some tasks to run much longer than others and can greatly degrade performance. This chapter presents a lightweight strategy for mitigating data skew on the reducer side of MapReduce applications. In contrast to previous work, the strategy does not need to scan the input data in advance, nor does it prevent the overlap between the map and reduce phases. The system uses an innovative sampling approach that produces an accurate approximation of the distribution of the intermediate data by scanning only a small portion of that data during normal map processing. It allows the reduce tasks to start copying as soon as the selected sample map tasks have completed; these are only a small fraction of the map tasks, namely those that finish first. It supports splitting large clusters where the application semantics permit and preserving the total order of the output data. The system is implemented in Hadoop, and our experiments show that it adds negligible overhead and can speed up the execution of some popular applications by up to a factor of 4.
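The sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' Hadoop implementation: the function names (`estimate_distribution`, `range_boundaries`, `assign_reducer`) and the toy data are hypothetical, and the sketch assumes that each sampled map task reports a record count per key and that key clusters are never split across reducers.

```python
import bisect
from collections import Counter
from typing import Iterable, List, Tuple


def estimate_distribution(
    sample_map_outputs: Iterable[Iterable[Tuple[str, int]]]
) -> Counter:
    """Sum the per-key record counts reported by the sampled map tasks."""
    counts: Counter = Counter()
    for output in sample_map_outputs:
        for key, num_records in output:
            counts[key] += num_records
    return counts


def range_boundaries(counts: Counter, num_reducers: int) -> List[str]:
    """Pick sorted boundary keys so each key range holds roughly an equal
    share of the sampled records. Each boundary is the inclusive upper end
    of a range. A single very large key cluster is kept whole here, so
    fewer than num_reducers - 1 boundaries may be returned."""
    total = sum(counts.values())
    target = total / num_reducers
    boundaries: List[str] = []
    running = 0
    for key in sorted(counts):
        running += counts[key]
        cut_due = running >= target * (len(boundaries) + 1)
        if len(boundaries) < num_reducers - 1 and cut_due:
            boundaries.append(key)
    return boundaries


def assign_reducer(key: str, boundaries: List[str]) -> int:
    """Map a key to the index of the range that contains it."""
    return bisect.bisect_left(boundaries, key)


# Hypothetical output of two sampled map tasks: (key, record count) pairs.
samples = [
    [("a", 30), ("b", 5)],
    [("a", 20), ("b", 5), ("c", 10), ("d", 10), ("e", 20)],
]
counts = estimate_distribution(samples)
boundaries = range_boundaries(counts, num_reducers=2)

# Estimated load per reducer under the chosen boundaries.
load = Counter()
for key, n in counts.items():
    load[assign_reducer(key, boundaries)] += n
```

With this toy sample, the heavy key `a` gets a range of its own and the remaining keys share the second reducer, so both estimated loads are equal. Because the ranges follow the sorted key order, concatenating the reducer outputs keeps the keys in total order.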

Title: A Comparative Investigation of Sample Versus Normal Map for Effective BigData Processing
Authors: Shyavappa Yalawar, V. Suma, Jawahar Rao
Publisher: Springer India
Book: Information Systems Design and Intelligent Applications
Print ISBN: 978-81-322-2750-2

Electronic ISBN: 978-81-322-2752-6

Copyright year: 2016
DOI: https://doi.org/10.1007/978-81-322-2752-6_74
