MapReduce is an effective tool for parallel data processing. A major problem in practical MapReduce applications is data skew: an imbalance in the amount of data assigned to each task. Skew causes some tasks to run much longer than others and can greatly degrade performance. The approach considered here is a lightweight strategy for mitigating data skew on the reducer side of MapReduce applications. In contrast to previous work, it needs neither a scan of the input data ahead of the run nor a barrier that prevents the map and reduce phases from overlapping. The system uses an innovative sampling method that produces a highly accurate approximation of the intermediate data distribution by sampling only a small portion of the intermediate data during normal map processing. It allows reduce tasks to start copying as soon as the selected sample map tasks (only a small fraction of the map tasks, those issued first) have completed. It supports splitting large clusters when application semantics permit, and it preserves the total order of the output data. The system is implemented in Hadoop, and our experiments show that it incurs negligible overhead and can speed up some popular applications by up to a factor of 4.
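As a rough illustration of the sampling idea in the abstract, the following Python sketch estimates the key distribution from the output of a few sample map tasks and then picks range-partition boundaries that give each reducer a similar estimated load. It is not the Hadoop implementation described in the chapter: the function names, the toy data, and the assumption that keys are sortable and that the sampled map outputs are representative are all hypothetical choices made for this example.

```python
import bisect
from collections import Counter
from itertools import accumulate


def estimate_key_counts(sample_map_outputs):
    """Merge per-key record counts from the outputs of the sampled map tasks."""
    counts = Counter()
    for output in sample_map_outputs:
        counts.update(key for key, _value in output)
    return counts


def choose_boundaries(key_counts, num_reducers):
    """Pick sorted split keys so each reducer gets a similar estimated load.

    Reducer i handles keys k with boundaries[i-1] < k <= boundaries[i];
    the last reducer takes every key above the final boundary.
    """
    if not key_counts:
        return []
    keys = sorted(key_counts)
    running = list(accumulate(key_counts[k] for k in keys))
    total = running[-1]
    boundaries = []
    for i in range(1, num_reducers):
        target = total * i / num_reducers
        # First key whose cumulative count reaches this reducer's share.
        idx = next(j for j, c in enumerate(running) if c >= target)
        boundaries.append(keys[idx])
    return boundaries


def assign_reducer(key, boundaries):
    """Return the index of the reducer responsible for `key`."""
    return bisect.bisect_left(boundaries, key)


# Toy usage: two sampled map outputs (hypothetical data), two reducers.
sampled = [
    [("a", 1), ("a", 1), ("b", 1), ("c", 1)],
    [("a", 1), ("c", 1), ("d", 1), ("d", 1)],
]
counts = estimate_key_counts(sampled)
boundaries = choose_boundaries(counts, num_reducers=2)
print(boundaries, [assign_reducer(k, boundaries) for k in sorted(counts)])
```

Because the boundaries are sorted key ranges, concatenating the reducer outputs in reducer order keeps the keys in total order. In this sketch a single very frequent key can still overload one reducer, which is presumably the situation that the splitting of large clusters mentioned in the abstract is meant to handle.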
- A Comparative Investigation of Sample Versus Normal Map for Effective BigData Processing
- Springer India