Cluster Computing

Published: 19 December 2017

An iterative sampling method for online aggregation

Authors: Zhiqiang Zhang, Jianghua Hu, Xiaoqin Xie, Haiwei Pan, Xiaoning Feng

Published in: Cluster Computing | Special Issue 1/2019

Abstract

Online aggregation (OLA) saves cost by returning acceptable approximate answers early. Computing approximate results is more cost-effective than computing precise ones, especially for large-scale datasets, and the user can terminate processing at any time once satisfied with the quality of the result. The performance of OLA depends on the sampling approach and the estimation model. However, realizing OLA efficiently in a large-scale distributed computing environment is a challenging problem. In this paper, we consider the problem of providing OLA in a distributed computing environment and propose a Hadoop-based iterative sampling method for online aggregation, in which the user's desired precision can be met with two sampling iterations. To avoid the effects of data bias, we propose a “layered sampling” method that ensures the approximate aggregation result is statistically meaningful. The experimental results show that the “layered sampling” method takes into account not only time efficiency but also Hadoop's usage of computing and storage resources.
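The abstract does not spell out the estimator, so the following is only a sketch of the general two-iteration idea under our own assumptions. In the first iteration, a small pilot sample and the normal (CLT) approximation size the second sample for a user-specified relative error, in the spirit of classical double sampling. In the second iteration, a stratified sample is drawn, which is how we interpret “layered sampling” here. The function names (`required_sample_size`, `stratified_mean`, `two_phase_avg`), proportional allocation, the 95% confidence level, and in-memory lists standing in for HDFS data blocks are all hypothetical choices, not the authors' Hadoop implementation.

```python
import math
import random
import statistics

Z_95 = 1.96  # normal quantile for ~95% confidence (CLT assumption, not from the paper)


def required_sample_size(pilot, rel_error, z=Z_95):
    """Iteration 1 (hypothetical): size n so that the CI half-width z * s / sqrt(n)
    is at most rel_error * |mean|, assuming simple random sampling.
    Needs at least two pilot rows."""
    mean = statistics.fmean(pilot)
    if mean == 0:
        raise ValueError("relative error is undefined for a zero mean")
    s = statistics.stdev(pilot)
    return max(2, math.ceil((z * s / (rel_error * abs(mean))) ** 2))


def stratified_mean(strata, n_total, rng):
    """Iteration 2 (hypothetical): proportional-allocation stratified estimate of AVG.

    Returns (estimate, standard_error). A stratum that would be sampled in full
    is read completely and contributes no sampling variance."""
    big_n = sum(len(s) for s in strata)
    estimate, variance = 0.0, 0.0
    for stratum in strata:
        if not stratum:
            continue  # empty layer: zero weight, nothing to estimate
        weight = len(stratum) / big_n
        n_h = max(2, round(n_total * weight))
        if n_h >= len(stratum):
            # Census of this stratum: exact stratum mean, zero variance.
            estimate += weight * statistics.fmean(stratum)
            continue
        sample = rng.sample(stratum, n_h)
        estimate += weight * statistics.fmean(sample)
        fpc = 1 - n_h / len(stratum)  # finite population correction
        variance += weight ** 2 * statistics.variance(sample) / n_h * fpc
    return estimate, math.sqrt(variance)


def two_phase_avg(strata, rel_error, pilot_size=100, seed=0):
    """Returns (estimate, CI half-width, rows sampled in iteration 2).
    Pilot rows are used only to size iteration 2 and are not counted in n."""
    rng = random.Random(seed)
    population = [x for stratum in strata for x in stratum]
    pilot = rng.sample(population, min(pilot_size, len(population)))
    n = min(required_sample_size(pilot, rel_error), len(population))
    estimate, se = stratified_mean(strata, n, rng)
    return estimate, Z_95 * se, n


# Usage on hypothetical skewed data: three layers with very different value ranges.
gen = random.Random(42)
strata = [
    [gen.gauss(10, 2) for _ in range(5000)],
    [gen.gauss(50, 10) for _ in range(3000)],
    [gen.gauss(200, 40) for _ in range(1000)],
]
est, half_width, n = two_phase_avg(strata, rel_error=0.05)
print(f"AVG ~ {est:.2f} +/- {half_width:.2f} from {n} of 9000 rows")
```

Because stratification removes the between-layer variance, the interval from the second iteration is typically narrower than the simple-random-sampling target used to size it. On skewed data like this, the first-iteration size therefore tends to be conservative. A real system would also need to map layers onto data blocks, which this sketch does not attempt.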

Title: An iterative sampling method for online aggregation
Authors: Zhiqiang Zhang, Jianghua Hu, Xiaoqin Xie, Haiwei Pan, Xiaoning Feng
Publication date: 19 December 2017
Publisher: Springer US
Published in: Cluster Computing, Special Issue 1/2019
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-017-1451-x
