Published in: Distributed and Parallel Databases 4/2019

31 July 2018

Efficient OLAP algorithms on GPU-accelerated Hadoop clusters

Authors: Hongzhi Wang, Zheng Wang, Ning Li, Xinxin Kong

Abstract

In the era of big data, online analytical processing (OLAP) is an important method for processing massive data. To build a system with both high storage capacity and high computing power, Hadoop and GPUs are both applied to OLAP. In general, three core operations determine the efficiency of OLAP analysis: aggregation of multi-dimensional data, pre-computation of the multi-dimensional data set (Cube), and joining of the dimension tables with the fact table. To boost efficiency, this paper presents an optimized algorithm for each core operation. Starting with aggregation on a single machine, the paper first designs a GPU-based aggregation algorithm. It then introduces a GPU-based Cube algorithm that accelerates pre-computation, using an inverted index to reduce the amount of computation. Finally, with a newly designed dimension-table join algorithm and a query algorithm, it presents a GPU-based OLAP analysis algorithm. Experiments on each algorithm show that it improves efficiency, optimizing GPU-based OLAP analysis on Hadoop.
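
To make the idea of cube pre-computation with an inverted index more concrete, the following is a minimal CPU-only sketch in Python. It is not the paper's GPU algorithm, which the abstract does not spell out; it only illustrates the general technique of mapping each (dimension, value) pair to the rows that contain it and intersecting those row sets to aggregate one cell, so that empty cells are skipped early. The function names, the row layout, and the choice of SUM as the aggregate are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_inverted_index(rows, dims):
    """Map each (dimension, value) pair to the set of row ids containing it."""
    index = defaultdict(set)
    for rid, row in enumerate(rows):
        for d in dims:
            index[(d, row[d])].add(rid)
    return index


def cuboid(rows, index, group_dims, measure):
    """SUM(measure) grouped by group_dims, computed via row-id set intersection."""
    # Distinct values seen for each grouping dimension.
    values = {d: sorted({v for (dim, v) in index if dim == d}) for d in group_dims}
    result = {}

    def recurse(i, key, rids):
        # rids is None until the first dimension has been applied.
        if rids is not None and not rids:
            return  # empty intersection: no rows in this cell, prune the branch
        if i == len(group_dims):
            ids = range(len(rows)) if rids is None else rids
            result[key] = sum(rows[r][measure] for r in ids)
            return
        d = group_dims[i]
        for v in values[d]:
            posting = index[(d, v)]
            recurse(i + 1, key + (v,), posting if rids is None else rids & posting)

    recurse(0, (), None)
    return result


def full_cube(rows, dims, measure):
    """Compute every cuboid (all subsets of dims) from one shared inverted index."""
    index = build_inverted_index(rows, dims)
    return {
        group: cuboid(rows, index, group, measure)
        for k in range(len(dims) + 1)
        for group in combinations(dims, k)
    }
```

In this sketch the index is built once and reused by every cuboid, which is where the reduction in work comes from. A GPU version would presumably store the row-id lists as sorted arrays or bitmaps and intersect them in parallel, but that detail is a guess and not taken from the abstract.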

Metadata
Title
Efficient OLAP algorithms on GPU-accelerated Hadoop clusters
Authors
Hongzhi Wang
Zheng Wang
Ning Li
Xinxin Kong
Publication date
31 July 2018
Publisher
Springer US
Published in
Distributed and Parallel Databases / Issue 4/2019
Print ISSN: 0926-8782
Electronic ISSN: 1573-7578
DOI
https://doi.org/10.1007/s10619-018-7239-z
