2018 | OriginalPaper | Book chapter

Efficient Aggregation Query Processing for Large-Scale Multidimensional Data by Combining RDB and KVS

Authors: Yuya Watari, Atsushi Keyaki, Jun Miyazaki, Masahide Nakamura

Published in: Database and Expert Systems Applications

Publisher: Springer International Publishing

Abstract

This paper presents a highly efficient aggregation query processing method for large-scale multidimensional data. Recent developments in network technologies have led to the generation of large amounts of multidimensional data, such as sensor data, and aggregation queries play an important role in analyzing such data. Relational databases (RDBs) support efficient aggregation queries through indexes that speed up query processing, but increasing data size may lead to bottlenecks. A distributed key-value store (D-KVS), on the other hand, is key to obtaining scale-out performance for data insertion throughput. However, because a D-KVS offers insufficient support for indexes, querying multidimensional data sometimes requires a full data scan. The proposed method combines an RDB and a D-KVS so that their advantages complement each other. In addition, a novel technique is presented wherein data are divided into several subsets called grids, and the aggregated values for each grid are precomputed. This technique improves query processing performance by reducing the amount of scanned data. We evaluated the efficiency of the proposed method by comparing its performance with current state-of-the-art methods and showed that the proposed method performs better than the current ones in terms of both query and insertion performance.
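The grid idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the class name `GridAggregator`, the fixed two-dimensional square cells, the `cell_size` parameter, and the use of in-memory dictionaries as stand-ins for the D-KVS (raw points) and the RDB (per-grid aggregates) are all assumptions made for illustration. The point it shows is that a grid lying entirely inside the query rectangle can be answered from its precomputed aggregate, so raw data are scanned only for grids that the rectangle cuts through.

```python
from collections import defaultdict
from math import floor


class GridAggregator:
    """Toy 2-D grid with precomputed per-grid SUM and COUNT (illustrative only)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        # Hypothetical stand-in for the D-KVS: grid id -> raw points.
        self.cells = defaultdict(list)
        # Hypothetical stand-in for the RDB: grid id -> [sum, count].
        self.summary = defaultdict(lambda: [0.0, 0])

    def _cell(self, x, y):
        return (floor(x / self.cell_size), floor(y / self.cell_size))

    def insert(self, x, y, value):
        cid = self._cell(x, y)
        self.cells[cid].append((x, y, value))
        agg = self.summary[cid]
        agg[0] += value  # keep the precomputed SUM up to date
        agg[1] += 1      # keep the precomputed COUNT up to date

    def range_sum(self, x_lo, y_lo, x_hi, y_hi):
        """Return (sum, count, scanned) for the half-open box [x_lo, x_hi) x [y_lo, y_hi).

        `scanned` is the number of raw points that had to be inspected.
        """
        cs = self.cell_size
        total, count, scanned = 0.0, 0, 0
        cx_lo, cy_lo = self._cell(x_lo, y_lo)
        cx_hi, cy_hi = self._cell(x_hi, y_hi)
        for cx in range(cx_lo, cx_hi + 1):
            for cy in range(cy_lo, cy_hi + 1):
                cid = (cx, cy)
                if cid not in self.summary:
                    continue  # empty grid, nothing to add
                bx, by = cx * cs, cy * cs
                fully_inside = (x_lo <= bx and bx + cs <= x_hi
                                and y_lo <= by and by + cs <= y_hi)
                if fully_inside:
                    # Use the precomputed aggregate; no raw data are read.
                    s, c = self.summary[cid]
                    total += s
                    count += c
                else:
                    # Grid only partially overlaps the query: scan its points.
                    for x, y, v in self.cells[cid]:
                        scanned += 1
                        if x_lo <= x < x_hi and y_lo <= y < y_hi:
                            total += v
                            count += 1
        return total, count, scanned
```

With a `cell_size` of 10, a query box aligned to grid boundaries touches no raw points in the grids it covers, while a box that cuts through grids falls back to scanning only those grids. Choosing the cell size trades the number of stored aggregates against the amount of raw data scanned at the query border.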

Footnotes
1
Our implementation uses a custom filter in HBase for a prefix scan in Step 3 of Algorithm 2, which efficiently extracts the data contained within the given query range.
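As a rough picture of what a prefix scan followed by a range filter does, consider the sketch below. It does not use the HBase API and does not reproduce the authors' custom filter or Algorithm 2. The sorted in-memory list standing in for an ordered key-value store, the key layout `"<grid id>:<point id>"`, and the function names `prefix_scan` and `scan_grid_in_range` are all hypothetical.

```python
from bisect import bisect_left


def prefix_scan(sorted_rows, prefix):
    """Yield (key, value) rows whose key starts with `prefix`.

    `sorted_rows` is a list of (key, value) tuples sorted by key, a stand-in
    for the lexicographically ordered rows of a key-value store.
    """
    # (prefix,) sorts before every (key, value) tuple with key >= prefix,
    # so this finds the first candidate row without touching earlier rows.
    start = bisect_left(sorted_rows, (prefix,))
    for key, value in sorted_rows[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later row can match
        yield key, value


def scan_grid_in_range(sorted_rows, grid_prefix, x_lo, y_lo, x_hi, y_hi):
    """Return the points of one grid that fall inside the half-open query box."""
    hits = []
    for _key, (x, y, v) in prefix_scan(sorted_rows, grid_prefix):
        # The filter step: keep only points inside the query range.
        if x_lo <= x < x_hi and y_lo <= y < y_hi:
            hits.append((x, y, v))
    return hits
```

Because the rows are ordered by key, the scan starts at the first key with the requested prefix and stops at the first key without it, so rows of other grids are never read.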
 
Metadata
Title
Efficient Aggregation Query Processing for Large-Scale Multidimensional Data by Combining RDB and KVS
Authors
Yuya Watari
Atsushi Keyaki
Jun Miyazaki
Masahide Nakamura
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-98809-2_9