Skip to main content

2017 | OriginalPaper | Buchkapitel

Optimizing Window Aggregate Functions via Random Sampling

verfasst von : Guangxuan Song, Wenwen Qu, Yilin Wang, Xiaoling Wang

Erschienen in: Web and Big Data

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Window functions have been a part of the SQL standard since 2003 and have been well studied during the past decade. As the demand increases in analytics tools, window functions have seen an increasing amount of potential applications. Although the current mainstream commercial databases support window functions, the existing implementation strategies are inefficient for the real-time processing of big data. Recently, some algorithms based on sampling (e.g., online aggregation) have been proposed to deal with large and complex data in relational databases, which offer us a flexible tradeoff between accuracy and efficiency. However, sampling techniques have not been considered for window functions in databases. In this paper, we first propose two algorithms to deal with window functions based on two sampling techniques, Naive Random Sampling and Incremental Random Sampling. The proposed algorithms are highly efficient and are general enough to aggregate other existing algorithms of window functions. In particular, we evaluated our algorithms in the latest version of PostgreSQL, which demonstrated superior performance over the TPC-H benchmark.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Zuzarte, C., Pirahesh, H., Ma, W., Cheng, Q., Liu, L., Wong, V.: Winmagic: subquery elimination using window aggregation. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD 2003), pp. 652–656. ACM (2003) Zuzarte, C., Pirahesh, H., Ma, W., Cheng, Q., Liu, L., Wong, V.: Winmagic: subquery elimination using window aggregation. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD 2003), pp. 652–656. ACM (2003)
2.
Zurück zum Zitat Bellamkonda, S., Ahmed, R., Witkowski, A., Amor, A., Zait, M., Lin, C.-C.: Enhanced subquery optimizations in oracle. Proc. VLDB Endow. 2(2), 1366–1377 (2009)CrossRef Bellamkonda, S., Ahmed, R., Witkowski, A., Amor, A., Zait, M., Lin, C.-C.: Enhanced subquery optimizations in oracle. Proc. VLDB Endow. 2(2), 1366–1377 (2009)CrossRef
3.
Zurück zum Zitat Ben-Gan, I.: Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions. Microsoft Press (2012) Ben-Gan, I.: Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions. Microsoft Press (2012)
4.
Zurück zum Zitat Cao, Y., Bramandia, R., Chan, C.Y., Tan, K.-L.: Optimized query evaluation using cooperative sorts. In: Proceedings of the 26th International Conference on Data Engineering, ICDE, Long Beach, California, USA, pp. 601–612. IEEE (2010) Cao, Y., Bramandia, R., Chan, C.Y., Tan, K.-L.: Optimized query evaluation using cooperative sorts. In: Proceedings of the 26th International Conference on Data Engineering, ICDE, Long Beach, California, USA, pp. 601–612. IEEE (2010)
5.
Zurück zum Zitat Cao, Y., Chan, C.-Y., Li, J., Tan, K.-L.: Optimization of analytic window functions. Proc. VLDB Endow. 5(11), 1244–1255 (2012)CrossRef Cao, Y., Chan, C.-Y., Li, J., Tan, K.-L.: Optimization of analytic window functions. Proc. VLDB Endow. 5(11), 1244–1255 (2012)CrossRef
6.
Zurück zum Zitat Cao, Y., Bramandia, R., Chan, C.-Y., Tan, K.-L.: Sort-sharing-aware query processing. VLDB J. 21(3), 411–436 (2012)CrossRef Cao, Y., Bramandia, R., Chan, C.-Y., Tan, K.-L.: Sort-sharing-aware query processing. VLDB J. 21(3), 411–436 (2012)CrossRef
7.
Zurück zum Zitat Neumann, T., Moerkotte, G.: A combined framework for grouping and order optimization. In: Proceedings of the 30th International Conference on Very Large Data Bases (VLDB 2004), pp. 960–971. VLDB Endowment (2004) Neumann, T., Moerkotte, G.: A combined framework for grouping and order optimization. In: Proceedings of the 30th International Conference on Very Large Data Bases (VLDB 2004), pp. 960–971. VLDB Endowment (2004)
8.
Zurück zum Zitat Simmen, D., Shekita, E., Malkemus, T.: Fundamental techniques for order optimization. In: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data (SIGMOD 1996), pp. 57–67. ACM (1996) Simmen, D., Shekita, E., Malkemus, T.: Fundamental techniques for order optimization. In: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data (SIGMOD 1996), pp. 57–67. ACM (1996)
9.
Zurück zum Zitat Wang, X.Y., Chernicak, M.: Avoiding sorting and grouping in processing queries. In: Proceedings of the 29th International Conference on Very Large Data Bases (VLDB 2003), pp. 826–837. VLDB Endowment (2003) Wang, X.Y., Chernicak, M.: Avoiding sorting and grouping in processing queries. In: Proceedings of the 29th International Conference on Very Large Data Bases (VLDB 2003), pp. 826–837. VLDB Endowment (2003)
10.
Zurück zum Zitat Leis, V., Kan, K., Kemper, A., et al.: Efficient processing of window functions in analytical SQL queries. Proc. VLDB Endow. 8(10), 1058–1069 (2015)CrossRef Leis, V., Kan, K., Kemper, A., et al.: Efficient processing of window functions in analytical SQL queries. Proc. VLDB Endow. 8(10), 1058–1069 (2015)CrossRef
11.
Zurück zum Zitat Wesley, R., Xu, F.: Incremental computation of common windowed holistic aggregates. Proc. VLDB Endow. 9(12), 1221–1232 (2016)CrossRef Wesley, R., Xu, F.: Incremental computation of common windowed holistic aggregates. Proc. VLDB Endow. 9(12), 1221–1232 (2016)CrossRef
12.
Zurück zum Zitat Hellerstein, J.M., Haas, P.J., Wang, H.J.: Online aggregation. ACM SIGMOD Rec. 26(2), 171–182 (1997)CrossRef Hellerstein, J.M., Haas, P.J., Wang, H.J.: Online aggregation. ACM SIGMOD Rec. 26(2), 171–182 (1997)CrossRef
13.
Zurück zum Zitat Xu, F., Jermaine, C.M., Dobra, A.: Confidence bounds for sampling-based group by estimates. ACM Trans. Database Syst. 33(3), 16 (2015) Xu, F., Jermaine, C.M., Dobra, A.: Confidence bounds for sampling-based group by estimates. ACM Trans. Database Syst. 33(3), 16 (2015)
14.
Zurück zum Zitat Li, F., Wu, B., Yi, K., Join, W., et al.: Online aggregation via random walks. In: Proceedings of 35th ACM SIGMOD International Conference on Management of Data (SIGMOD 2016), pp. 615–629. ACM (2016) Li, F., Wu, B., Yi, K., Join, W., et al.: Online aggregation via random walks. In: Proceedings of 35th ACM SIGMOD International Conference on Management of Data (SIGMOD 2016), pp. 615–629. ACM (2016)
15.
Zurück zum Zitat Haas, P.J.: Large-sample and deterministic confidence intervals for online aggregation. In: Proceedings of International Conference on Scientific and Statistical Database Management, pp. 51–63. IEEE (1997) Haas, P.J.: Large-sample and deterministic confidence intervals for online aggregation. In: Proceedings of International Conference on Scientific and Statistical Database Management, pp. 51–63. IEEE (1997)
16.
Zurück zum Zitat Wu, S., Ooi, B.C., Tan, K.: Continuous sampling for online aggregation over multiple queries. In: SIGMOD, pp. 651–662. ACM (2010) Wu, S., Ooi, B.C., Tan, K.: Continuous sampling for online aggregation over multiple queries. In: SIGMOD, pp. 651–662. ACM (2010)
17.
Zurück zum Zitat Olken, F.: Random sampling from databases. Ph.D. thesis, University of California at Berkeley (1993) Olken, F.: Random sampling from databases. Ph.D. thesis, University of California at Berkeley (1993)
18.
Zurück zum Zitat Wang, L., Christensen, R., Li, F., et al.: Spatial online sampling and aggregation. Proc. VLDB Endow. 9(3), 84–95 (2016)CrossRef Wang, L., Christensen, R., Li, F., et al.: Spatial online sampling and aggregation. Proc. VLDB Endow. 9(3), 84–95 (2016)CrossRef
19.
Zurück zum Zitat Haas, P.J., Hellerstein, J.M.: Ripple joins fro online aggregation. In: Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 287–298. ACM (1999) Haas, P.J., Hellerstein, J.M.: Ripple joins fro online aggregation. In: Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 287–298. ACM (1999)
20.
Zurück zum Zitat Qin, C., Rusu, F.: PF-OLA: a high-performance framework for parallel online aggregation. Distrib. Parallel Databases 32(3), 337–375 (2014)CrossRef Qin, C., Rusu, F.: PF-OLA: a high-performance framework for parallel online aggregation. Distrib. Parallel Databases 32(3), 337–375 (2014)CrossRef
22.
Zurück zum Zitat Bellamkonda, S., Bozkaya, T., Ghosh, B., Gupta, A., Haydu, J., Subramanian, S., Witkowski, A.: Analytic functions in oracle 8i. Technical report (2000) Bellamkonda, S., Bozkaya, T., Ghosh, B., Gupta, A., Haydu, J., Subramanian, S., Witkowski, A.: Analytic functions in oracle 8i. Technical report (2000)
23.
Zurück zum Zitat Jin, R., Glimcher, L., Jermaine, C., et al.: New sampling-based estimators for OLAP queries. In: International Conference on Data Engineering, p. 18. IEEE (2016) Jin, R., Glimcher, L., Jermaine, C., et al.: New sampling-based estimators for OLAP queries. In: International Conference on Data Engineering, p. 18. IEEE (2016)
24.
Zurück zum Zitat Joshi, S., Jermaine, C.M.: Sampling-based estimators for subset-based queries. VLDB J. 18(1), 181–202 (2009)CrossRef Joshi, S., Jermaine, C.M.: Sampling-based estimators for subset-based queries. VLDB J. 18(1), 181–202 (2009)CrossRef
26.
Zurück zum Zitat Murgai, S.R.: Reference Use Statistics: Statistical Sampling Method Works (University of Tennessee at Chattanooga), p. 54. Southeastern Librarian (2006) Murgai, S.R.: Reference Use Statistics: Statistical Sampling Method Works (University of Tennessee at Chattanooga), p. 54. Southeastern Librarian (2006)
27.
Zurück zum Zitat Adcock, B., Hansen, A.C.: Generalized sampling and infinite-dimensional compressed sensing. Found. Comput. Math. 16(5), 1263–1323 (2016)MathSciNetCrossRefMATH Adcock, B., Hansen, A.C.: Generalized sampling and infinite-dimensional compressed sensing. Found. Comput. Math. 16(5), 1263–1323 (2016)MathSciNetCrossRefMATH
28.
Zurück zum Zitat Bengio, S., Vinyals, O., Jaitly, N., et al.: Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks. Computer Science (2015) Bengio, S., Vinyals, O., Jaitly, N., et al.: Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks. Computer Science (2015)
Metadaten
Titel
Optimizing Window Aggregate Functions via Random Sampling
verfasst von
Guangxuan Song
Wenwen Qu
Yilin Wang
Xiaoling Wang
Copyright-Jahr
2017
DOI
https://doi.org/10.1007/978-3-319-63564-4_19