Skip to main content

2015 | OriginalPaper | Buchkapitel

Workload-Aware Self-Tuning Histograms of String Data

verfasst von : Nickolas Zoulis, Effrosyni Mavroudi, Anna Lykoura, Angelos Charalambidis, Stasinos Konstantopoulos

Erschienen in: Database and Expert Systems Applications

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In this paper we extend STHoles, a very successful algorithm that uses query results to build and maintain multi-dimensional histograms of numerical data. Our contribution is the formal definition of extensions of all relevant concepts; such that they are independent of the domain of the data, but subsume STHoles concepts as their numerical specialization. At the same time, we also derive specializations for the string domain and implement these into a prototype that we use to empirically validate our approach. Our current implementation uses string prefixes as the machinery for describing string ranges. Although weaker than regular expressions, prefixes can be very efficiently applied and can capture interesting ranges in hierarchically structured string domains, such as those of filesystem pathnames and URIs. In fact, we base the empirical validation of the approach on existing, publicly available Semantic Web data where we demonstrate convergence to accurate and efficient histograms.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
For more details on Semagrow, please see http://​www.​semagrow.​eu.
 
2
Please see http://​agris.​fao.​org for more details on AGRIS. The AGRIS site mentions 7 million distinct publications, but this includes recent additions that are not in end-2013 data dump used for these experiments.
 
3
We use the canonical string representation of URIs as defined in Sect. 2, IETF RFC 7320 (http://​tools.​ietf.​org/​html/​rfc7320).
 
Literatur
1.
Zurück zum Zitat Aboulnaga, A., Chaudhuri, S.: Self-tuning histograms: Building histograms without looking at data. In: Proceedings of the 1999 ACM International Conference on Management of Data (SIGMOD 1999), pp. 181–192. ACM (1999) Aboulnaga, A., Chaudhuri, S.: Self-tuning histograms: Building histograms without looking at data. In: Proceedings of the 1999 ACM International Conference on Management of Data (SIGMOD 1999), pp. 181–192. ACM (1999)
2.
Zurück zum Zitat Bruno, N., Chaudhuri, S., Gravano, L.: STHoles: a multidimensional workload-aware histogram. In: Proceedings of 2001 ACM International Conference on Management of Data (SIGMOD 2001), pp. 211–222. ACM (2001) Bruno, N., Chaudhuri, S., Gravano, L.: STHoles: a multidimensional workload-aware histogram. In: Proceedings of 2001 ACM International Conference on Management of Data (SIGMOD 2001), pp. 211–222. ACM (2001)
3.
Zurück zum Zitat Srivastava, U., Haas, P.J., Markl, V., Kutsch, M., Tran, T.M.: ISOMER: Consistent histogram construction using query feedback. In: Proceedings of the 22nd International Conference on Data Engineering (ICDE 2006). IEEE Computer Society (2006) Srivastava, U., Haas, P.J., Markl, V., Kutsch, M., Tran, T.M.: ISOMER: Consistent histogram construction using query feedback. In: Proceedings of the 22nd International Conference on Data Engineering (ICDE 2006). IEEE Computer Society (2006)
4.
Zurück zum Zitat Roh, Y.J., Kim, J.H., Chung, Y.D., Son, J.H., Kim, M.H.: Hierarchically organized skew-tolerant histograms for geographic data objects. In: Proceedings of 2010 ACM International Conference on Management of Data (SIGMOD 2010), pp. 627–638. ACM (2010) Roh, Y.J., Kim, J.H., Chung, Y.D., Son, J.H., Kim, M.H.: Hierarchically organized skew-tolerant histograms for geographic data objects. In: Proceedings of 2010 ACM International Conference on Management of Data (SIGMOD 2010), pp. 627–638. ACM (2010)
5.
Zurück zum Zitat Chaudhuri, S., Ganti, V., Gravano, L.: Selectivity estimation for string predicates: Overcoming the underestimation problem. In: Proceedings of 20th International Conference on Data Engineering (ICDE 2004). IEEE Computer Society (2004) Chaudhuri, S., Ganti, V., Gravano, L.: Selectivity estimation for string predicates: Overcoming the underestimation problem. In: Proceedings of 20th International Conference on Data Engineering (ICDE 2004). IEEE Computer Society (2004)
6.
Zurück zum Zitat Lim, L., Wang, M., Vitter, J.S.: CXHist: An on-line classification-based histogram for XML string selectivity estimation. In: Proceedings of the 31st International Conference on Very Large Data Bases (VLDB 2005), Trondheim, Norway, 30 August – 2 September 2005, pp. 1187–1198 (2005) Lim, L., Wang, M., Vitter, J.S.: CXHist: An on-line classification-based histogram for XML string selectivity estimation. In: Proceedings of the 31st International Conference on Very Large Data Bases (VLDB 2005), Trondheim, Norway, 30 August – 2 September 2005, pp. 1187–1198 (2005)
7.
Zurück zum Zitat Ding, L., Finin, T., Joshi, A., Pan, R., Cost, R.S., Peng, Y., Reddivari, P., Doshi, V., Sachs, J.: Swoogle: A search and metadata engine for the Semantic Web. In: Proceedings of the 13th ACM International Conference on Information and Knowledge Management (CIKM 2004), pp. 652–659. ACM (2004) Ding, L., Finin, T., Joshi, A., Pan, R., Cost, R.S., Peng, Y., Reddivari, P., Doshi, V., Sachs, J.: Swoogle: A search and metadata engine for the Semantic Web. In: Proceedings of the 13th ACM International Conference on Information and Knowledge Management (CIKM 2004), pp. 652–659. ACM (2004)
8.
Zurück zum Zitat Auer, S., Demter, J., Martin, M., Lehmann, J.: LODStats – an extensible framework for high-performance dataset analytics. In: ten Teije, A., Völker, J., Handschuh, S., Stuckenschmidt, H., d’Acquin, M., Nikolov, A., Aussenac-Gilles, N., Hernandez, N. (eds.) EKAW 2012. LNCS, vol. 7603, pp. 353–362. Springer, Heidelberg (2012) Auer, S., Demter, J., Martin, M., Lehmann, J.: LODStats – an extensible framework for high-performance dataset analytics. In: ten Teije, A., Völker, J., Handschuh, S., Stuckenschmidt, H., d’Acquin, M., Nikolov, A., Aussenac-Gilles, N., Hernandez, N. (eds.) EKAW 2012. LNCS, vol. 7603, pp. 353–362. Springer, Heidelberg (2012)
9.
Zurück zum Zitat Langegger, A., Wöss, W.: RDFStats - an extensible RDF statistics generator and library. In: Proceedings of DEXA 2009, pp. 79–83 (2009) Langegger, A., Wöss, W.: RDFStats - an extensible RDF statistics generator and library. In: Proceedings of DEXA 2009, pp. 79–83 (2009)
10.
Zurück zum Zitat Harth, A., Hose, K., Karnstedt, M., Polleres, A., Sattler, K.U., Umbrich, J.: Data summaries for on-demand queries over linked data. In: Proceedings of 19th International World Wide Web Conference (WWW 2010), Raleigh, NC, USA, 26–30 April 2010 (2010) Harth, A., Hose, K., Karnstedt, M., Polleres, A., Sattler, K.U., Umbrich, J.: Data summaries for on-demand queries over linked data. In: Proceedings of 19th International World Wide Web Conference (WWW 2010), Raleigh, NC, USA, 26–30 April 2010 (2010)
11.
Zurück zum Zitat Charalambidis, A., Konstantopoulos, S., Karkaletsis, V.: Dataset descriptions for optimizing federated querying. In: Poster Track, Companion Volume to the Procedings of the 24th Intl World Wide Web Conference (WWW 2015), Florence, Italy, 18–22 May 2015. ACM (2015) Charalambidis, A., Konstantopoulos, S., Karkaletsis, V.: Dataset descriptions for optimizing federated querying. In: Poster Track, Companion Volume to the Procedings of the 24th Intl World Wide Web Conference (WWW 2015), Florence, Italy, 18–22 May 2015. ACM (2015)
Metadaten
Titel
Workload-Aware Self-Tuning Histograms of String Data
verfasst von
Nickolas Zoulis
Effrosyni Mavroudi
Anna Lykoura
Angelos Charalambidis
Stasinos Konstantopoulos
Copyright-Jahr
2015
DOI
https://doi.org/10.1007/978-3-319-22849-5_20