Published in: Knowledge and Information Systems 2/2014

1 May 2014 | Regular Paper

Similarity measures for OLAP sessions

Authors: Julien Aligon, Matteo Golfarelli, Patrick Marcel, Stefano Rizzi, Elisa Turricchia


Abstract

OLAP queries are not normally formulated in isolation, but in the form of sequences called OLAP sessions. Recognizing that two OLAP sessions are similar would be useful for different applications, such as query recommendation and personalization; however, the problem of measuring OLAP session similarity has not been studied so far. In this paper, we aim at filling this gap. First, we propose a set of similarity criteria derived from a user study conducted with a set of OLAP practitioners and researchers. Then, we propose a function for estimating the similarity between OLAP queries based on three components: the query group-by set, its selection predicate, and the measures required in output. To assess the similarity of OLAP sessions, we investigate the feasibility of extending four popular methods for measuring similarity, namely the Levenshtein distance, the Dice coefficient, the tf–idf weight, and the Smith–Waterman algorithm. Finally, we experimentally compare these four extensions to show that the Smith–Waterman extension is the one that best captures the users’ criteria for session similarity.
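
As a rough illustration of the pipeline the abstract outlines, the sketch below combines three per-component similarities into a query similarity and uses it inside a Smith–Waterman-style local alignment of two sessions. The paper's own definitions are not reproduced on this page, so everything specific here is an assumption made for illustration: the `Query` class, the Jaccard measure on each component, the equal weights, the threshold `theta`, and the `gap` penalty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical OLAP query: three sets standing in for the three components."""
    group_by: frozenset
    selection: frozenset
    measures: frozenset


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def query_similarity(q1, q2, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of the three component similarities (weights assumed)."""
    w_g, w_s, w_m = weights
    return (w_g * jaccard(q1.group_by, q2.group_by)
            + w_s * jaccard(q1.selection, q2.selection)
            + w_m * jaccard(q1.measures, q2.measures))


def session_alignment_score(s1, s2, theta=0.5, gap=0.5):
    """Best local-alignment score of two sessions (Smith-Waterman recurrence).

    A pair of queries scores positively when their similarity exceeds the
    assumed threshold `theta`, negatively otherwise; `gap` penalizes skips.
    """
    prev = [0.0] * (len(s2) + 1)
    best = 0.0
    for q1 in s1:
        curr = [0.0]
        for j, q2 in enumerate(s2, start=1):
            match = query_similarity(q1, q2) - theta
            score = max(0.0,
                        prev[j - 1] + match,  # align q1 with q2
                        prev[j] - gap,        # skip q1
                        curr[j - 1] - gap)    # skip q2
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

Because the recurrence never lets a score drop below zero, a well-matching run of queries contributes to the result even when the rest of the two sessions differs, which is the property that makes local alignment attractive for comparing sessions.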


Footnotes
3
Note that, while substrings are consecutive parts of a string, subsequences need not be.
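
The distinction in this footnote can be made concrete with a small sketch. The helper names `is_substring` and `is_subsequence` are hypothetical and not taken from the paper; both accept strings or lists, so they could equally be applied to sequences of queries.

```python
def is_substring(part, whole):
    """True if `part` occurs in `whole` as a run of consecutive items."""
    n = len(part)
    return any(list(whole[i:i + n]) == list(part)
               for i in range(len(whole) - n + 1))


def is_subsequence(part, whole):
    """True if the items of `part` occur in `whole` in order, gaps allowed."""
    it = iter(whole)
    # `item in it` advances the iterator, so order is enforced.
    return all(item in it for item in part)
```

For example, "ace" is a subsequence of "abcde" but not a substring of it, while "bcd" is both.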
 
4
While this enables a simpler formalization for group-by sets (see Definition 4.2), it does not significantly affect the overall approach. Indeed, partially ordered hierarchies could easily be handled by extending Definition 5.4 to measure the distance between two group-by sets on the multidimensional lattice, as suggested by Golfarelli [16].
 
5
In a relational implementation, a multidimensional schema can be translated into either a star or a snowflake schema. Although the joins required to formulate the same query differ between the two cases, users are unaware of this difference, because OLAP tools hide the underlying SQL and logical schemata so that users can reason in terms of the multidimensional cube abstraction.
 
6
In the formula, the three rows of the \(\min\) argument deal with deletions, insertions, and substitutions, respectively.
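
The formula this footnote refers to is not reproduced on this page. For orientation, the standard Levenshtein recurrence for two sequences \(a\) and \(b\) has the same three-row shape; the symbols \(d\), \(a_i\), \(b_j\), and \(c\) are notation assumed here, not the paper's:

\[
d(i,j) = \min \left\{
\begin{array}{ll}
d(i-1,\,j) + 1 & \text{(deletion)} \\
d(i,\,j-1) + 1 & \text{(insertion)} \\
d(i-1,\,j-1) + c(a_i, b_j) & \text{(substitution)}
\end{array}
\right.
\]

with \(d(i,0) = i\), \(d(0,j) = j\), and \(c(a_i, b_j) = 0\) if \(a_i = b_j\), and \(1\) otherwise. An extension to OLAP sessions would presumably replace the 0/1 cost \(c\) with a cost derived from query similarity, but that is an assumption, not something stated here.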
 
References
1.
Abiteboul S, Hull R, Vianu V (1995) Foundations of databases. Addison-Wesley, Reading
2.
Agrawal R, Rantzau R, Terzi E (2006) Context-sensitive ranking. In: Proceedings ACM SIGMOD international conference on management of data. Chicago, IL, pp 383–394
3.
Akbarnejad J, Chatzopoulou G, Eirinaki M, Koshy S, Mittal S, On D, Polyzotis N, Varman JSV (2010) SQL QueRIE recommendations. PVLDB 3(2):1597–1600
4.
Aligon J, Golfarelli M, Marcel P, Rizzi S, Turricchia E (2011) Mining preferences from OLAP query logs for proactive personalization. In: Proceedings ADBIS. Vienna, Austria, pp 84–97
5.
Aouiche K, Jouve P-E, Darmont J (2006) Clustering-based materialized view selection in data warehouses. In: Proceedings ADBIS. Thessaloniki, Greece, pp 81–95
6.
Baikousi E, Rogkakos G, Vassiliadis P (2011) Similarity measures for multidimensional data. In: Proceedings ICDE. Hannover, Germany, pp 171–182
7.
Brown PF, Pietra VJD, de Souza PV, Lai JC, Mercer RL (1992) Class-based n-gram models of natural language. Comput Linguist 18(4):467–479
8.
Bustos B, Skopal T (2011) Non-metric similarity search problems in very large collections. In: Proceedings ICDE. Hannover, Germany, pp 1362–1365
9.
Chatzopoulou G, Eirinaki M, Koshy S, Mittal S, Polyzotis N, Varman JSV (2011) The QueRIE system for personalized query recommendations. IEEE Data Eng Bull 34(2):55–60
10.
Chatzopoulou G, Eirinaki M, Polyzotis N (2009) Query recommendations for interactive database exploration. In: Proceedings SSDBM. New Orleans, LA, pp 3–18
11.
Cohen WW, Ravikumar PD, Fienberg SE (2003) A comparison of string distance metrics for name-matching tasks. In: Proceedings IJCAI-03 workshop on information integration on the web. Acapulco, Mexico, pp 73–78
12.
Drosou M, Pitoura E (2011) ReDRIVE: result-driven database exploration through recommendations. In: Proceedings CIKM. Glasgow, UK, pp 1547–1552
13.
Garcia-Molina H, Ullman JD, Widom JD (2008) Database systems: the complete book, 2nd edn. Prentice Hall, Englewood Cliffs
14.
Ghosh A, Parikh J, Sengar VS, Haritsa JR (2002) Plan selection based on query clustering. In: Proceedings VLDB. Hong Kong, China, pp 179–190
15.
Giacometti A, Marcel P, Negre E (2009) Recommending multidimensional queries. In: Proceedings DaWaK. Linz, Austria, pp 453–466
16.
Golfarelli M (2003) Handling large workloads by profiling and clustering. In: Proceedings DaWaK. Prague, Czech Republic, pp 212–223
17.
Golfarelli M, Rizzi S, Biondi P (2011) myOLAP: an approach to express and evaluate OLAP preferences. IEEE TKDE 23(7):1050–1064
18.
Grossman D, Frieder O (2004) Information retrieval: algorithms and heuristics. Springer, Berlin
19.
Gupta A, Mumick I (1999) Materialized views: techniques, implementations, and applications. MIT Press, Cambridge
20.
Khoussainova N, Kwon Y, Balazinska M, Suciu D (2010) SnipSuggest: context-aware autocompletion for SQL. PVLDB 4(1):22–33
21.
Khoussainova N, Kwon Y, Liao W-T, Balazinska M, Gatterbauer W, Suciu D (2011) Session-based browsing for more effective query reuse. In: Proceedings SSDBM. Portland, OR, pp 583–585
22.
Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26(5):589–595
24.
Monge AE, Elkan C (1997) An efficient domain-independent algorithm for detecting approximately duplicate database records. In: Proceedings workshop on research issues on data mining and knowledge discovery
25.
Moreau E, Yvon F, Cappé O (2008) Robust similarity measures for named entities matching. In: Proceedings international conference on computational linguistics. Manchester, UK, pp 593–600
26.
Navarro G (2001) A guided tour to approximate string matching. ACM Comput Surv 33(1):31–88
27.
Ögüdücü SG (2010) Web page recommendation models: theory and algorithms. In: Synthesis lectures on data management. Morgan & Claypool Publishers
28.
Ristad ES, Yianilos PN (1998) Learning string-edit distance. IEEE Trans Pattern Anal Mach Intell 20(5):522–532
29.
Sapia C (2000) PROMISE: predicting query behavior to enable predictive caching strategies for OLAP systems. In: Proceedings DaWaK. London, UK, pp 224–233
30.
Smith T, Waterman M (1981) Identification of common molecular subsequences. J Mol Biol 147:195–197
31.
Stefanidis K, Drosou M, Pitoura E (2009) “You May Also Like” results in relational databases. In: Proceedings international workshop on personalized access, profile management and context awareness: databases. Lyon, France
33.
Yang X, Procopiuc CM, Srivastava D (2009) Recommending join queries via query log analysis. In: Proceedings ICDE. Shanghai, China, pp 964–975
34.
Yao Q, An A, Huang X (2005) Finding and analyzing database user sessions. In: Proceedings DASFAA. Beijing, China, pp 851–862
Metadata
Title
Similarity measures for OLAP sessions
Authors
Julien Aligon
Matteo Golfarelli
Patrick Marcel
Stefano Rizzi
Elisa Turricchia
Publication date
1 May 2014
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 2/2014
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-013-0614-1
