Abstract
The purpose of data warehouses is to enable business analysts to make better decisions. Over the years, the technology has matured and data warehouses have become extremely successful. As a consequence, more and more data has been added to them, and their schemas have become increasingly complex. These systems still work well for generating pre-canned reports. However, at their current complexity, they tend to be a poor match for non-tech-savvy business analysts who need answers to ad-hoc queries that were not anticipated.
This paper describes the design and implementation of, and experience with, the SODA system (Search over DAta Warehouse). SODA bridges the gap between the business needs of analysts and the technical complexity of current data warehouses. SODA enables a Google-like search experience for data warehouses by taking keyword queries of business users and automatically generating executable SQL. The key idea is to use a graph pattern matching algorithm that uses the metadata model of the data warehouse. Our results with real data from a global player in the financial services industry show that SODA produces queries with high precision and recall, and makes it much easier for business users to interactively explore highly complex data warehouses.
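To make the idea of turning keywords into SQL via metadata more concrete, here is a minimal sketch. It is not the SODA algorithm: in place of graph pattern matching it uses a plain dictionary lookup and a breadth-first search over a join graph. The tables, columns, join conditions, and the names `TERM_INDEX`, `JOINS`, `join_path`, and `keywords_to_sql` are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical metadata: business term -> (table, filter column, filter value).
TERM_INDEX = {
    "customers": ("customer", None, None),
    "accounts": ("account", None, None),
    "zurich": ("address", "city", "Zurich"),
}

# Hypothetical join graph: pairs of tables and the condition that joins them.
JOINS = {
    ("customer", "address"): "customer.address_id = address.id",
    ("account", "customer"): "account.customer_id = customer.id",
}


def neighbors(table):
    """Yield (other_table, join_condition) for every join touching `table`."""
    for (a, b), cond in JOINS.items():
        if a == table:
            yield b, cond
        elif b == table:
            yield a, cond


def join_path(start, goal):
    """Breadth-first search for the shortest join chain from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, steps = queue.popleft()
        if table == goal:
            return steps
        for nxt, cond in neighbors(table):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + [(nxt, cond)]))
    return None  # the two tables are not connected


def keywords_to_sql(query):
    """Map known keywords to tables and filters, then connect the tables."""
    tables, filters = [], []
    for word in query.lower().split():
        hit = TERM_INDEX.get(word)
        if hit is None:
            continue  # unknown keywords are ignored in this sketch
        table, column, value = hit
        if table not in tables:
            tables.append(table)
        if column is not None:
            # Values come from the metadata index, not from raw user input.
            filters.append(f"{table}.{column} = '{value}'")
    if not tables:
        return None
    from_tables, join_conds = list(tables), []
    for target in tables[1:]:
        steps = join_path(tables[0], target)
        if steps is None:
            return None
        for table, cond in steps:
            if table not in from_tables:
                from_tables.append(table)  # intermediate table on the path
            if cond not in join_conds:
                join_conds.append(cond)
    sql = "SELECT * FROM " + ", ".join(from_tables)
    where = join_conds + filters
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql


print(keywords_to_sql("customers Zurich"))
```

Under these assumptions, the query `"accounts Zurich"` also pulls in the `customer` table, because it lies on the only join path between `account` and `address`. A real system would additionally have to rank alternative interpretations and handle ambiguous terms, which this sketch leaves out.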