Abstract
The need to search for complex and recurring patterns in database sequences is shared by many applications. In this paper, we investigate the design and optimization of a query language capable of expressing and supporting efficiently the search for complex sequential patterns in database systems. Thus, we first introduce SQL-TS, an extension of SQL to express these patterns, and then we study how to optimize the queries for this language. We take the optimal text search algorithm of Knuth, Morris and Pratt, and generalize it to handle complex queries on sequences. Our algorithm exploits the interdependencies between the elements of a pattern to minimize repeated passes over the same data. Experimental results on typical sequence queries, such as double bottom queries, confirm that substantial speedups are achieved by our new optimization techniques.
- Agrawal, R. and Srikant, R. 1995. Mining sequential patterns. In Proceedings of the International Conference on Data Engineering.]] Google ScholarDigital Library
- Alur, N., Haas, P., Momiroska, D., Read, P., Summers, N., Totanes, V., and Zuzarte, C. 2002. Db2 UDBS high-function business intelligence in e-business. IBM redbooks, IBM, http://www.redbooks.ibm.com/redbooks/pdfs/sg246546.pdf.]]Google Scholar
- Arasu, A., Babu, S., and Widom, J. 2002. An abstract semantics and concrete language for continuous queries over streams and relations. Tech. rep., Stanford Univ., Stanford, Calif.]]Google Scholar
- Babcock, B., Babu, S., Datar, M., Motawani, R., and Widom, J. 2002. Models and issues in data stream systems. In Proceedings of the 21st ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM, New York, 1--16.]] Google ScholarDigital Library
- Berry, M. and Linoff, M. 1996. Data Mining Techniques for Marketing, Sales, and Customer Support. Wiley, New York.]] Google ScholarDigital Library
- Boag, S., Chamberlin, D., Fernandez, M. F., Florescu, D., Robie, J., and Simeon, J. (Eds.). 2003. Xquery 1.0: An XML query language--working draft 22 august 2003. Working Draft 22 August 2003, W3C, http://www.w3.org/tr/xquery/.]]Google Scholar
- Boyer, R. S. and Moore, J. S. 1977. A fast string searching algorithm. Commun. ACM 20, 10, 762--772.]] Google ScholarDigital Library
- Carney, D., Cetintemel, U., Cherniack, M., Convey, C., Lee, S., Seidman, G., Stonebraker, M., Tatbul, N., and Zdonik, S. 2002. Monitoring streams---A new class of data management applications. In VLDB (Hong Kong, China).]] Google ScholarDigital Library
- Chandrasekaran, S., Cooper, O., Deshpande, A., Franklin, M. J., Hellerstein, J. M., Hong, W., Krishnamurthy, S., Madden, S. R., Raman, V., Reiss, F., and Shah, M. A. 2003. Telegraphcq: Continuous dataflow processing for an uncertain world. In CIDR 2003: First Biennial Conference on Innovative Data Systems Research (Asilomar, Calif., Jan. 5--8).]] Google ScholarDigital Library
- Edwards, R. and Magee, J. 1997. Technical Analysis of Stock Trends. AMACOM.]]Google Scholar
- Faloutsos, C., Ranganathan, M., and Manolopoulos, Y. 1994. Fast subsequence matching in time-series databases. In Proceedings of the International Conference on Management of Data. 419--429.]] Google ScholarDigital Library
- Ferragina, P., Koudas, N., Muthukrishnan, S., and Srivastava, D. 2001. Two-dimensional substring indexing. In Proceedings of the 20th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Santa Barbara, Calif., May 21--23). ACM, New York.]] Google ScholarDigital Library
- Gatziu, S. and Dittrich, K. R. 1993. Events in an object-oriented database system. In Proceedings of the 1st International Conference on Rules in Database Systems.]]Google Scholar
- Gehani, N. H., Jagadish, H. V., and Shmueli, O. 1992. Composite event specification in active databases: Model and implementation. In Proceedings of the 18th International Conference on Very Large Data Bases.]] Google ScholarDigital Library
- Guo, S., Sun, W., and Weiss, M. 1996a. On satisfiability, equivalence, and implication problems involving conjunctive queries in database systems. IEEE Trans. Knowl. Data Eng. 8, 4, 604--616.]] Google ScholarDigital Library
- Guo, S., Sun, W., and Weiss, M. A. 1996b. Solving satisfiability and implication problems in database systems. ACM Trans. Datab. Syst. 21, 2, 270--293.]] Google ScholarDigital Library
- Hellerstein, J. M., Hass, P. J., and Wang, H. J. 1997. Online aggregation. In Proceedings of the International Conference on Management of Data. ACM, New York, 171--182.]] Google ScholarDigital Library
- Informix Software, Inc. 1998. Managing time-series data in financial applications. White Paper.]]Google Scholar
- Karp, R. and Rabin, M. O. 1987. Efficient randomized pattern matching algorithms. IBM J. Res. Devel. 31, 2 (Mar.), 249--260.]] Google ScholarDigital Library
- Klug, A. 1988. On conjunctive queries containing inequalities. J. ACM 35, 1, 146--160.]] Google ScholarDigital Library
- Knuth, D. E., Morris, J. H., and Pratt, V. R. 1997. Fast pattern matching in strings. SIAM J. Comput. 6, 2 (June), 323--350.]]Google Scholar
- Mesrobian, E., Muntz, R., Santos, J., Shek, E., Mechoso, C., Farrara, J., and Stolorz, P. 1994. Extracting spatio-temporal patterns from geoscience datasets. In Proceedings of the IEEE Workshop on Visualization and Machine Vision.]]Google Scholar
- Motakis, I. and Zaniolo, C. 1997. Temporal aggregation in active databases. In Int. Conf. on the Managment of Data.]] Google ScholarDigital Library
- Perng, C.-S. and Parker, D. S. 1999. SQL/LPP: A time series extension of SQL based on limited patience patterns. In Database and Expert Systems Applications, 10th International Conference (DEXA '99). Lecture Notes in Computer Science, vol. 1677. Springer-Verlag, New York.]] Google ScholarDigital Library
- Rafiei, D. and Mendelzon, A. 2000. Querying time series data based on similarity. TKDE 12, 5, 675--693.]] Google ScholarDigital Library
- Ramakrishnan, R., Donjerkovic, D., Ranganathan, A., Beyer, K., and Krishnaprasad, M. 1998. SRQL: Sorted relational query language. In Proceedings of the 10th Annual International Conference on Scientific and Statistical Database Management (Capri, Italy, July 1--3). IEEE Computer Society Press, Los Alamitos, Calif., 84--95.]] Google ScholarCross Ref
- Rosenkrantz, D. and Hunt III, H. B. 1970. Processing conjunctive predicates and queries. In Proceedings of the 6th International Conference on Very Large Databases. 64--72.]]Google Scholar
- Sadri, R. 2001. Optimization of sequence queries in database systems. Ph.D. thesis, University of California, Los Angeles.]] Google ScholarDigital Library
- Seshadri, P. 1998. Predator: A resource for database research. SIGMOD Record 27, 1, 16--20.]] Google ScholarDigital Library
- Seshadri, P., Linvy, M., and Ramakrishnan, R. 1994. Sequence query processing. In Proceedings of ACM SIGMOD Conference on Management of Data. ACM, New York, 430--441.]] Google ScholarDigital Library
- Seshadri, P., Livny, M., and Ramakrishnan, R. 1995. SEQ: A model for sequence databases. In ICDE. 232--239.]] Google ScholarDigital Library
- Sun, W. and Yu, C. 1994. Semantic query optimization for tree and chain queries. IEEE Trans. Knowl. Data Eng. 6, 5.]] Google ScholarDigital Library
- Sun, X., Kamell, N. N., and Ni, L. M. 1989. Processing implication on queries. IEEE Trans. Softw. Eng. 5, 10 (Oct.), 168--175.]] Google ScholarDigital Library
- Ullman, J. D. 1989. Principles of Database and Knowledge-Base Systems. Vol. 2. Computer Science Press.]] Google ScholarDigital Library
- Wang, H. and Perng, C. 2003. The s2-tree: An index structure for subsequence matching of spatial objects. In Proceedings of the 5th Pacific-Asic Conference on Knowledge Discovery and Data Mining (PAKDD) (Hong Kong, China).]] Google ScholarDigital Library
- Wang, H. and Zaniolo, C. 2000. Using SQL to build new aggregates and extenders for object-relational systems. In Proceedings of 26th International Conference on Very Large Data Bases.]] Google ScholarDigital Library
- Wright, C. A., Cumberland, L., and Feng, Y. 1998. A performance comparison between five string pattern matching algorithms. Technical Report. http://ocean.st.usm.edu/∼cawright/pattern_matching.html.]]Google Scholar
- Zemke, F., Kulkarni, K., Witkowski, A., and Lyle, B. 1999. Proposal for OLAP functions. Tech. Rep. ANSI NCITS H2-99-155, ISO/IEC JTC1/SC32 WG3:YGJ-nnn, http://ganga.iiml.ac.in/∼bhasker/dmds/99-154.pdf.]]Google Scholar
Index Terms
- Expressing and optimizing sequence queries in database systems
Recommendations
Equivalence and minimization of conjunctive queries under combined semantics
ICDT '12: Proceedings of the 15th International Conference on Database TheoryThe problems of query containment, equivalence, and minimization are fundamental problems in the context of query processing and optimization. In their classic work [2] published in 1977, Chandra and Merlin solved the three problems for the language of ...
Optimizing large star-schema queries with snowflakes via heuristic-based query rewriting
CASCON '03: Proceedings of the 2003 conference of the Centre for Advanced Studies on Collaborative researchUser queries have been becoming increasingly complex (e.g., involving a large number of joins) as database technology is applied to some application domains such as data warehouses and life sciences. Query optimizers in existing database management ...
Optimizing complex queries based on similarities of subqueries
As database technology is applied to more and more application domains, user queries are becoming increasingly complex (e.g. involving a large number of joins and a complex query structure). Query optimizers in existing database management systems (DBMS)...
Comments