skip to main content
article

Expressing and optimizing sequence queries in database systems

Authors Info & Claims
Published:01 June 2004Publication History
Skip Abstract Section

Abstract

The need to search for complex and recurring patterns in database sequences is shared by many applications. In this paper, we investigate the design and optimization of a query language capable of expressing and supporting efficiently the search for complex sequential patterns in database systems. Thus, we first introduce SQL-TS, an extension of SQL to express these patterns, and then we study how to optimize the queries for this language. We take the optimal text search algorithm of Knuth, Morris and Pratt, and generalize it to handle complex queries on sequences. Our algorithm exploits the interdependencies between the elements of a pattern to minimize repeated passes over the same data. Experimental results on typical sequence queries, such as double bottom queries, confirm that substantial speedups are achieved by our new optimization techniques.

References

  1. Agrawal, R. and Srikant, R. 1995. Mining sequential patterns. In Proceedings of the International Conference on Data Engineering.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Alur, N., Haas, P., Momiroska, D., Read, P., Summers, N., Totanes, V., and Zuzarte, C. 2002. Db2 UDBS high-function business intelligence in e-business. IBM redbooks, IBM, http://www.redbooks.ibm.com/redbooks/pdfs/sg246546.pdf.]]Google ScholarGoogle Scholar
  3. Arasu, A., Babu, S., and Widom, J. 2002. An abstract semantics and concrete language for continuous queries over streams and relations. Tech. rep., Stanford Univ., Stanford, Calif.]]Google ScholarGoogle Scholar
  4. Babcock, B., Babu, S., Datar, M., Motawani, R., and Widom, J. 2002. Models and issues in data stream systems. In Proceedings of the 21st ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM, New York, 1--16.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Berry, M. and Linoff, M. 1996. Data Mining Techniques for Marketing, Sales, and Customer Support. Wiley, New York.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Boag, S., Chamberlin, D., Fernandez, M. F., Florescu, D., Robie, J., and Simeon, J. (Eds.). 2003. Xquery 1.0: An XML query language--working draft 22 august 2003. Working Draft 22 August 2003, W3C, http://www.w3.org/tr/xquery/.]]Google ScholarGoogle Scholar
  7. Boyer, R. S. and Moore, J. S. 1977. A fast string searching algorithm. Commun. ACM 20, 10, 762--772.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Carney, D., Cetintemel, U., Cherniack, M., Convey, C., Lee, S., Seidman, G., Stonebraker, M., Tatbul, N., and Zdonik, S. 2002. Monitoring streams---A new class of data management applications. In VLDB (Hong Kong, China).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Chandrasekaran, S., Cooper, O., Deshpande, A., Franklin, M. J., Hellerstein, J. M., Hong, W., Krishnamurthy, S., Madden, S. R., Raman, V., Reiss, F., and Shah, M. A. 2003. Telegraphcq: Continuous dataflow processing for an uncertain world. In CIDR 2003: First Biennial Conference on Innovative Data Systems Research (Asilomar, Calif., Jan. 5--8).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Edwards, R. and Magee, J. 1997. Technical Analysis of Stock Trends. AMACOM.]]Google ScholarGoogle Scholar
  11. Faloutsos, C., Ranganathan, M., and Manolopoulos, Y. 1994. Fast subsequence matching in time-series databases. In Proceedings of the International Conference on Management of Data. 419--429.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Ferragina, P., Koudas, N., Muthukrishnan, S., and Srivastava, D. 2001. Two-dimensional substring indexing. In Proceedings of the 20th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Santa Barbara, Calif., May 21--23). ACM, New York.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Gatziu, S. and Dittrich, K. R. 1993. Events in an object-oriented database system. In Proceedings of the 1st International Conference on Rules in Database Systems.]]Google ScholarGoogle Scholar
  14. Gehani, N. H., Jagadish, H. V., and Shmueli, O. 1992. Composite event specification in active databases: Model and implementation. In Proceedings of the 18th International Conference on Very Large Data Bases.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Guo, S., Sun, W., and Weiss, M. 1996a. On satisfiability, equivalence, and implication problems involving conjunctive queries in database systems. IEEE Trans. Knowl. Data Eng. 8, 4, 604--616.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Guo, S., Sun, W., and Weiss, M. A. 1996b. Solving satisfiability and implication problems in database systems. ACM Trans. Datab. Syst. 21, 2, 270--293.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Hellerstein, J. M., Hass, P. J., and Wang, H. J. 1997. Online aggregation. In Proceedings of the International Conference on Management of Data. ACM, New York, 171--182.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Informix Software, Inc. 1998. Managing time-series data in financial applications. White Paper.]]Google ScholarGoogle Scholar
  19. Karp, R. and Rabin, M. O. 1987. Efficient randomized pattern matching algorithms. IBM J. Res. Devel. 31, 2 (Mar.), 249--260.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Klug, A. 1988. On conjunctive queries containing inequalities. J. ACM 35, 1, 146--160.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Knuth, D. E., Morris, J. H., and Pratt, V. R. 1997. Fast pattern matching in strings. SIAM J. Comput. 6, 2 (June), 323--350.]]Google ScholarGoogle Scholar
  22. Mesrobian, E., Muntz, R., Santos, J., Shek, E., Mechoso, C., Farrara, J., and Stolorz, P. 1994. Extracting spatio-temporal patterns from geoscience datasets. In Proceedings of the IEEE Workshop on Visualization and Machine Vision.]]Google ScholarGoogle Scholar
  23. Motakis, I. and Zaniolo, C. 1997. Temporal aggregation in active databases. In Int. Conf. on the Managment of Data.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Perng, C.-S. and Parker, D. S. 1999. SQL/LPP: A time series extension of SQL based on limited patience patterns. In Database and Expert Systems Applications, 10th International Conference (DEXA '99). Lecture Notes in Computer Science, vol. 1677. Springer-Verlag, New York.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Rafiei, D. and Mendelzon, A. 2000. Querying time series data based on similarity. TKDE 12, 5, 675--693.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Ramakrishnan, R., Donjerkovic, D., Ranganathan, A., Beyer, K., and Krishnaprasad, M. 1998. SRQL: Sorted relational query language. In Proceedings of the 10th Annual International Conference on Scientific and Statistical Database Management (Capri, Italy, July 1--3). IEEE Computer Society Press, Los Alamitos, Calif., 84--95.]] Google ScholarGoogle ScholarCross RefCross Ref
  27. Rosenkrantz, D. and Hunt III, H. B. 1970. Processing conjunctive predicates and queries. In Proceedings of the 6th International Conference on Very Large Databases. 64--72.]]Google ScholarGoogle Scholar
  28. Sadri, R. 2001. Optimization of sequence queries in database systems. Ph.D. thesis, University of California, Los Angeles.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Seshadri, P. 1998. Predator: A resource for database research. SIGMOD Record 27, 1, 16--20.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Seshadri, P., Linvy, M., and Ramakrishnan, R. 1994. Sequence query processing. In Proceedings of ACM SIGMOD Conference on Management of Data. ACM, New York, 430--441.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Seshadri, P., Livny, M., and Ramakrishnan, R. 1995. SEQ: A model for sequence databases. In ICDE. 232--239.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Sun, W. and Yu, C. 1994. Semantic query optimization for tree and chain queries. IEEE Trans. Knowl. Data Eng. 6, 5.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Sun, X., Kamell, N. N., and Ni, L. M. 1989. Processing implication on queries. IEEE Trans. Softw. Eng. 5, 10 (Oct.), 168--175.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Ullman, J. D. 1989. Principles of Database and Knowledge-Base Systems. Vol. 2. Computer Science Press.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Wang, H. and Perng, C. 2003. The s2-tree: An index structure for subsequence matching of spatial objects. In Proceedings of the 5th Pacific-Asic Conference on Knowledge Discovery and Data Mining (PAKDD) (Hong Kong, China).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Wang, H. and Zaniolo, C. 2000. Using SQL to build new aggregates and extenders for object-relational systems. In Proceedings of 26th International Conference on Very Large Data Bases.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Wright, C. A., Cumberland, L., and Feng, Y. 1998. A performance comparison between five string pattern matching algorithms. Technical Report. http://ocean.st.usm.edu/∼cawright/pattern_matching.html.]]Google ScholarGoogle Scholar
  38. Zemke, F., Kulkarni, K., Witkowski, A., and Lyle, B. 1999. Proposal for OLAP functions. Tech. Rep. ANSI NCITS H2-99-155, ISO/IEC JTC1/SC32 WG3:YGJ-nnn, http://ganga.iiml.ac.in/∼bhasker/dmds/99-154.pdf.]]Google ScholarGoogle Scholar

Index Terms

  1. Expressing and optimizing sequence queries in database systems

            Recommendations

            Reviews

            Alan Raymond Hevner

            Sequence queries scan databases to detect patterns and trends of interest. For example, analyses of stock market prices, meteorological trends, and consumer purchasing patterns can be supported via appropriately defined sequence queries on typically large databases. This clear and insightful paper investigates the design and optimization of an extended query language, capable of expressing complex sequence queries. The simple query language for time series (SQL-TS) employs several new constructs (SEQUENCE BY and CLUSTER BY) to support the description of the desired data sequence among ordered relational database tuples. The principal contribution of this research is an algorithm for optimizing the search for valid patterns. The authors adapt and extend the optimal string search algorithm of Knuth-Morris-Pratt to data sequences. The generalization is challenging; instead of searching for strings of letters from a finite alphabet, the optimization algorithm must search for sequences of structured tuples, qualified by arbitrarily complex expressions of propositional predicates. The presentation of the new algorithm is handled well, via numerous examples, and clear illustrations of the required matrix computations. The algorithm uses logical interdependencies among the elements of the desired pattern to avoid repetitive scans over the data. The authors extend the optimization to include repetitive patterns, arbitrary aggregates, and disjunctive patterns. Experimentation demonstrates substantial speedups for several practical applications. Extensions of this promising approach include application to spatio-temporal databases, integration of high-performance indexing techniques into the optimization algorithm, and the ability to handle sequence queries on real-time data streams.

            Access critical reviews of Computing literature here

            Become a reviewer for Computing Reviews.

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Published in

              cover image ACM Transactions on Database Systems
              ACM Transactions on Database Systems  Volume 29, Issue 2
              June 2004
              211 pages
              ISSN:0362-5915
              EISSN:1557-4644
              DOI:10.1145/1005566
              Issue’s Table of Contents

              Copyright © 2004 ACM

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 1 June 2004
              Published in tods Volume 29, Issue 2

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • article

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader