skip to main content
research-article

Relational languages and data models for continuous queries on sequences and data streams

Published:02 June 2011Publication History
Skip Abstract Section

Abstract

Most data stream management systems are based on extensions of the relational data model and query languages, but rigorous analyses of the problems and limitations of this approach, and how to overcome them, are still wanting. In this article, we elucidate the interaction between stream-oriented extensions of the relational model and continuous query language constructs, and show that the resulting expressive power problems are even more serious for data streams than for databases. In particular, we study the loss of expressive power caused by the loss of blocking query operators, and characterize nonblocking queries as monotonic functions on the database. Thus we introduce the notion of NB-completeness to assure that a query language is as suitable for continuous queries as it is for traditional database queries. We show that neither RA nor SQL are NB-complete on unordered sets of tuples, and the problem is even more serious when the data model is extended to support order—a sine-qua-non in data stream applications. The new limitations of SQL, compounded with well-known problems in applications such as sequence queries and data mining, motivate our proposal of extending the language with user-defined aggregates (UDAs). These can be natively coded in SQL, according to simple syntactic rules that set nonblocking aggregates apart from blocking ones. We first prove that SQL with UDAs is Turing complete. We then prove that SQL with monotonic UDAs and union operators can express all monotonic set functions computable by a Turing machine (NB-completeness) and finally extend this result to queries on sequences ordered by their timestamps. The proposed approach supports data stream models that are more sophisticated than append-only relations, along with data mining queries, and other complex applications.

References

  1. Abiteboul, S., Hull, R., and Vianu, V. 1995. Foundations of Databases. Addison-Wesley. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Arasu, A., Babu, S., and Widom, J. 2003. CQL: A language for continuous queries over streams and relations. In Proceedings of the International Conference on Database Programming Languages (DBPL). 1--19.Google ScholarGoogle Scholar
  3. Arasu, A. and Widom, J. 2004. Resource sharing in continuous sliding-window aggregates. In Proceedings of the International Conference on Very large Database (VLDB). 336--347. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Babcock, B., Babu, S., Datar, M., Motawani, R., and Widom, J. 2002. Models and issues in data stream systems. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Bai, Y., Thakkar, H., Wang, H., Luo, C., and Zaniolo, C. 2006. A data stream language and system designed for power and extensibility. In Proceedings of the International Conference on Information and Knowledge Management (CIKM). 337--346. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Bai, Y., Thakkar, H., Wang, H., and Zaniolo, C. 2008. Timestamp management and query execution models in data stream management systems. IEEE Internet Comput. 12, 6, 13--21. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Barbara, D. 1999. The characterization of continuous queries. Intl. J. Coop. Inform. Syst. 8, 4, 295--323.Google ScholarGoogle ScholarCross RefCross Ref
  8. Bohlen, M. H. 1994. The temporal deductive database system chronolog. Ph.D. thesis, Department Informatick, ETH Zurich.Google ScholarGoogle Scholar
  9. Carney, D., Cetintemel, U., Cherniack, M., Convey, C., Lee, S., Seidman, G., Stonebraker, M., Tatbul, N., and Zdonik, S. 2002. Monitoring streams—a new class of data management applications. In Proceedings of the International Conference on Very large Databases (VLDB). Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Celko, J. 1995. SQL for Smarties. Morgan Kaufmann, Chapter Advanced SQL Programming. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Chandrasekaran, S. and Franklin, M. 2002. Streaming queries over streaming data. In Proceedings of the International Conference on Very large Databases (VLDB). Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Chen, J., DeWitt, D. J., Tian, F., and Wang, Y. 2000. NiagaraCQ: A scalable continuous query system for internet databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 379--390. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Cranor, C., Gao, Y., Johnson, T., Shkapenyuk, V., and Spatscheck, O. 2002. Gigascope: High performance network monitoring with an SQL interface. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM Press, 623. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Golab, L. and Özsu, M. T. 2003a. Issues in data stream management. ACM SIGMOD Record 32, 2, 5--14. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Golab, L. and Özsu, M. T. 2003b. Processing sliding window multi-joins in continuous queries over data streams. In Proceedings of the International Conference on Very large Databases (VLDB). 500--511. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Golab, L. and Özsu, M. T. 2005. Update-pattern-aware modeling and processing of continuous queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 658--669. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Gurevich, Y., Leinders, D., and den Bussche, J. V. 2007. A theory of stream queries. In Proceedings of Database Programming Languages, 11th International Symposium (DBPL). M. Arenas and M. I. Schwartzbach, Eds. Springer, 153--168. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Gyllstrom, D., Agrawal, J., Diao, Y., and Immerman, N. 2008. On supporting kleene closure over event streams. In Proceedings of the International Conference on Data Engineering (ICDE). 1391--1393. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Han, J., Fu, Y., Wang, W., Koperski, K., and Zaiane, O. R. 1996. DMQL: A data mining query language for relational databases. In Proceedings of Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD). 27--33.Google ScholarGoogle Scholar
  20. Imielinski, T. and Virmani, A. 1999. MSQL: a query language for database mining. Data Mining Knowl. Discov. 3, 373--408. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. ISO/IEC. 2003. Database languages—sql, iso/iec 9075-*:2003.Google ScholarGoogle Scholar
  22. Jagadish, H., Mumick, I., and Silberschatz, A. 1995. View maintenance issues for the chronicle data model. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). 113--124. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Johnson, T., Muthukrishnan, S., Shkapenyuk, V., and Spatscheck, O. 2005. A heartbeat mechanism and its application in gigascope. In Proceedings of the International Conference on Very large Databases (VLDB). 1079--1088. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Kang, J., Naughton, J. F., and Viglas, S. 2003. Evaluating window joins over unbounded streams. In Proceedings of the International Conference on Data Engineering (ICDE). 341--352.Google ScholarGoogle Scholar
  25. Krämer, J. and Seeger, B. 2005. A temporal foundation for continuous queries over data streams. In Proceedings of the International Conference on Management of Data (COMAD). 70--82.Google ScholarGoogle Scholar
  26. Krämer, J. and Seeger, B. 2009. Semantics and implementation of continuous sliding window queries over data streams. ACM Trans. Datab. Syst. 34, 1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Krishnan, R. and Goldstein, J. 2010. Microsoft streaminsight: A hitchhiker's guide to Microsoft streaminsight queries. www.microsoft.com/download/8/9/C/89C4DB34-6AB9-4AA2-B2F5-4.Google ScholarGoogle Scholar
  28. Law, Y.-N., Wang, H., and Zaniolo, C. 2004. Data models and query language for data streams. In Proceedings of the International Conference on Very Large Databases (VLDB). 492--503. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Li, J., Maier, D., Tufte, K., Papadimos, V., and Tucke, P. A. 2005a. Semantics and evaluation techniques for window aggregates in data streams. In Proceedings of the ACM SIGMOD International Conference on Mangement of Data (SIGMOD). 311--322. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Li, J., Maier, D., Tufte, K., Papadimos, V., and Tucker, P. A. 2005. No pane, no gain: efficient evaluation of sliding-window aggregates over data streams. SIGMOD Record 34, 1, 39--44. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Liu, L., Pu, C., and Tang, W. 1999. Continual queries for internet scale event-driven information delivery. IEEE Trans. Knowl. Data Eng. 11, 4, 583--590. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Luo, C., Thakkar, H., Wang, H., and Zaniolo, C. 2005. A native extension of SQL for mining data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 873--875. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Madden, S., Shah, M. A., Hellerstein, J. M., and Raman, V. 2002. Continuously adaptive continuous queries over streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 49--61. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Meo, R., Psaila, G., and Ceri, S. 1996. A new SQL-like operator for mining association rules. In Proceedings of the International Conference on Very large Databases (VLDB). 122--133. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Motwani, R., Widom, J., Arasu, A., Babcock, B., S. Babu, M. D., Manku, G., Olston, C., Rosenstein, J., and Varma, R. 2003. Query processing, approximation, and resource management in a data stream management system. In Proceedings of the 1st CIDR Conference.Google ScholarGoogle Scholar
  36. Mozafari, B., Zeng, K., and Zaniolo, C. 2010a. From regular expressions to nested words: Unifying languages and query execution for relational and xml sequences. Proc. VLDB 3, 1, 150--161. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Mozafari, B., Zeng, K., and Zaniolo, C. 2010b. From regular expressions to nested words: Unifying languages and query execution for relational and XML sequences. In Proceedings of the International Conference on Very large Database (VLDB). Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Mozafari, B., Zeng, K., and Zaniolo, C. 2010c. K*SQL: a unifying engine for sequence patterns and XML. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 1143--1146. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Perng, C.-S. and Parker, D. S. 1999. SQL/LPP: A time series extension of SQL based on limited patience patterns. In Proceedings of the International Conference on Database and Expert Systems (DEXA). Lecture Notes in Computer Science Series, vol. 1677. Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Ramakrishnan, R., Donjerkovic, D., Ranganathan, A., Beyer, K., and Krishnaprasad, M. 1998. SRQL: Sorted relational query language. In Proceedings of the International Conference on Scientific and Statistical Database Management (SSBM). Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Rozenshtein, D., Abramovich, A., and Birger, E. 1993. Loop-free SQL solutions for finding continuous regions. SQL Forum 2, 6.Google ScholarGoogle Scholar
  42. Sadri, R., Zaniolo, C., Zarkesh, A., and Adibi, J. 2001a. Optimization of sequence queries in database systems. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Sadri, R., Zaniolo, C., Zarkesh, A. M., and Adibi, J. 2001b. A sequential pattern query language for supporting instant data minining for e-services. In Proceedings of the International Conference on Very large Databases (VLDB). 653--656. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Sadri, R., Zaniolo, C., Zarkesh, A. M., and Adibi, J. 2004. Expressing and optimizing sequence queries in database systems. ACM Trans. Datab. Syst. 29, 2, 282--318. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Sarawagi, S., Thomas, S., and Agrawal, R. 1998. Integrating association rule mining with relational database systems: Alternatives and implications. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Seshadri, P. 1998. Predator: A resource for database research. SIGMOD Record 27, 1, 16--20. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Seshadri, P., Livny, M., and Ramakrishnan, R. 1994. Sequence query processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). R. T. Snodgrass and M. Winslett, Eds. ACM Press, 430--441. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Soulé, R., Hirzel, M., Grimm, R., Gedik, B., Andrade, H., Kumar, V., and Wu, K.-L. 2010. A universal calculus for stream processing languages. In Proceedings of the Conference on Programming Languages and Systems (ESOP). A. D. Gordon, Ed. Springer, 507--528. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Srivastava, U. and Widom, J. 2004. Flexible time management in data stream systems. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Databases Systems (PODS). 263--274. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Sullivan, M. 1996. Tribeca: A stream database manager for network traffic analysis. In Proceedings of the International Conference on Very large Database (VLDB). Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Technologies, I. I. 1994. Illustra user guide.Google ScholarGoogle Scholar
  52. Terry, D., Goldberg, D., Nichols, D., and Oki, B. 1992. Continuous queries over append-only databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 321--330. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Thakkar, H., Laptev, N., Mousavi, H., Mozafari, B., Russo, V., and Zaniolo, C. 2011. SMM: a data stream management system for knowledge discovery. In Proceedings of the International Conference on Data Engineering (ICDE) 27, 1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Tucker P., Maier, D., and Sheard T. 2003a. Applying punctuation schemes to queries over continuous data streams. IEEE Data Eng. Bull. 26, 1, 33--40.Google ScholarGoogle Scholar
  55. Tucker, P. A., Maier, D., Sheard, T., and Fegaras, L. 2003b. Exploiting punctuation semantics in continuous data streams. IEEE Trans. Knowl. Data Eng. 15, 3, 555--568. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Wang, H. and Zaniolo, C. 2003. ATLaS: a native extension of SQL for data minining. In Proceedings of the 3rd SIAM International Conference on Data Mining. 130--141.Google ScholarGoogle Scholar
  57. Wu, E., Diao, Y., and Rizvi, S. 2006. High-performance complex event processing over streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 407--418. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Zaniolo, C., Ceri, S., Faloutsos, C., Snodgrass, R., Subrahmanian, V. S., and Zicari, R. 1997. Advanced Database Systems. Morgan Kaufmann. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Zemke, F., Witkowski, A., Cherniak, M., and Colby, L. 2007. Pattern matching in sequences of rows. In http://asktom.oracle.com/tkyte/row-patternrecogniton-11-public.pdf http://www.sqlsnippets.com/en/topic-12162.html.Google ScholarGoogle Scholar

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image ACM Transactions on Database Systems
    ACM Transactions on Database Systems  Volume 36, Issue 2
    May 2011
    257 pages
    ISSN:0362-5915
    EISSN:1557-4644
    DOI:10.1145/1966385
    Issue’s Table of Contents

    Copyright © 2011 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 2 June 2011
    • Accepted: 1 November 2010
    • Revised: 1 September 2010
    • Received: 1 April 2010
    Published in tods Volume 36, Issue 2

    Permissions

    Request permissions about this article.

    Request Permissions

    Check for updates

    Qualifiers

    • research-article
    • Research
    • Refereed

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader