Abstract
Most data stream management systems are based on extensions of the relational data model and query languages, but rigorous analyses of the problems and limitations of this approach, and how to overcome them, are still wanting. In this article, we elucidate the interaction between stream-oriented extensions of the relational model and continuous query language constructs, and show that the resulting expressive power problems are even more serious for data streams than for databases. In particular, we study the loss of expressive power caused by the loss of blocking query operators, and characterize nonblocking queries as monotonic functions on the database. Thus we introduce the notion of NB-completeness to assure that a query language is as suitable for continuous queries as it is for traditional database queries. We show that neither RA nor SQL are NB-complete on unordered sets of tuples, and the problem is even more serious when the data model is extended to support order—a sine-qua-non in data stream applications. The new limitations of SQL, compounded with well-known problems in applications such as sequence queries and data mining, motivate our proposal of extending the language with user-defined aggregates (UDAs). These can be natively coded in SQL, according to simple syntactic rules that set nonblocking aggregates apart from blocking ones. We first prove that SQL with UDAs is Turing complete. We then prove that SQL with monotonic UDAs and union operators can express all monotonic set functions computable by a Turing machine (NB-completeness) and finally extend this result to queries on sequences ordered by their timestamps. The proposed approach supports data stream models that are more sophisticated than append-only relations, along with data mining queries, and other complex applications.
- Abiteboul, S., Hull, R., and Vianu, V. 1995. Foundations of Databases. Addison-Wesley. Google ScholarDigital Library
- Arasu, A., Babu, S., and Widom, J. 2003. CQL: A language for continuous queries over streams and relations. In Proceedings of the International Conference on Database Programming Languages (DBPL). 1--19.Google Scholar
- Arasu, A. and Widom, J. 2004. Resource sharing in continuous sliding-window aggregates. In Proceedings of the International Conference on Very large Database (VLDB). 336--347. Google ScholarDigital Library
- Babcock, B., Babu, S., Datar, M., Motawani, R., and Widom, J. 2002. Models and issues in data stream systems. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). Google ScholarDigital Library
- Bai, Y., Thakkar, H., Wang, H., Luo, C., and Zaniolo, C. 2006. A data stream language and system designed for power and extensibility. In Proceedings of the International Conference on Information and Knowledge Management (CIKM). 337--346. Google ScholarDigital Library
- Bai, Y., Thakkar, H., Wang, H., and Zaniolo, C. 2008. Timestamp management and query execution models in data stream management systems. IEEE Internet Comput. 12, 6, 13--21. Google ScholarDigital Library
- Barbara, D. 1999. The characterization of continuous queries. Intl. J. Coop. Inform. Syst. 8, 4, 295--323.Google ScholarCross Ref
- Bohlen, M. H. 1994. The temporal deductive database system chronolog. Ph.D. thesis, Department Informatick, ETH Zurich.Google Scholar
- Carney, D., Cetintemel, U., Cherniack, M., Convey, C., Lee, S., Seidman, G., Stonebraker, M., Tatbul, N., and Zdonik, S. 2002. Monitoring streams—a new class of data management applications. In Proceedings of the International Conference on Very large Databases (VLDB). Google ScholarDigital Library
- Celko, J. 1995. SQL for Smarties. Morgan Kaufmann, Chapter Advanced SQL Programming. Google ScholarDigital Library
- Chandrasekaran, S. and Franklin, M. 2002. Streaming queries over streaming data. In Proceedings of the International Conference on Very large Databases (VLDB). Google ScholarDigital Library
- Chen, J., DeWitt, D. J., Tian, F., and Wang, Y. 2000. NiagaraCQ: A scalable continuous query system for internet databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 379--390. Google ScholarDigital Library
- Cranor, C., Gao, Y., Johnson, T., Shkapenyuk, V., and Spatscheck, O. 2002. Gigascope: High performance network monitoring with an SQL interface. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM Press, 623. Google ScholarDigital Library
- Golab, L. and Özsu, M. T. 2003a. Issues in data stream management. ACM SIGMOD Record 32, 2, 5--14. Google ScholarDigital Library
- Golab, L. and Özsu, M. T. 2003b. Processing sliding window multi-joins in continuous queries over data streams. In Proceedings of the International Conference on Very large Databases (VLDB). 500--511. Google ScholarDigital Library
- Golab, L. and Özsu, M. T. 2005. Update-pattern-aware modeling and processing of continuous queries. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 658--669. Google ScholarDigital Library
- Gurevich, Y., Leinders, D., and den Bussche, J. V. 2007. A theory of stream queries. In Proceedings of Database Programming Languages, 11th International Symposium (DBPL). M. Arenas and M. I. Schwartzbach, Eds. Springer, 153--168. Google ScholarDigital Library
- Gyllstrom, D., Agrawal, J., Diao, Y., and Immerman, N. 2008. On supporting kleene closure over event streams. In Proceedings of the International Conference on Data Engineering (ICDE). 1391--1393. Google ScholarDigital Library
- Han, J., Fu, Y., Wang, W., Koperski, K., and Zaiane, O. R. 1996. DMQL: A data mining query language for relational databases. In Proceedings of Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD). 27--33.Google Scholar
- Imielinski, T. and Virmani, A. 1999. MSQL: a query language for database mining. Data Mining Knowl. Discov. 3, 373--408. Google ScholarDigital Library
- ISO/IEC. 2003. Database languages—sql, iso/iec 9075-*:2003.Google Scholar
- Jagadish, H., Mumick, I., and Silberschatz, A. 1995. View maintenance issues for the chronicle data model. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). 113--124. Google ScholarDigital Library
- Johnson, T., Muthukrishnan, S., Shkapenyuk, V., and Spatscheck, O. 2005. A heartbeat mechanism and its application in gigascope. In Proceedings of the International Conference on Very large Databases (VLDB). 1079--1088. Google ScholarDigital Library
- Kang, J., Naughton, J. F., and Viglas, S. 2003. Evaluating window joins over unbounded streams. In Proceedings of the International Conference on Data Engineering (ICDE). 341--352.Google Scholar
- Krämer, J. and Seeger, B. 2005. A temporal foundation for continuous queries over data streams. In Proceedings of the International Conference on Management of Data (COMAD). 70--82.Google Scholar
- Krämer, J. and Seeger, B. 2009. Semantics and implementation of continuous sliding window queries over data streams. ACM Trans. Datab. Syst. 34, 1. Google ScholarDigital Library
- Krishnan, R. and Goldstein, J. 2010. Microsoft streaminsight: A hitchhiker's guide to Microsoft streaminsight queries. www.microsoft.com/download/8/9/C/89C4DB34-6AB9-4AA2-B2F5-4.Google Scholar
- Law, Y.-N., Wang, H., and Zaniolo, C. 2004. Data models and query language for data streams. In Proceedings of the International Conference on Very Large Databases (VLDB). 492--503. Google ScholarDigital Library
- Li, J., Maier, D., Tufte, K., Papadimos, V., and Tucke, P. A. 2005a. Semantics and evaluation techniques for window aggregates in data streams. In Proceedings of the ACM SIGMOD International Conference on Mangement of Data (SIGMOD). 311--322. Google ScholarDigital Library
- Li, J., Maier, D., Tufte, K., Papadimos, V., and Tucker, P. A. 2005. No pane, no gain: efficient evaluation of sliding-window aggregates over data streams. SIGMOD Record 34, 1, 39--44. Google ScholarDigital Library
- Liu, L., Pu, C., and Tang, W. 1999. Continual queries for internet scale event-driven information delivery. IEEE Trans. Knowl. Data Eng. 11, 4, 583--590. Google ScholarDigital Library
- Luo, C., Thakkar, H., Wang, H., and Zaniolo, C. 2005. A native extension of SQL for mining data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 873--875. Google ScholarDigital Library
- Madden, S., Shah, M. A., Hellerstein, J. M., and Raman, V. 2002. Continuously adaptive continuous queries over streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 49--61. Google ScholarDigital Library
- Meo, R., Psaila, G., and Ceri, S. 1996. A new SQL-like operator for mining association rules. In Proceedings of the International Conference on Very large Databases (VLDB). 122--133. Google ScholarDigital Library
- Motwani, R., Widom, J., Arasu, A., Babcock, B., S. Babu, M. D., Manku, G., Olston, C., Rosenstein, J., and Varma, R. 2003. Query processing, approximation, and resource management in a data stream management system. In Proceedings of the 1st CIDR Conference.Google Scholar
- Mozafari, B., Zeng, K., and Zaniolo, C. 2010a. From regular expressions to nested words: Unifying languages and query execution for relational and xml sequences. Proc. VLDB 3, 1, 150--161. Google ScholarDigital Library
- Mozafari, B., Zeng, K., and Zaniolo, C. 2010b. From regular expressions to nested words: Unifying languages and query execution for relational and XML sequences. In Proceedings of the International Conference on Very large Database (VLDB). Google ScholarDigital Library
- Mozafari, B., Zeng, K., and Zaniolo, C. 2010c. K*SQL: a unifying engine for sequence patterns and XML. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 1143--1146. Google ScholarDigital Library
- Perng, C.-S. and Parker, D. S. 1999. SQL/LPP: A time series extension of SQL based on limited patience patterns. In Proceedings of the International Conference on Database and Expert Systems (DEXA). Lecture Notes in Computer Science Series, vol. 1677. Springer. Google ScholarDigital Library
- Ramakrishnan, R., Donjerkovic, D., Ranganathan, A., Beyer, K., and Krishnaprasad, M. 1998. SRQL: Sorted relational query language. In Proceedings of the International Conference on Scientific and Statistical Database Management (SSBM). Google ScholarDigital Library
- Rozenshtein, D., Abramovich, A., and Birger, E. 1993. Loop-free SQL solutions for finding continuous regions. SQL Forum 2, 6.Google Scholar
- Sadri, R., Zaniolo, C., Zarkesh, A., and Adibi, J. 2001a. Optimization of sequence queries in database systems. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). Google ScholarDigital Library
- Sadri, R., Zaniolo, C., Zarkesh, A. M., and Adibi, J. 2001b. A sequential pattern query language for supporting instant data minining for e-services. In Proceedings of the International Conference on Very large Databases (VLDB). 653--656. Google ScholarDigital Library
- Sadri, R., Zaniolo, C., Zarkesh, A. M., and Adibi, J. 2004. Expressing and optimizing sequence queries in database systems. ACM Trans. Datab. Syst. 29, 2, 282--318. Google ScholarDigital Library
- Sarawagi, S., Thomas, S., and Agrawal, R. 1998. Integrating association rule mining with relational database systems: Alternatives and implications. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). Google ScholarDigital Library
- Seshadri, P. 1998. Predator: A resource for database research. SIGMOD Record 27, 1, 16--20. Google ScholarDigital Library
- Seshadri, P., Livny, M., and Ramakrishnan, R. 1994. Sequence query processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). R. T. Snodgrass and M. Winslett, Eds. ACM Press, 430--441. Google ScholarDigital Library
- Soulé, R., Hirzel, M., Grimm, R., Gedik, B., Andrade, H., Kumar, V., and Wu, K.-L. 2010. A universal calculus for stream processing languages. In Proceedings of the Conference on Programming Languages and Systems (ESOP). A. D. Gordon, Ed. Springer, 507--528. Google ScholarDigital Library
- Srivastava, U. and Widom, J. 2004. Flexible time management in data stream systems. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Databases Systems (PODS). 263--274. Google ScholarDigital Library
- Sullivan, M. 1996. Tribeca: A stream database manager for network traffic analysis. In Proceedings of the International Conference on Very large Database (VLDB). Google ScholarDigital Library
- Technologies, I. I. 1994. Illustra user guide.Google Scholar
- Terry, D., Goldberg, D., Nichols, D., and Oki, B. 1992. Continuous queries over append-only databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 321--330. Google ScholarDigital Library
- Thakkar, H., Laptev, N., Mousavi, H., Mozafari, B., Russo, V., and Zaniolo, C. 2011. SMM: a data stream management system for knowledge discovery. In Proceedings of the International Conference on Data Engineering (ICDE) 27, 1. Google ScholarDigital Library
- Tucker P., Maier, D., and Sheard T. 2003a. Applying punctuation schemes to queries over continuous data streams. IEEE Data Eng. Bull. 26, 1, 33--40.Google Scholar
- Tucker, P. A., Maier, D., Sheard, T., and Fegaras, L. 2003b. Exploiting punctuation semantics in continuous data streams. IEEE Trans. Knowl. Data Eng. 15, 3, 555--568. Google ScholarDigital Library
- Wang, H. and Zaniolo, C. 2003. ATLaS: a native extension of SQL for data minining. In Proceedings of the 3rd SIAM International Conference on Data Mining. 130--141.Google Scholar
- Wu, E., Diao, Y., and Rizvi, S. 2006. High-performance complex event processing over streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). 407--418. Google ScholarDigital Library
- Zaniolo, C., Ceri, S., Faloutsos, C., Snodgrass, R., Subrahmanian, V. S., and Zicari, R. 1997. Advanced Database Systems. Morgan Kaufmann. Google ScholarDigital Library
- Zemke, F., Witkowski, A., Cherniak, M., and Colby, L. 2007. Pattern matching in sequences of rows. In http://asktom.oracle.com/tkyte/row-patternrecogniton-11-public.pdf http://www.sqlsnippets.com/en/topic-12162.html.Google Scholar
Recommendations
Semantics and implementation of continuous sliding window queries over data streams
In recent years the processing of continuous queries over potentially infinite data streams has attracted a lot of research attention. We observed that the majority of work addresses individual stream operations and system-related issues rather than the ...
Continuous query processing in data streams using duality of data and queries
SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of dataRecent data stream systems such as TelegraphCQ have employed the well-known property of duality between data and queries. In these systems, query processing methods are classified into two dual categories -- data-initiative and query-initiative -- ...
Query languages and data models for database sequences and data streams
VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30We study the fundamental limitations of relational algebra (RA) and SQL in supporting sequence and stream queries, and present effective query language and data model enrichments to deal with them. We begin by observing the well-known limitations of SQL ...
Comments