ABSTRACT
The nature of data in enterprises and on the Internet is changing. Data used to be stored in a database first and queried later. Today timely processing of new data, represented as events, is increasingly valuable. In many domains, complex event processing (CEP) systems detect patterns of events for decision making. Examples include processing of environmental sensor data, trades in financial markets and RSS web feeds. Unlike conventional database systems, most current CEP systems pay little attention to query optimisation. They do not rewrite queries to more efficient representations or make decisions about operator distribution, limiting their overall scalability.
This paper describes the NEXT CEP system that was especially designed for query rewriting and distribution. Event patterns are specified in a high-level query language and, before being translated into event automata, are rewritten in a more efficient form. Automata are then distributed across a cluster of machines for detection scalability. We present algorithms for query rewriting and distributed placement. Our experiments on the Emulab test-bed show a significant improvement in system scalability due to rewriting and distribution.
- D. Abadi, D. Camey, U. Cetintemel, M. Chemiack, et al. Aurora: A New Model and Architecture for Data Stream Management. VLDB Journal, 2003. Google ScholarDigital Library
- D. J. Abadi, Y. Ahmad, M. Balazinska, U. Cetintemel, et al. The Design of the Borealis Stream Processing Engine. In CIDR, Asilomar, CA, January 2005.Google Scholar
- Y. Ahmad and U. Çetintemel. Network-Aware Query Processing for Stream-based App. In VLDB, 2004. Google ScholarDigital Library
- M. Akdere, U. Çetintemel, and N. Tatbul. Plan-based Complex Event Detection across Distributed Event Sources. In VLDB, 2008. Google ScholarDigital Library
- A. Arasu, B. Babcock, S. Babu, J. Cieslewicz, et al. STREAM: The Stanford Data Stream Management System, http://infolab.stanford.edu/stream, 2004.Google Scholar
- A. Arasu, S. Babu, and J. Widom. The CQL Continuous Query Language: Semantic Foundations and Query Execution. VLDB Journal, 15(2), 2006. Google ScholarDigital Library
- M. Balazinska, H. Balakrishnan, and M. Stonebraker. Contract-Based Load Management in Federated Distributed Systems. In Proc. of NSDI'04, San Francisco, CA, Mar. 2004. Google ScholarDigital Library
- D. Carney, U. Çetintemel, M. Cherniack, C. Convey, et al. Monitoring Streams - A New Class of Data Management Applications. In VLDB, 2002. Google ScholarDigital Library
- S. Chakravarthy and D. Mishra. Snoop: an Expressive Event Specification Language for Active Databases. Data Knowledge Engineering, 14(1):1--26, 1994. Google ScholarDigital Library
- S. Chandrasekaran et al. TelegraphCQ: Continuous Dataflow Processing for an Uncertain World. In CIDR, 2003.Google Scholar
- M. Cherniack, H. Balakrishnan, M. Balazinska, D. Carney, et al. Scalable Distributed Stream Processing. In CIDR, Asilomar, CA, January 2003.Google Scholar
- T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. The MIT Press and McGraw-HilDyanl, 2nd edition, 2003. Google ScholarDigital Library
- A. Demers, J. Gehrke, M. Hong, B. Panda, et al. Towards Expressive Publish/Subscribe Systems. In EDBT, 2006. Google ScholarDigital Library
- A. Demers, J. Gehrke, M. Hong, B. Panda, et al. Cayuga: A Genaral Purpose Event Monitoring System. In CIDR, pages 412--422, 2007.Google Scholar
- Emulab's homepage. http://www.emulab.net/.Google Scholar
- R. S. Epstein and M. Stonebraker. Analysis of Distributed Data Base Processing Strategies. In VLDB, pages 92--101, 1980. Google ScholarDigital Library
- Esper's homepage. http://esper.codehaus.org.Google Scholar
- Federal Bureau of Investigation. Financial Crimes Report to the Public Fiscal Year 2007. Technical report, Federal Bureau of Investigation, 2007.Google Scholar
- L. Fiege, G. Muhl, and P. R. Pietzuch. Distributed Event-based Systems. Springer-Verlag Berlin and Heidelberg GmbH&Co. K, 2006. Google ScholarDigital Library
- S. Gatziu and K. R. Dittrich. Detecting Composite Events in Active Database Systems Using Petri Nets. In RIDE-ADS, pages 2--9, 1994.Google ScholarCross Ref
- J. E. Hopcroft, R. Motwani, Rotwani, and J. D. Ullman. Introduction to Automata Theory, Languages and Computability. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2000. Google ScholarDigital Library
- L. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. Commun. ACM, 21(7):558--565, 1978. Google ScholarDigital Library
- P. Pietzuch, J. Ledlie, J. Shneidman, M. Roussopoulos, et al. Network-Aware Operator Placement for Stream-Processing Sys. In ICDE, 2006. Google ScholarDigital Library
- P. R. Pietzuch, B. Shand, and J. Bacon. A Framework for Event Composition in Distributed Systems. In Middleware, Rio de Janeiro, Brazil, jun 2003. Google ScholarDigital Library
- S. Seshadri, V. Kumar, and B. F. Cooper. Optimizing Multiple Queries in Distributed Data Stream Systems. In NetDB, page 25, 2006. Google ScholarDigital Library
- Visa Inc. Operational Performance Data. http://corporate.visa.com/md/st/, 2008.Google Scholar
- Walker White and Mirek Riedewald and Johannes Gehrke and Alan Demers. What is "Next" in Event Processing? In SIGMOD-SIGACT-SIGART, 2007. Google ScholarDigital Library
- Y. Xing, S. Zdonik, and J.-H. Hwang. Dynamic Load Distribution in the Borealis Stream Processor. In ICDE, pages 791--802, 2005. Google ScholarDigital Library
- M. T. Özsu and P. Valduriez. Principles of Distributed Databases. Prentice-Hall Inc., 2nd edition, 1999. Google ScholarDigital Library
Index Terms
- Distributed complex event processing with query rewriting
Recommendations
Distributed stream join query processing with semijoins
This paper addresses the distributed stream processing of window-based multi-way join queries considering the semijoin as a key join operator. In distributed stream processing, data streams arriving at remote sites need to be shipped to the processing ...
Combining Joint and Semi-Join Operations for Distributed Query Processing
The application of a combination of join and semi-join operations to minimize the amount of data transmission required for distributed query processing is discussed. Specifically, two important concepts that occur with the use of join operations as ...
Generating query plans for distributed query processing using genetic algorithm
ICICA'11: Proceedings of the Second international conference on Information Computing and ApplicationsQuery Processing is a key determinant in the overall performance of distributed databases. It requires processing of data at their respective sites and transmission of the same between them. These together constitute a distributed query processing ...
Comments