Abstract
While Complex Event Processing (CEP) constitutes a considerable portion of the so-called Big Data analytics, current CEP systems can only process data having a simple structure, and are otherwise limited in their ability to efficiently support complex continuous queries on structured or semistructured information. However, XML-like streams represent a very popular form of data exchange, comprising large portions of social network and RSS feeds, financial feeds, configuration files, and similar applications requiring advanced CEP queries. In this article, we present the XSeq language and system that support CEP on XML streams, via an extension of XPath that is both powerful and amenable to an efficient implementation. Specifically, the XSeq language extends XPath with natural operators to express sequential and Kleene-* patterns over XML streams, while remaining highly amenable to efficient execution. In fact, XSeq is designed to take full advantage of the recently proposed Visibly Pushdown Automata (VPA), where higher expressive power can be achieved without compromising the computationally attractive properties of finite state automata. Besides the efficiency and expressivity benefits, the choice of VPA as the underlying model also enables XSeq to go beyond XML streams and be easily applicable to any data with both sequential and hierarchical structures, including JSON messages, RNA sequences, and software traces. Therefore, we illustrate the XSeq's power for CEP applications through examples from different domains and provide formal results on its expressiveness and complexity. Finally, we present several optimization techniques for XSeq queries. Our extensive experiments indicate that XSeq brings outstanding performance to CEP applications: two orders of magnitude improvement is obtained over the same queries executed in general-purpose XML engines.
- Alexander, M., Fawcett, J., and Runciman, P. 2000. Nursing Practice: Hospital and Home: The Adult 2nd Ed. Churchill Livingstone.Google Scholar
- Alur, R. and Madhusudan, P. 2004. Visibly pushdown languages. In Proceedings of the Symposium on Theory of Computing (STOC'04). Google ScholarDigital Library
- Alur, R. and Madhusudan, P. 2006. Adding nesting structure to words. In Proceedings of the 10th International Conference on Developments in Language Theory (DLT'06). Lecture Notes in Computer Science, vol. 4036, Springer, 1--13. Google ScholarDigital Library
- Amagasa, T., Yoshikawa, M., and Uemura, S. 2000. A data model for temporal xml documents. In Proceedings of the 11th International Conference on Database and Expert Systems Applications (DEXA'00). 334--344. Google ScholarDigital Library
- Bamford, R., Borkar, V., Branter, M., Fischer, P. M., Florescu, D., et al. 2009. Xquery reloaded. Proc. VLDB Endow. 2, 2, 1342--1353. Google ScholarDigital Library
- Barton, C., Charles, P., Goyal, D., Raghavachari, M., Fontoura, M., et al. 2003. Streaming xpath processing with forward and backward axes. In Proceedings of the 19th International Conference on Data Engineering (ICDE'03). 455--466.Google ScholarCross Ref
- Boncz, P., Gurst, T., van Keulen, M., Manegold, S., Rittinger, J., et al. 2006. Monetdb/xquery: A fast xquery processor powered by a relational engine. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'06). Google ScholarDigital Library
- Brenna, L., Gehrke, J., Hong, M., and Johansen, D. 2009. Distributed event stream processing with non-deterministic finite automata. In Proceedings of the 3rd International Conference on Distributed Event-Based Systems (DEBS'09). Google ScholarDigital Library
- Cate, B. T. and Lutz, C. 2009. The complexity of query containment in expressive fragments of xpath 2.0. J. ACM 56, 6. Google ScholarDigital Library
- Chen, Y., Davidson, S. B., and Zheng, Y. 2006. An efficient xpath query processor for xml streams. In Proceedings of the 22nd International Conference on Data Engineering (ICDE'06). Google ScholarDigital Library
- Diao, Y., Altinel, M., Franklin, M. J., Zhang, H., and Fischer, P. 2003. Path sharing and predicate evaluation for high-performance xml filtering. ACM Trans. Datab. Syst. 28,4. Google ScholarDigital Library
- Florescu, D., Hilliery, C., Kossman, D., Lucas, P., Riccardi, F., et al. 2003. The bea/xqrl streaming xquery processor. In Proceedings of the 29th International Conference on Very Large Databases (VLDB'03). Google ScholarDigital Library
- Furche, T., Gottlob, G., Grasso, G., Schallhart, C., and Sellers, A. J. 2011. Oxpath: A language for scalable, memory-efficient data extraction from web applications. Proc. VLDB Endow. 4, 11.Google Scholar
- Gauwin, O. and Niehren, J. 2011. Streamable fragments of forward xpath. In Proceedings of the 16th International Conference on Implementation and Application of Automata (CIAA'11). 3--15. Google ScholarDigital Library
- Gauwin, O., Niehren, J., and Tison, S. 2011. Queries on xml streams with bounded delay and concurrency. Inf. Comput. 209, 3, 409--442. Google ScholarDigital Library
- Josifovski, V., Fontoura, M., and Barta, A. 2005. Querying xml streams. VLDB J. 14,2. Google ScholarDigital Library
- Kay, M. 2008. Ten reasons why saxon xquery is fast. IEEE Data Engin. Bull. 31, 4.Google Scholar
- Knuth, D. E., Morris, J. H. Jr., and Pratt, V. R. 1977. Fast pattern matching in strings. SIAM J. Comput. 6, 1, 323--350.Google ScholarDigital Library
- Koch, C. 2009. Xml stream processing. In Encyclopedia of Database Systems, Springer, 3634--3637Google Scholar
- Laptev, N. and Zaniolo, C. 2012. Optimization of massive pattern queries by dynamic configuration morphing. In Proceedings of the 28th International Conference on Data Engineering (ICDE'12). 917--928. Google ScholarDigital Library
- Luckham, D. C. 2001. The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems. Addison-Wesley. Google ScholarDigital Library
- Madhusudan, P. and Viswanathan, M. 2009. Query automata for nested words. In Proceedings of the 34th International Symposium on Mathematical Foundations of Computer Science (MFCS'09). Lecture Notes in Computer Science, vol. 5734, Springer, 561--573. Google ScholarDigital Library
- Marx, M. 2005. Conditional xpath. Trans. Datab. Syst. 30, 4. Google ScholarDigital Library
- Mozafari, B., Zeng, K., and Zaniolo, C. 2010a. From regular expressions to nested words: Unifying languages and query execution for relational and xml sequences. Proc. VLDB Endow. 3, 1. Google ScholarDigital Library
- Mozafari, B., Zeng, K., and Zaniolo, C. 2010b. K*sql: A unifying engine for sequence patterns and xml. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'10). 1143--1146. Google ScholarDigital Library
- Mozafari, B., Zeng, K., and Zaniolo, C. 2012. High-performance complex event processing over xml streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'12). 253--264. Google ScholarDigital Library
- Olteanu, D., Kiesling, T., and Bry, F. 2003. An evaluation of regular path expressions with qualifiers against xml streams. In Proceedings of the 19th International Conference on Data Engineering (ICDE'03). 702--704.Google Scholar
- Peng, F. and Chawathe, S. S. 2003. Xpath queries on streaming data. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'03). Google ScholarDigital Library
- Pitcher, C. 2005. Visibly pushdown expression effects for xml stream processing. In Proceedings of the Workshop on Programming Language Technologies for XML (PLAN-X'05).Google Scholar
- Schmidt, A., Wass, F., Kersten, M., Carey, M. J., Manoescu, I., and Busse, R. 2002. Xmark: A benchmark for xml data management. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB'02). 974--985. Google ScholarDigital Library
- Snodgrass, R. T. 2009. Tsql2. In Encyclopedia of Database Systems, Springer, 3192--3197.Google Scholar
- Stromback, L. and Schmidt, S. 2009. An extension of xquery for graph analysis of biological pathways. In Proceedings of the 1st International Conference on Advances on Databases, Knowledge, and Data Applications (DBKDA'09). 22--27. Google ScholarDigital Library
- Tang, N. V. 2009. A tighter bound for the determinization of visibly pushdown automata. In Proceedings of the International Workshop on Verification of Infinite-State Systems (INFINITY'09). 62--76.Google Scholar
- ten Cate, B. 2006. The expressivity of xpath with transitive closure. In Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'06). 328-337. Google ScholarDigital Library
- ten Cate, B. and Marx, M. 2007a. Axiomatizing the logical core of xpath 2.0. In Proceedings of the 11th International Conference on Database Theory (ICDT'07). Google ScholarDigital Library
- ten Cate, B. and Marx, M. 2007b. Navigational xpath: calculus and algebra. SIGMOD Rec. 36, 2. Google ScholarDigital Library
- ten Cate, B. and Segoufin, L. 2008. Xpath, transitive closure logic, and nested tree walking automata. In Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'08). 251--260. Google ScholarDigital Library
- Vagena, Z., Moro, M. M., and Tsotras, V. J. 2007. Roxsum: Leveraging data aggregation and batch processing for xml routing. In Proceedings of the 23rd IEEE International Conference on Data Engineering (ICDE'07). 1466--1470.Google Scholar
- Wang, F., Zaniolo, C., and Zhou, X. 2008. Archis: an xml-based approach to transaction-time temporal database systems. VLDB J. 17, 6, 1445--1463. Google ScholarDigital Library
- Wu, E., Diao, Y., and Rizvi, S. 2006. High-performance complex event processing over streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'06). 407--418. Google ScholarDigital Library
- Zaniolo, C. 2009. Event-oriented data models and temporal queries in transaction-time databases. In Proceedings of the 16th International Symposium on Temporal Representation and Reasoning (TIME'09). 47--53. Google ScholarDigital Library
- Zeng, K., Yang, M., Mozafari, B., and Zaniolo, C. 2013. Complex pattern matching in complex structures: The xseq approach. In Proceedings of the IEEE International Conference on Data Engineering (ICDE'13). 1328--1331. Google ScholarDigital Library
Index Terms
- High-performance complex event processing over hierarchical data
Recommendations
High-performance complex event processing over XML streams
SIGMOD '12: Proceedings of the 2012 ACM SIGMOD International Conference on Management of DataMuch research attention has been given to delivering high-performance systems that are capable of complex event processing (CEP) in a wide range of applications. However, many current CEP systems focus on processing efficiently data having a simple ...
Refactoring legacy AJAX applications to improve the efficiency of the data exchange component
The AJAX paradigm encodes data exchange XML formats. Recently, JSON has also become a popular data exchange format. XML has numerous benefits including human-readable structures and self-describing data. However, JSON provides significant performance ...
Better than XML: Towards a lexicographic markup language
AbstractThis article takes a critical look at how XML is used in lexicography and asks the question, why do dictionary entries often end up looking so complex when encoded in XML? The main reason for the perceived complexity of XML-encoded dictionaries ...
Highlights- Dictionaries encoded in XML are unnecessarily verbose and complex due to overuse of purely structural markup.
- Much data in lexicography is inherently headed, but headedness is difficult to represent in XML.
- JSON and YAML are no ...
Comments