skip to main content
research-article

High-performance complex event processing over hierarchical data

Published:04 December 2013Publication History
Skip Abstract Section

Abstract

While Complex Event Processing (CEP) constitutes a considerable portion of the so-called Big Data analytics, current CEP systems can only process data having a simple structure, and are otherwise limited in their ability to efficiently support complex continuous queries on structured or semistructured information. However, XML-like streams represent a very popular form of data exchange, comprising large portions of social network and RSS feeds, financial feeds, configuration files, and similar applications requiring advanced CEP queries. In this article, we present the XSeq language and system that support CEP on XML streams, via an extension of XPath that is both powerful and amenable to an efficient implementation. Specifically, the XSeq language extends XPath with natural operators to express sequential and Kleene-* patterns over XML streams, while remaining highly amenable to efficient execution. In fact, XSeq is designed to take full advantage of the recently proposed Visibly Pushdown Automata (VPA), where higher expressive power can be achieved without compromising the computationally attractive properties of finite state automata. Besides the efficiency and expressivity benefits, the choice of VPA as the underlying model also enables XSeq to go beyond XML streams and be easily applicable to any data with both sequential and hierarchical structures, including JSON messages, RNA sequences, and software traces. Therefore, we illustrate the XSeq's power for CEP applications through examples from different domains and provide formal results on its expressiveness and complexity. Finally, we present several optimization techniques for XSeq queries. Our extensive experiments indicate that XSeq brings outstanding performance to CEP applications: two orders of magnitude improvement is obtained over the same queries executed in general-purpose XML engines.

References

  1. Alexander, M., Fawcett, J., and Runciman, P. 2000. Nursing Practice: Hospital and Home: The Adult 2nd Ed. Churchill Livingstone.Google ScholarGoogle Scholar
  2. Alur, R. and Madhusudan, P. 2004. Visibly pushdown languages. In Proceedings of the Symposium on Theory of Computing (STOC'04). Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Alur, R. and Madhusudan, P. 2006. Adding nesting structure to words. In Proceedings of the 10th International Conference on Developments in Language Theory (DLT'06). Lecture Notes in Computer Science, vol. 4036, Springer, 1--13. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Amagasa, T., Yoshikawa, M., and Uemura, S. 2000. A data model for temporal xml documents. In Proceedings of the 11th International Conference on Database and Expert Systems Applications (DEXA'00). 334--344. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Bamford, R., Borkar, V., Branter, M., Fischer, P. M., Florescu, D., et al. 2009. Xquery reloaded. Proc. VLDB Endow. 2, 2, 1342--1353. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Barton, C., Charles, P., Goyal, D., Raghavachari, M., Fontoura, M., et al. 2003. Streaming xpath processing with forward and backward axes. In Proceedings of the 19th International Conference on Data Engineering (ICDE'03). 455--466.Google ScholarGoogle ScholarCross RefCross Ref
  7. Boncz, P., Gurst, T., van Keulen, M., Manegold, S., Rittinger, J., et al. 2006. Monetdb/xquery: A fast xquery processor powered by a relational engine. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Brenna, L., Gehrke, J., Hong, M., and Johansen, D. 2009. Distributed event stream processing with non-deterministic finite automata. In Proceedings of the 3rd International Conference on Distributed Event-Based Systems (DEBS'09). Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Cate, B. T. and Lutz, C. 2009. The complexity of query containment in expressive fragments of xpath 2.0. J. ACM 56, 6. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Chen, Y., Davidson, S. B., and Zheng, Y. 2006. An efficient xpath query processor for xml streams. In Proceedings of the 22nd International Conference on Data Engineering (ICDE'06). Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Diao, Y., Altinel, M., Franklin, M. J., Zhang, H., and Fischer, P. 2003. Path sharing and predicate evaluation for high-performance xml filtering. ACM Trans. Datab. Syst. 28,4. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Florescu, D., Hilliery, C., Kossman, D., Lucas, P., Riccardi, F., et al. 2003. The bea/xqrl streaming xquery processor. In Proceedings of the 29th International Conference on Very Large Databases (VLDB'03). Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Furche, T., Gottlob, G., Grasso, G., Schallhart, C., and Sellers, A. J. 2011. Oxpath: A language for scalable, memory-efficient data extraction from web applications. Proc. VLDB Endow. 4, 11.Google ScholarGoogle Scholar
  14. Gauwin, O. and Niehren, J. 2011. Streamable fragments of forward xpath. In Proceedings of the 16th International Conference on Implementation and Application of Automata (CIAA'11). 3--15. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Gauwin, O., Niehren, J., and Tison, S. 2011. Queries on xml streams with bounded delay and concurrency. Inf. Comput. 209, 3, 409--442. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Josifovski, V., Fontoura, M., and Barta, A. 2005. Querying xml streams. VLDB J. 14,2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Kay, M. 2008. Ten reasons why saxon xquery is fast. IEEE Data Engin. Bull. 31, 4.Google ScholarGoogle Scholar
  18. Knuth, D. E., Morris, J. H. Jr., and Pratt, V. R. 1977. Fast pattern matching in strings. SIAM J. Comput. 6, 1, 323--350.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Koch, C. 2009. Xml stream processing. In Encyclopedia of Database Systems, Springer, 3634--3637Google ScholarGoogle Scholar
  20. Laptev, N. and Zaniolo, C. 2012. Optimization of massive pattern queries by dynamic configuration morphing. In Proceedings of the 28th International Conference on Data Engineering (ICDE'12). 917--928. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Luckham, D. C. 2001. The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems. Addison-Wesley. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Madhusudan, P. and Viswanathan, M. 2009. Query automata for nested words. In Proceedings of the 34th International Symposium on Mathematical Foundations of Computer Science (MFCS'09). Lecture Notes in Computer Science, vol. 5734, Springer, 561--573. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Marx, M. 2005. Conditional xpath. Trans. Datab. Syst. 30, 4. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Mozafari, B., Zeng, K., and Zaniolo, C. 2010a. From regular expressions to nested words: Unifying languages and query execution for relational and xml sequences. Proc. VLDB Endow. 3, 1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Mozafari, B., Zeng, K., and Zaniolo, C. 2010b. K*sql: A unifying engine for sequence patterns and xml. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'10). 1143--1146. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Mozafari, B., Zeng, K., and Zaniolo, C. 2012. High-performance complex event processing over xml streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'12). 253--264. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Olteanu, D., Kiesling, T., and Bry, F. 2003. An evaluation of regular path expressions with qualifiers against xml streams. In Proceedings of the 19th International Conference on Data Engineering (ICDE'03). 702--704.Google ScholarGoogle Scholar
  28. Peng, F. and Chawathe, S. S. 2003. Xpath queries on streaming data. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'03). Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Pitcher, C. 2005. Visibly pushdown expression effects for xml stream processing. In Proceedings of the Workshop on Programming Language Technologies for XML (PLAN-X'05).Google ScholarGoogle Scholar
  30. Schmidt, A., Wass, F., Kersten, M., Carey, M. J., Manoescu, I., and Busse, R. 2002. Xmark: A benchmark for xml data management. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB'02). 974--985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Snodgrass, R. T. 2009. Tsql2. In Encyclopedia of Database Systems, Springer, 3192--3197.Google ScholarGoogle Scholar
  32. Stromback, L. and Schmidt, S. 2009. An extension of xquery for graph analysis of biological pathways. In Proceedings of the 1st International Conference on Advances on Databases, Knowledge, and Data Applications (DBKDA'09). 22--27. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Tang, N. V. 2009. A tighter bound for the determinization of visibly pushdown automata. In Proceedings of the International Workshop on Verification of Infinite-State Systems (INFINITY'09). 62--76.Google ScholarGoogle Scholar
  34. ten Cate, B. 2006. The expressivity of xpath with transitive closure. In Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'06). 328-337. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. ten Cate, B. and Marx, M. 2007a. Axiomatizing the logical core of xpath 2.0. In Proceedings of the 11th International Conference on Database Theory (ICDT'07). Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. ten Cate, B. and Marx, M. 2007b. Navigational xpath: calculus and algebra. SIGMOD Rec. 36, 2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. ten Cate, B. and Segoufin, L. 2008. Xpath, transitive closure logic, and nested tree walking automata. In Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'08). 251--260. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Vagena, Z., Moro, M. M., and Tsotras, V. J. 2007. Roxsum: Leveraging data aggregation and batch processing for xml routing. In Proceedings of the 23rd IEEE International Conference on Data Engineering (ICDE'07). 1466--1470.Google ScholarGoogle Scholar
  39. Wang, F., Zaniolo, C., and Zhou, X. 2008. Archis: an xml-based approach to transaction-time temporal database systems. VLDB J. 17, 6, 1445--1463. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Wu, E., Diao, Y., and Rizvi, S. 2006. High-performance complex event processing over streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'06). 407--418. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Zaniolo, C. 2009. Event-oriented data models and temporal queries in transaction-time databases. In Proceedings of the 16th International Symposium on Temporal Representation and Reasoning (TIME'09). 47--53. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Zeng, K., Yang, M., Mozafari, B., and Zaniolo, C. 2013. Complex pattern matching in complex structures: The xseq approach. In Proceedings of the IEEE International Conference on Data Engineering (ICDE'13). 1328--1331. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. High-performance complex event processing over hierarchical data

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Database Systems
        ACM Transactions on Database Systems  Volume 38, Issue 4
        Invited papers issue
        November 2013
        294 pages
        ISSN:0362-5915
        EISSN:1557-4644
        DOI:10.1145/2539032
        Issue’s Table of Contents

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 4 December 2013
        • Accepted: 1 February 2013
        • Received: 1 October 2012
        Published in tods Volume 38, Issue 4

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader