skip to main content
research-article

Large-Scale Frequent Episode Mining from Complex Event Sequences with Hierarchies

Authors Info & Claims
Published:20 July 2019Publication History
Skip Abstract Section

Abstract

Frequent Episode Mining (FEM), which aims at mining frequent sub-sequences from a single long event sequence, is one of the essential building blocks for the sequence mining research field. Existing studies about FEM suffer from unsatisfied scalability when faced with complex sequences as it is an NP-complete problem for testing whether an episode occurs in a sequence. In this article, we propose a scalable, distributed framework to support FEM on “big” event sequences. As a rule of thumb, “big” illustrates an event sequence is either very long or with masses of simultaneous events. Meanwhile, the events in this article are arranged in a predefined hierarchy. It derives some abstractive events that can form episodes that may not directly appear in the input sequence. Specifically, we devise an event-centered and hierarchy-aware partitioning strategy to allocate events from different levels of the hierarchy into local processes. We then present an efficient special-purpose algorithm to improve the local mining performance. We also extend our framework to support maximal and closed episode mining in the context of event hierarchy, and to the best of our knowledge, we are the first attempt to define and discover hierarchy-aware maximal and closed episodes. We implement the proposed framework on Apache Spark and conduct experiments on both synthetic and real-world datasets. Experimental results demonstrate the efficiency and scalability of the proposed approach and show that we can find practical patterns when taking event hierarchies into account.

References

  1. Avinash Achar, A. Ibrahim, and P. S. Sastry. 2013. Pattern-growth based frequent serial episode discovery. DKE 87 (2013), 91--108. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Avinash Achar, Srivatsan Laxman, and P. S. Sastry. 2012. A unified view of the a priori-based algorithms for frequent episode discovery. KAIS 31, 2 (2012), 223--250.Google ScholarGoogle Scholar
  3. Avinash Achar, Srivatsan Laxman, Raajay Viswanathan, and P. S. Sastry. 2012. Discovering injective episodes with general partial orders. DMKD 25, 1 (2012), 67--108. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Xiang Ao, Yang Liu, Zhen Huang, Luo Zuo, and Qing He. 2018. Free-rider episode screening via dual partition model. In DASFAA. 665--683.Google ScholarGoogle Scholar
  5. Xiang Ao, Ping Luo, Chengkai Li, Fuzhen Zhuang, and Qing He. 2015. Online frequent episode mining. In ICDE. 891--902.Google ScholarGoogle Scholar
  6. Xiang Ao, Ping Luo, Chengkai Li, Fuzhen Zhuang, Qing He, and Zhongzhi Shi. 2018. Discovering and learning sensational episodes of news events. Information Systems 78 (2018), 68--80.Google ScholarGoogle ScholarCross RefCross Ref
  7. Xiang Ao, Ping Luo, Jin Wang, Fuzhen Zhuang, and Qing He. 2018. Mining precise-positioning episode rules from event sequences. IEEE TKDE 30, 3 (2018), 530--543.Google ScholarGoogle Scholar
  8. Mikhail Atallah, Wojciech Szpankowski, and R. Gwadera. 2004. Detection of significant sets of episodes in event sequences. In ICDM. 3--10. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Marina Barsky, Sangkyum Kim, Tim Weninger, and Jiawei Han. 2011. Mining flipping correlations from large datasets with taxonomies. VLDB 5, 4 (2011), 370--381. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Kaustubh Beedkar and Rainer Gemulla. 2015. LASH: Large-scale sequence mining with hierarchies. In SIGMOD. 491--503. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Bouchra Bouqata, Christopher D. Carothers, Boleslaw K. Szymanski, and Mohammed J. Zaki. 2006. Vogue: A novel variable order-gap state machine for modeling sequences. In PKDD. 42--54.Google ScholarGoogle Scholar
  12. Alexandra M. Carvalho, Arlindo L. Oliveira, Ana T. Freitas, and Marie-France Sagot. 2004. A parallel algorithm for the extraction of structured motifs. In SAC. 147--153. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Gemma Casas-Garriga. 2003. Discovering unbounded episodes in sequential data. In PKDD. 83--94.Google ScholarGoogle Scholar
  14. Shengnan Cong, Jiawei Han, Jay Hoeflinger, and David Padua. 2005. A sampling-based framework for parallel data mining. In PPoPP. 255--265. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Shengnan Cong, Jiawei Han, and David Padua. 2005. Parallel mining of closed sequential patterns. In KDD. 562--567. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Lina Fahed, Armelle Brun, and Anne Boyer. 2018. DEER: Distant and essential episode rules for early prediction. ESWA 93 (2018), 283--298. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Robert R. Grauer, Nils H. Hakansson, and Frederick C. Shen. 1990. Industry rotation in the US stock market: 1934--1986 returns on passive, semi-passive, and active strategies. Journal of Banking 8 Finance (1990).Google ScholarGoogle Scholar
  18. Jiaqi Gu, Jin Wang, and Carlo Zaniolo. 2016. Ranking support for matched patterns over complex event streams: The CEPR system. In ICDE. 1354--1357.Google ScholarGoogle Scholar
  19. Jiawei Han and Yongjian Fu. 1995. Discovery of multiple-level association rules from large databases. In VLDB, Vol. 95. 420--431. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao. 2004. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. DMKD 8, 1 (2004), 53--87. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Kuo-Yu Huang and Chia-Hui Chang. 2008. Efficient mining of frequent episodes from complex sequences. Information Systems 33, 1 (2008), 96--114. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Klaus Julisch. 2002. Data mining for intrusion detection. In Applications of Data Mining in Computer Security.Google ScholarGoogle Scholar
  23. Shibamouli Lahiri. 2014. Complexity of word collocation networks: A preliminary structural analysis. In EACL Workshop.Google ScholarGoogle ScholarCross RefCross Ref
  24. Srivatsan Laxman, P. S. Sastry, and K. P. Unnikrishnan. 2007. A fast algorithm for finding frequent episodes in event streams. In KDD. 410--419. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Zhen Liao, Daxin Jiang, Enhong Chen, Jian Pei, Huanhuan Cao, and Hang Li. 2011. Mining concept sequences from large-scale search logs for context-aware query suggestion. ACM TIST 3, 1 (2011), 17. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Yuri Lin, Jean-Baptiste Michel, Erez Lieberman Aiden, Jon Orwant, Will Brockman, and Slav Petrov. 2012. Syntactic annotations for the Google Books Ngram Corpus. In ACL. 169--174. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Yu Feng Lin, Cheng Wei Wu, Chien Feng Huang, and Vincent S. Tseng. 2015. Discovering utility-based episode rules in complex event sequences. ESWA 42, 12 (2015), 5303--5314. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Ling Luo, Xiang Ao, Feiyang Pan, Jin Wang, Tong Zhao, Ningzi Yu, and Qing He. 2018. Beyond polarity: Interpretable financial sentiment analysis with hierarchical query-driven attention. In IJCAI. 4244--4250. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Xi Ma, HweeHwa Pang, and Kian-Lee Tan. 2004. Finding constrained frequent episodes using minimal occurrences. In ICDM. 471--474. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Heikki Mannila and Hannu Toivonen. 1996. Discovering generalized episodes using minimal occurrences. In KDD, Vol. 96. 146--151. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo. 1997. Discovery of frequent episodes in event sequences. DMKD 1, 3 (1997), 259--289. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Iris Miliaraki, Klaus Berberich, Rainer Gemulla, and Spyros Zoupanos. 2013. Mind the gap: Large-scale frequent sequence mining. In SIGMOD. 797--808. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Ndapandula Nakashole, Martin Theobald, and Gerhard Weikum. 2011. Scalable knowledge harvesting with high precision and high recall. In WSDM. 227--236. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Anny Ng and Ada Wai-Chee Fu. 2003. Mining frequent episodes for relating financial events and stock trends. In PAKDD. 27--39. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Debprakash Patnaik, Patrick Butler, Naren Ramakrishnan, Laxmi Parida, Benjamin J. Keller, and David A. Hanauer. 2011. Experiences with mining temporal event sequences from electronic medical records: Initial successes and some challenges. In KDD. 360--368. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Majed Sahli, Essam Mansour, and Panos Kalnis. 2013. Parallel motif extraction from very long sequences. In CIKM. 549--558. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Ramakrishnan Srikant and Rakesh Agrawal. 1995. Mining generalized association rules. In VLDB. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Ramakrishnan Srikant and Rakesh Agrawal. 1996. Mining sequential patterns: Generalizations and performance improvements. In EDBT. 1--17. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Nikolaj Tatti. 2014. Discovering episodes with compact minimal windows. DMKD 28, 4 (2014), 1046--1077. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Nikolaj Tatti. 2015. Ranking episodes using a partition model. DMKD 29, 5 (2015), 1312--1342. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Nikolaj Tatti and Boris Cule. 2011. Mining closed episodes with simultaneous events. In KDD. 1172--1180. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Nikolaj Tatti and Jilles Vreeken. 2012. The long and the short of it: Summarising event sequences with serial episodes. In KDD. 462--470. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. K. P. Unnikrishnan, Basel Q. Shadid, P. S. Sastry, and Srivatsan Laxman. 2009. Root cause diagnostics using temporal data mining. U.S. Patent No. 7,509,234, Issued Mar. 24th, 2009.Google ScholarGoogle Scholar
  44. Cheng-Wei Wu, Yu-Feng Lin, S. Yu Philip, and Vincent S. Tseng. 2013. Mining high utility episodes in complex event sequences. In KDD. 536--544. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Mohammed J. Zaki. 2001. Parallel sequence mining on shared-memory machines. JPDC 61, 3 (2001), 401--426. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Chao Zhang, Jiawei Han, Lidan Shou, Jiajun Lu, and Thomas La Porta. 2014. Splitter: Mining fine-grained sequential patterns in semantic trajectories. In VLDB. 769--780. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Yong Zhang, Jiacheng Wu, Jin Wang, and Chunxiao Xing. 2019. A transformation-based framework for KNN set similarity search. IEEE Trans. Knowl. Data Eng. (2019).Google ScholarGoogle Scholar

Index Terms

  1. Large-Scale Frequent Episode Mining from Complex Event Sequences with Hierarchies

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Intelligent Systems and Technology
      ACM Transactions on Intelligent Systems and Technology  Volume 10, Issue 4
      Survey Papers and Regular Papers
      July 2019
      327 pages
      ISSN:2157-6904
      EISSN:2157-6912
      DOI:10.1145/3344873
      Issue’s Table of Contents

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 20 July 2019
      • Accepted: 1 April 2019
      • Revised: 1 January 2019
      • Received: 1 July 2018
      Published in tist Volume 10, Issue 4

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    HTML Format

    View this article in HTML Format .

    View HTML Format