Abstract
Frequent Episode Mining (FEM), which aims at mining frequent sub-sequences from a single long event sequence, is one of the essential building blocks for the sequence mining research field. Existing studies about FEM suffer from unsatisfied scalability when faced with complex sequences as it is an NP-complete problem for testing whether an episode occurs in a sequence. In this article, we propose a scalable, distributed framework to support FEM on “big” event sequences. As a rule of thumb, “big” illustrates an event sequence is either very long or with masses of simultaneous events. Meanwhile, the events in this article are arranged in a predefined hierarchy. It derives some abstractive events that can form episodes that may not directly appear in the input sequence. Specifically, we devise an event-centered and hierarchy-aware partitioning strategy to allocate events from different levels of the hierarchy into local processes. We then present an efficient special-purpose algorithm to improve the local mining performance. We also extend our framework to support maximal and closed episode mining in the context of event hierarchy, and to the best of our knowledge, we are the first attempt to define and discover hierarchy-aware maximal and closed episodes. We implement the proposed framework on Apache Spark and conduct experiments on both synthetic and real-world datasets. Experimental results demonstrate the efficiency and scalability of the proposed approach and show that we can find practical patterns when taking event hierarchies into account.
- Avinash Achar, A. Ibrahim, and P. S. Sastry. 2013. Pattern-growth based frequent serial episode discovery. DKE 87 (2013), 91--108. Google ScholarDigital Library
- Avinash Achar, Srivatsan Laxman, and P. S. Sastry. 2012. A unified view of the a priori-based algorithms for frequent episode discovery. KAIS 31, 2 (2012), 223--250.Google Scholar
- Avinash Achar, Srivatsan Laxman, Raajay Viswanathan, and P. S. Sastry. 2012. Discovering injective episodes with general partial orders. DMKD 25, 1 (2012), 67--108. Google ScholarDigital Library
- Xiang Ao, Yang Liu, Zhen Huang, Luo Zuo, and Qing He. 2018. Free-rider episode screening via dual partition model. In DASFAA. 665--683.Google Scholar
- Xiang Ao, Ping Luo, Chengkai Li, Fuzhen Zhuang, and Qing He. 2015. Online frequent episode mining. In ICDE. 891--902.Google Scholar
- Xiang Ao, Ping Luo, Chengkai Li, Fuzhen Zhuang, Qing He, and Zhongzhi Shi. 2018. Discovering and learning sensational episodes of news events. Information Systems 78 (2018), 68--80.Google ScholarCross Ref
- Xiang Ao, Ping Luo, Jin Wang, Fuzhen Zhuang, and Qing He. 2018. Mining precise-positioning episode rules from event sequences. IEEE TKDE 30, 3 (2018), 530--543.Google Scholar
- Mikhail Atallah, Wojciech Szpankowski, and R. Gwadera. 2004. Detection of significant sets of episodes in event sequences. In ICDM. 3--10. Google ScholarDigital Library
- Marina Barsky, Sangkyum Kim, Tim Weninger, and Jiawei Han. 2011. Mining flipping correlations from large datasets with taxonomies. VLDB 5, 4 (2011), 370--381. Google ScholarDigital Library
- Kaustubh Beedkar and Rainer Gemulla. 2015. LASH: Large-scale sequence mining with hierarchies. In SIGMOD. 491--503. Google ScholarDigital Library
- Bouchra Bouqata, Christopher D. Carothers, Boleslaw K. Szymanski, and Mohammed J. Zaki. 2006. Vogue: A novel variable order-gap state machine for modeling sequences. In PKDD. 42--54.Google Scholar
- Alexandra M. Carvalho, Arlindo L. Oliveira, Ana T. Freitas, and Marie-France Sagot. 2004. A parallel algorithm for the extraction of structured motifs. In SAC. 147--153. Google ScholarDigital Library
- Gemma Casas-Garriga. 2003. Discovering unbounded episodes in sequential data. In PKDD. 83--94.Google Scholar
- Shengnan Cong, Jiawei Han, Jay Hoeflinger, and David Padua. 2005. A sampling-based framework for parallel data mining. In PPoPP. 255--265. Google ScholarDigital Library
- Shengnan Cong, Jiawei Han, and David Padua. 2005. Parallel mining of closed sequential patterns. In KDD. 562--567. Google ScholarDigital Library
- Lina Fahed, Armelle Brun, and Anne Boyer. 2018. DEER: Distant and essential episode rules for early prediction. ESWA 93 (2018), 283--298. Google ScholarDigital Library
- Robert R. Grauer, Nils H. Hakansson, and Frederick C. Shen. 1990. Industry rotation in the US stock market: 1934--1986 returns on passive, semi-passive, and active strategies. Journal of Banking 8 Finance (1990).Google Scholar
- Jiaqi Gu, Jin Wang, and Carlo Zaniolo. 2016. Ranking support for matched patterns over complex event streams: The CEPR system. In ICDE. 1354--1357.Google Scholar
- Jiawei Han and Yongjian Fu. 1995. Discovery of multiple-level association rules from large databases. In VLDB, Vol. 95. 420--431. Google ScholarDigital Library
- Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao. 2004. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. DMKD 8, 1 (2004), 53--87. Google ScholarDigital Library
- Kuo-Yu Huang and Chia-Hui Chang. 2008. Efficient mining of frequent episodes from complex sequences. Information Systems 33, 1 (2008), 96--114. Google ScholarDigital Library
- Klaus Julisch. 2002. Data mining for intrusion detection. In Applications of Data Mining in Computer Security.Google Scholar
- Shibamouli Lahiri. 2014. Complexity of word collocation networks: A preliminary structural analysis. In EACL Workshop.Google ScholarCross Ref
- Srivatsan Laxman, P. S. Sastry, and K. P. Unnikrishnan. 2007. A fast algorithm for finding frequent episodes in event streams. In KDD. 410--419. Google ScholarDigital Library
- Zhen Liao, Daxin Jiang, Enhong Chen, Jian Pei, Huanhuan Cao, and Hang Li. 2011. Mining concept sequences from large-scale search logs for context-aware query suggestion. ACM TIST 3, 1 (2011), 17. Google ScholarDigital Library
- Yuri Lin, Jean-Baptiste Michel, Erez Lieberman Aiden, Jon Orwant, Will Brockman, and Slav Petrov. 2012. Syntactic annotations for the Google Books Ngram Corpus. In ACL. 169--174. Google ScholarDigital Library
- Yu Feng Lin, Cheng Wei Wu, Chien Feng Huang, and Vincent S. Tseng. 2015. Discovering utility-based episode rules in complex event sequences. ESWA 42, 12 (2015), 5303--5314. Google ScholarDigital Library
- Ling Luo, Xiang Ao, Feiyang Pan, Jin Wang, Tong Zhao, Ningzi Yu, and Qing He. 2018. Beyond polarity: Interpretable financial sentiment analysis with hierarchical query-driven attention. In IJCAI. 4244--4250. Google ScholarDigital Library
- Xi Ma, HweeHwa Pang, and Kian-Lee Tan. 2004. Finding constrained frequent episodes using minimal occurrences. In ICDM. 471--474. Google ScholarDigital Library
- Heikki Mannila and Hannu Toivonen. 1996. Discovering generalized episodes using minimal occurrences. In KDD, Vol. 96. 146--151. Google ScholarDigital Library
- Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo. 1997. Discovery of frequent episodes in event sequences. DMKD 1, 3 (1997), 259--289. Google ScholarDigital Library
- Iris Miliaraki, Klaus Berberich, Rainer Gemulla, and Spyros Zoupanos. 2013. Mind the gap: Large-scale frequent sequence mining. In SIGMOD. 797--808. Google ScholarDigital Library
- Ndapandula Nakashole, Martin Theobald, and Gerhard Weikum. 2011. Scalable knowledge harvesting with high precision and high recall. In WSDM. 227--236. Google ScholarDigital Library
- Anny Ng and Ada Wai-Chee Fu. 2003. Mining frequent episodes for relating financial events and stock trends. In PAKDD. 27--39. Google ScholarDigital Library
- Debprakash Patnaik, Patrick Butler, Naren Ramakrishnan, Laxmi Parida, Benjamin J. Keller, and David A. Hanauer. 2011. Experiences with mining temporal event sequences from electronic medical records: Initial successes and some challenges. In KDD. 360--368. Google ScholarDigital Library
- Majed Sahli, Essam Mansour, and Panos Kalnis. 2013. Parallel motif extraction from very long sequences. In CIKM. 549--558. Google ScholarDigital Library
- Ramakrishnan Srikant and Rakesh Agrawal. 1995. Mining generalized association rules. In VLDB. Google ScholarDigital Library
- Ramakrishnan Srikant and Rakesh Agrawal. 1996. Mining sequential patterns: Generalizations and performance improvements. In EDBT. 1--17. Google ScholarDigital Library
- Nikolaj Tatti. 2014. Discovering episodes with compact minimal windows. DMKD 28, 4 (2014), 1046--1077. Google ScholarDigital Library
- Nikolaj Tatti. 2015. Ranking episodes using a partition model. DMKD 29, 5 (2015), 1312--1342. Google ScholarDigital Library
- Nikolaj Tatti and Boris Cule. 2011. Mining closed episodes with simultaneous events. In KDD. 1172--1180. Google ScholarDigital Library
- Nikolaj Tatti and Jilles Vreeken. 2012. The long and the short of it: Summarising event sequences with serial episodes. In KDD. 462--470. Google ScholarDigital Library
- K. P. Unnikrishnan, Basel Q. Shadid, P. S. Sastry, and Srivatsan Laxman. 2009. Root cause diagnostics using temporal data mining. U.S. Patent No. 7,509,234, Issued Mar. 24th, 2009.Google Scholar
- Cheng-Wei Wu, Yu-Feng Lin, S. Yu Philip, and Vincent S. Tseng. 2013. Mining high utility episodes in complex event sequences. In KDD. 536--544. Google ScholarDigital Library
- Mohammed J. Zaki. 2001. Parallel sequence mining on shared-memory machines. JPDC 61, 3 (2001), 401--426. Google ScholarDigital Library
- Chao Zhang, Jiawei Han, Lidan Shou, Jiajun Lu, and Thomas La Porta. 2014. Splitter: Mining fine-grained sequential patterns in semantic trajectories. In VLDB. 769--780. Google ScholarDigital Library
- Yong Zhang, Jiacheng Wu, Jin Wang, and Chunxiao Xing. 2019. A transformation-based framework for KNN set similarity search. IEEE Trans. Knowl. Data Eng. (2019).Google Scholar
Index Terms
- Large-Scale Frequent Episode Mining from Complex Event Sequences with Hierarchies
Recommendations
Mining Episode Rules from Event Sequences Under Non-overlapping Frequency
Advances and Trends in Artificial Intelligence. Artificial Intelligence PracticesAbstractFrequent episode mining is a popular framework for retrieving useful information from an event sequence. Many algorithms have been proposed to mine frequent episodes and to derive episode rules from them with respect to a given frequency function ...
Mining serial positioning episode rules by natural exponent inertia weight based swallow swarm optimization algorithm with constraint based event sequences
The Frequent Episode Mining (FEM) is a challenging framework to identify frequent episodes from a sequence database. In a sequence, an ordered collection of events defines an episode, and frequent episodes are only considered by the earlier studies. Also, ...
Efficient mining of frequent episodes from complex sequences
Discovering patterns with great significance is an important problem in data mining discipline. An episode is defined to be a partially ordered set of events for consecutive and fixed-time intervals in a sequence. Most of previous studies on episodes ...
Comments