research-article

Large-Scale Frequent Episode Mining from Complex Event Sequences with Hierarchies

Authors:
Xiang Ao

Institute of Computing Technology, Chinese Academy of Sciences, China and University of Chinese Academy of Sciences, Beijing , China

Institute of Computing Technology, Chinese Academy of Sciences, China and University of Chinese Academy of Sciences, Beijing , China

0000-0001-9633-8361
View Profile

,
Haoran Shi

University of California, Irvine, CA, USA

University of California, Irvine, CA, USA
View Profile

,
Jin Wang

University of California, Los Angeles, CA, USA

University of California, Los Angeles, CA, USA
View Profile

,
Luo Zuo

Institute of Computing Technology, Chinese Academy of Sciences, China and University of Chinese Academy of Sciences, Beijing , China

Institute of Computing Technology, Chinese Academy of Sciences, China and University of Chinese Academy of Sciences, Beijing , China
View Profile

,
Hongwei Li

Institute of Computing Technology, Chinese Academy of Sciences, China and University of Chinese Academy of Sciences, Beijing , China

Institute of Computing Technology, Chinese Academy of Sciences, China and University of Chinese Academy of Sciences, Beijing , China
View Profile

,
Qing He

Institute of Computing Technology, Chinese Academy of Sciences, China and University of Chinese Academy of Sciences, Beijing , China

Institute of Computing Technology, Chinese Academy of Sciences, China and University of Chinese Academy of Sciences, Beijing , China
View Profile

ACM Transactions on Intelligent Systems and Technology Volume 10 Issue 4Article No.: 36pp 1–26https://doi.org/10.1145/3326163

Published:20 July 2019Publication History

ACM Transactions on Intelligent Systems and Technology

Abstract

Frequent Episode Mining (FEM), which aims at mining frequent sub-sequences from a single long event sequence, is one of the essential building blocks for the sequence mining research field. Existing studies about FEM suffer from unsatisfied scalability when faced with complex sequences as it is an NP-complete problem for testing whether an episode occurs in a sequence. In this article, we propose a scalable, distributed framework to support FEM on “big” event sequences. As a rule of thumb, “big” illustrates an event sequence is either very long or with masses of simultaneous events. Meanwhile, the events in this article are arranged in a predefined hierarchy. It derives some abstractive events that can form episodes that may not directly appear in the input sequence. Specifically, we devise an event-centered and hierarchy-aware partitioning strategy to allocate events from different levels of the hierarchy into local processes. We then present an efficient special-purpose algorithm to improve the local mining performance. We also extend our framework to support maximal and closed episode mining in the context of event hierarchy, and to the best of our knowledge, we are the first attempt to define and discover hierarchy-aware maximal and closed episodes. We implement the proposed framework on Apache Spark and conduct experiments on both synthetic and real-world datasets. Experimental results demonstrate the efficiency and scalability of the proposed approach and show that we can find practical patterns when taking event hierarchies into account.

References

Avinash Achar, A. Ibrahim, and P. S. Sastry. 2013. Pattern-growth based frequent serial episode discovery. DKE 87 (2013), 91--108. Google ScholarDigital Library
Avinash Achar, Srivatsan Laxman, and P. S. Sastry. 2012. A unified view of the a priori-based algorithms for frequent episode discovery. KAIS 31, 2 (2012), 223--250.Google Scholar
Avinash Achar, Srivatsan Laxman, Raajay Viswanathan, and P. S. Sastry. 2012. Discovering injective episodes with general partial orders. DMKD 25, 1 (2012), 67--108. Google ScholarDigital Library
Xiang Ao, Yang Liu, Zhen Huang, Luo Zuo, and Qing He. 2018. Free-rider episode screening via dual partition model. In DASFAA. 665--683.Google Scholar
Xiang Ao, Ping Luo, Chengkai Li, Fuzhen Zhuang, and Qing He. 2015. Online frequent episode mining. In ICDE. 891--902.Google Scholar
Xiang Ao, Ping Luo, Chengkai Li, Fuzhen Zhuang, Qing He, and Zhongzhi Shi. 2018. Discovering and learning sensational episodes of news events. Information Systems 78 (2018), 68--80.Google ScholarCross Ref
Xiang Ao, Ping Luo, Jin Wang, Fuzhen Zhuang, and Qing He. 2018. Mining precise-positioning episode rules from event sequences. IEEE TKDE 30, 3 (2018), 530--543.Google Scholar
Mikhail Atallah, Wojciech Szpankowski, and R. Gwadera. 2004. Detection of significant sets of episodes in event sequences. In ICDM. 3--10. Google ScholarDigital Library
Marina Barsky, Sangkyum Kim, Tim Weninger, and Jiawei Han. 2011. Mining flipping correlations from large datasets with taxonomies. VLDB 5, 4 (2011), 370--381. Google ScholarDigital Library
Kaustubh Beedkar and Rainer Gemulla. 2015. LASH: Large-scale sequence mining with hierarchies. In SIGMOD. 491--503. Google ScholarDigital Library
Bouchra Bouqata, Christopher D. Carothers, Boleslaw K. Szymanski, and Mohammed J. Zaki. 2006. Vogue: A novel variable order-gap state machine for modeling sequences. In PKDD. 42--54.Google Scholar
Alexandra M. Carvalho, Arlindo L. Oliveira, Ana T. Freitas, and Marie-France Sagot. 2004. A parallel algorithm for the extraction of structured motifs. In SAC. 147--153. Google ScholarDigital Library
Gemma Casas-Garriga. 2003. Discovering unbounded episodes in sequential data. In PKDD. 83--94.Google Scholar
Shengnan Cong, Jiawei Han, Jay Hoeflinger, and David Padua. 2005. A sampling-based framework for parallel data mining. In PPoPP. 255--265. Google ScholarDigital Library
Shengnan Cong, Jiawei Han, and David Padua. 2005. Parallel mining of closed sequential patterns. In KDD. 562--567. Google ScholarDigital Library
Lina Fahed, Armelle Brun, and Anne Boyer. 2018. DEER: Distant and essential episode rules for early prediction. ESWA 93 (2018), 283--298. Google ScholarDigital Library
Robert R. Grauer, Nils H. Hakansson, and Frederick C. Shen. 1990. Industry rotation in the US stock market: 1934--1986 returns on passive, semi-passive, and active strategies. Journal of Banking 8 Finance (1990).Google Scholar
Jiaqi Gu, Jin Wang, and Carlo Zaniolo. 2016. Ranking support for matched patterns over complex event streams: The CEPR system. In ICDE. 1354--1357.Google Scholar
Jiawei Han and Yongjian Fu. 1995. Discovery of multiple-level association rules from large databases. In VLDB, Vol. 95. 420--431. Google ScholarDigital Library
Jiawei Han, Jian Pei, Yiwen Yin, and Runying Mao. 2004. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. DMKD 8, 1 (2004), 53--87. Google ScholarDigital Library
Kuo-Yu Huang and Chia-Hui Chang. 2008. Efficient mining of frequent episodes from complex sequences. Information Systems 33, 1 (2008), 96--114. Google ScholarDigital Library
Klaus Julisch. 2002. Data mining for intrusion detection. In Applications of Data Mining in Computer Security.Google Scholar
Shibamouli Lahiri. 2014. Complexity of word collocation networks: A preliminary structural analysis. In EACL Workshop.Google ScholarCross Ref
Srivatsan Laxman, P. S. Sastry, and K. P. Unnikrishnan. 2007. A fast algorithm for finding frequent episodes in event streams. In KDD. 410--419. Google ScholarDigital Library
Zhen Liao, Daxin Jiang, Enhong Chen, Jian Pei, Huanhuan Cao, and Hang Li. 2011. Mining concept sequences from large-scale search logs for context-aware query suggestion. ACM TIST 3, 1 (2011), 17. Google ScholarDigital Library
Yuri Lin, Jean-Baptiste Michel, Erez Lieberman Aiden, Jon Orwant, Will Brockman, and Slav Petrov. 2012. Syntactic annotations for the Google Books Ngram Corpus. In ACL. 169--174. Google ScholarDigital Library
Yu Feng Lin, Cheng Wei Wu, Chien Feng Huang, and Vincent S. Tseng. 2015. Discovering utility-based episode rules in complex event sequences. ESWA 42, 12 (2015), 5303--5314. Google ScholarDigital Library
Ling Luo, Xiang Ao, Feiyang Pan, Jin Wang, Tong Zhao, Ningzi Yu, and Qing He. 2018. Beyond polarity: Interpretable financial sentiment analysis with hierarchical query-driven attention. In IJCAI. 4244--4250. Google ScholarDigital Library
Xi Ma, HweeHwa Pang, and Kian-Lee Tan. 2004. Finding constrained frequent episodes using minimal occurrences. In ICDM. 471--474. Google ScholarDigital Library
Heikki Mannila and Hannu Toivonen. 1996. Discovering generalized episodes using minimal occurrences. In KDD, Vol. 96. 146--151. Google ScholarDigital Library
Heikki Mannila, Hannu Toivonen, and A. Inkeri Verkamo. 1997. Discovery of frequent episodes in event sequences. DMKD 1, 3 (1997), 259--289. Google ScholarDigital Library
Iris Miliaraki, Klaus Berberich, Rainer Gemulla, and Spyros Zoupanos. 2013. Mind the gap: Large-scale frequent sequence mining. In SIGMOD. 797--808. Google ScholarDigital Library
Ndapandula Nakashole, Martin Theobald, and Gerhard Weikum. 2011. Scalable knowledge harvesting with high precision and high recall. In WSDM. 227--236. Google ScholarDigital Library
Anny Ng and Ada Wai-Chee Fu. 2003. Mining frequent episodes for relating financial events and stock trends. In PAKDD. 27--39. Google ScholarDigital Library
Debprakash Patnaik, Patrick Butler, Naren Ramakrishnan, Laxmi Parida, Benjamin J. Keller, and David A. Hanauer. 2011. Experiences with mining temporal event sequences from electronic medical records: Initial successes and some challenges. In KDD. 360--368. Google ScholarDigital Library
Majed Sahli, Essam Mansour, and Panos Kalnis. 2013. Parallel motif extraction from very long sequences. In CIKM. 549--558. Google ScholarDigital Library
Ramakrishnan Srikant and Rakesh Agrawal. 1995. Mining generalized association rules. In VLDB. Google ScholarDigital Library
Ramakrishnan Srikant and Rakesh Agrawal. 1996. Mining sequential patterns: Generalizations and performance improvements. In EDBT. 1--17. Google ScholarDigital Library
Nikolaj Tatti. 2014. Discovering episodes with compact minimal windows. DMKD 28, 4 (2014), 1046--1077. Google ScholarDigital Library
Nikolaj Tatti. 2015. Ranking episodes using a partition model. DMKD 29, 5 (2015), 1312--1342. Google ScholarDigital Library
Nikolaj Tatti and Boris Cule. 2011. Mining closed episodes with simultaneous events. In KDD. 1172--1180. Google ScholarDigital Library
Nikolaj Tatti and Jilles Vreeken. 2012. The long and the short of it: Summarising event sequences with serial episodes. In KDD. 462--470. Google ScholarDigital Library
K. P. Unnikrishnan, Basel Q. Shadid, P. S. Sastry, and Srivatsan Laxman. 2009. Root cause diagnostics using temporal data mining. U.S. Patent No. 7,509,234, Issued Mar. 24th, 2009.Google Scholar
Cheng-Wei Wu, Yu-Feng Lin, S. Yu Philip, and Vincent S. Tseng. 2013. Mining high utility episodes in complex event sequences. In KDD. 536--544. Google ScholarDigital Library
Mohammed J. Zaki. 2001. Parallel sequence mining on shared-memory machines. JPDC 61, 3 (2001), 401--426. Google ScholarDigital Library
Chao Zhang, Jiawei Han, Lidan Shou, Jiajun Lu, and Thomas La Porta. 2014. Splitter: Mining fine-grained sequential patterns in semantic trajectories. In VLDB. 769--780. Google ScholarDigital Library
Yong Zhang, Jiacheng Wu, Jin Wang, and Chunxiao Xing. 2019. A transformation-based framework for KNN set similarity search. IEEE Trans. Knowl. Data Eng. (2019).Google Scholar

Index Terms

Large-Scale Frequent Episode Mining from Complex Event Sequences with Hierarchies
1. Information systems
  1. Information systems applications
    1. Data mining

Recommendations

Mining Episode Rules from Event Sequences Under Non-overlapping Frequency
Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices
Abstract
Frequent episode mining is a popular framework for retrieving useful information from an event sequence. Many algorithms have been proposed to mine frequent episodes and to derive episode rules from them with respect to a given frequency function ...
Read More
Mining serial positioning episode rules by natural exponent inertia weight based swallow swarm optimization algorithm with constraint based event sequences

The Frequent Episode Mining (FEM) is a challenging framework to identify frequent episodes from a sequence database. In a sequence, an ordered collection of events defines an episode, and frequent episodes are only considered by the earlier studies. Also, ...
Read More
Efficient mining of frequent episodes from complex sequences

Discovering patterns with great significance is an important problem in data mining discipline. An episode is defined to be a partially ordered set of events for consecutive and fixed-time intervals in a sequence. Most of previous studies on episodes ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM Transactions on Intelligent Systems and Technology Volume 10, Issue 4
Survey Papers and Regular Papers
July 2019
327 pages
ISSN:2157-6904
EISSN:2157-6912
DOI:10.1145/3344873
Editor:
Yu Zheng
JD Finance, China
Issue’s Table of Contents
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 July 2019
- Accepted: 1 April 2019
- Revised: 1 January 2019
- Received: 1 July 2018
Published in tist Volume 10, Issue 4

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Frequent episode mining
hierarchy-aware maximal/closed episode
large-scale sequence mining
peak episode miner
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 31
  Total Citations
  View Citations
- 336
  Total Downloads
- Downloads (Last 12 months)45
- Downloads (Last 6 weeks)4
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format .

View HTML Format

Large-Scale Frequent Episode Mining from Complex Event Sequences with Hierarchies

ACM Transactions on Intelligent Systems and Technology

Abstract

References

Cited By

Index Terms

Recommendations

Mining Episode Rules from Event Sequences Under Non-overlapping Frequency

Mining serial positioning episode rules by natural exponent inertia weight based swallow swarm optimization algorithm with constraint based event sequences

Efficient mining of frequent episodes from complex sequences