nach oben

Knowledge and Information Systems

Erschienen in:

15.03.2019 | Regular Paper

Discovering frequent chain episodes

verfasst von: Avinash Achar, P. S. Sastry

Erschienen in: Knowledge and Information Systems | Ausgabe 1/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Frequent episode discovery is a popular framework in temporal data mining with many applications. An episode is a partially ordered set of nodes with each node associated with an event-type. The episodes literature has seen different notions of frequency and a variety of associated discovery algorithms under these different frequencies when the associated partial order is total (serial episode) or trivial (parallel episode). Recently an apriori-based discovery algorithm for mining episodes where the associated partial order has no restriction but the node to event-type association is one–one (general injective episodes) was proposed based on the non-overlapped frequency measure. This work pointed out that frequency alone is not a sufficient indicator of interestingness in the context of episodes with general partial orders and introduced a new measure of interestingness called bidirectional evidence (BE) to address this issue. This algorithm discovers episodes by incorporating both frequency and BE thresholds in the level-wise procedure. In this paper, we extend this BE-based algorithm to a much larger class of episodes that we call chain episodes. This class encompasses all serial and parallel episodes (injective or otherwise) and also many other non-injective episodes with unrestricted partial orders. We first discuss how the BE measure can be generalized to chain episodes and prove the monotonicity property it satisfies in this general context. We then describe our candidate generation step (with correctness proofs) which nicely exploits this new monotonicity property. We further describe the frequency counting (with correctness proofs) and BE computation steps for chain episodes. The experimental results demonstrate the effectiveness of our algorithms.

Vorheriger Artikel Ranking the big sky: efficient top-k skyline computation on massive data

Nächster Artikel Preference modeling by exploiting latent components of ratings

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Nur mit Berechtigung zugänglich

An episode is injective if the associated node to event-type map is one to one.

The class was originally introduced in [6] while the term “Chain Episodes” was explicitly used for the first time in [1].

The correctness proof for chain episode counting we present here is a minor extension of the correctness proofs for injective episodes first reported in [1]. An alternate correctness proof for chain episode counting is given in [25] which appeared after [1].

Given any set V, a relationR over V (which is a subset of \(V \times V\)) is said to be a strict partial order if it is irreflexive (i.e. for all \(v \in V\), \((v, v) \notin R\)), asymmetric (i.e. \((v_1, v_2) \in R\) implies that \((v_2, v_1) \notin R\), for all distinct \(v_1, v_2 \in V\)) and transitive (i.e. \(\forall v_1, v_2, v_3 \in V\), \((v_1, v_2) \in R\) and \((v_2, v_3) \in R\) implies that \((v_1, v_3) \in R\)).

We will elaborate later in Sect. 6 on how finite state automata can be used to track occurrences of episodes and a strategy for counting with expiry time constraints.

By frequent here, we mean episodes which satisfy both the frequency and BE thresholds.

An element in \(V_{\alpha }\) is minimal if there is no other element less than it as per \(<_{\alpha }\). Note that a poset can in general have multiple minimal elements.

In \(t_e\), subscript e denotes the end time of the window. In \(h_i^e\), superscript e refers to earliest transiting.

An episode is said to be frequency closed if every superepisode has a strictly lower frequency.

A generator is an episode whose every subepisode has a strictly greater frequency.

Recall from Definition 6, an injective episode \(\alpha \) can be viewed as a partially ordered set of event-types \((X^\alpha ,R^\alpha )\). \((X^\beta ,R^\beta )\) is a maximal subepisode of an injective episode \(\alpha \) if \(X^\beta \subseteq X^\alpha \) and \(R^\beta \) is the restriction of \(R^\alpha \) on to \(X^\beta \). The notion of a maximal subepisode of a general episode is discussed in the next section.

A serial extension of a chain episode \((V_\alpha , <_\alpha , g_\alpha )\) is a serial episode \(\beta = (V_\beta ,<_\beta ,g_\beta )\) where \(V_{\beta }=V_{\alpha }\) and \(g_\alpha = g_\beta \) such that \(<_\alpha \subseteq <_\beta \).

The source codes have all been written in C++. The experiments have been run on a 2.5GHz Pentium PC under a Linux operating system.

Achar A (2010) Discovering Frequent episodes with general partial orders. PhD thesis, Department of Electrical Engineering, Indian Institute of Science, Bangalore, India

Achar A, Sastry PS (2015) Statistical significance of general partial orders. Inf Sci 296:175–200MathSciNetCrossRefMATH

Achar A, Laxman S, Sastry PS (2012) A unified view of the apriori-based algorithms for frequent episode discovery. Knowl Inf Syst 31(2):223–250CrossRef

Achar A, Ibrahim A, Sastry PS (2013) Pattern-growth based frequent serial episode discovery. Data Knowl Eng 87:91–108CrossRef

Achar A, Laxman S, Raajay V, Sastry PS (2012) Discovering injective episodes with general partial orders. Data Min Knowl Discov 25(1):67–108MathSciNetCrossRefMATH

Achar A, Laxman S, Raajay V, Sastry PS (2009) Discovering general partial orders from event streams. In: Technical report arXiv: 0902.1227v2 [cs.AI]

Atallah MJ, Gwadera R, Szpankowski W (2004) Detection of significant sets of episodes in event sequences. In: Proceedings of the 4th IEEE international conference on data mining (ICDM), Brighton, UK, pp 3–10

Bouqata B, Caraothers CD, Szymanski BK, Zaki MJ (2006) Vogue: a novel variable order-gap state machine for modeling sequences. In: Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases, vol 4213. Springer, Berlin, pp 42–54

Brown EN, Kass RE, Mitra PP (2004) Multiple neuronal spike train data analysis: state of art and future challenges. Nat Neurosci 7(5):456–461CrossRef

10.

Gan M, Dai H (2012) An efficient one-pass method for discovering bases of recently frequent episodes over online data streams. Int J Innov Comput Inf Control 8(7(A)):4675–4690

11.

Gan M, Dai H (2014) Detecting and monitoring abrupt emergences and submergences of episodes over data streams. Inf Syst 39:277–289CrossRef

12.

Huang K, Chang C (2008) Efficient mining of frequent episodes from complex sequences. Inf Syst 33(1):96–114CrossRef

13.

Ibrahim A, Sastry S, Sastry PS (2016) Discovering compressing serial episodes from event sequences. Knowl Inf Syst 47(2):405–432CrossRef

14.

Iwanuma K, Takano Y, Nabeshima H (2004) On anti-monotone frequency measures for extracting sequential patterns from a single very-long sequence. In: Proceedings of the 2004 IEEE conference on cybernetics and intelligent systems, vol 1, pp 213–217

15.

Laxman S, Sastry PS, Unnikrishnan KP (2005) Discovering frequent episodes and learning hidden Markov models: a formal connection. IEEE Trans Knowl Data Eng 17:1505–1517CrossRef

16.

Laxman S, Tankasali V, White RW (2008) Stream prediction using a generative model based on frequent episodes in event sequences. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’08), pp 453–461

17.

Luo J, Bridges SM (2000) Mining fuzzy association rules and fuzzy frequent episodes for intrusion detection. Int J Intell Syst 15:687–703CrossRefMATH

18.

Mannila H, Toivonen H, Verkamo AI (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discov 1(3):259–289CrossRef

19.

Nag A, Fu AW (2003) Mining frequent episodes for relating financial events and stock trends. In: Proceedings of 7th Pacific-Asia conference on knowledge discovery and data mining (PAKDD 2003). Springer, Berlin, pp 27–39

20.

Patnaik D, Sastry PS, Unnikrishnan KP (2008) Inferring neuronal network connectivity from spike data: a temporal data mining approach. Sci Program 16:49–77

21.

Patnaik D, Laxman S, Chandramouli B, Ramakrishnan N (2012) Efficient episode mining of dynamic event streams. In: IEEE international conference on data mining, pp 605–614

22.

Sastry PS, Unnikrishnan KP (2010) Conditional probability based significance tests for sequential patterns in multi-neuronal spike trains. Neural Comput 22(4):1025–1059CrossRefMATH

23.

Tatti N (2009) Significance of episodes based on minimal windows. In: Proceedings of 2009 IEEE international conference on data mining

24.

Tatti N (2015) Ranking episodes using a partition model. Data Min Knowl Discov 29(5):1312–1342MathSciNetCrossRefMATH

25.

Tatti N, Cule B (2012) Mining closed strict episodes. Data Min Knowl Discov 25(3):34–66MathSciNetCrossRefMATH

26.

Tatti N, Cule B (2011) Mining closed episodes with simultaneous events. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1172–1180

27.

Tatti N, Cule B (2010) Mining closed strict episodes. In: Proceedings of 2010 IEEE international conference on data mining, pp 501–510

28.

Unnikrishnan KP, Shadid BQ, Sastry PS, Laxman S (2009) Root cause diagnostics using temporal datamining, U.S.Patent no. 7509234, 24 Mar

29.

Wang MF, Wu YC, Tsai MF (2008) Exploiting frequent episodes in weighted suffix tree to improve intrusion detection system. In: Proceedings of the 22nd international conference on advanced information networking and applications-workshops. IEEE Computer Society, Washington, pp 1246–1252

Titel: Discovering frequent chain episodes
verfasst von: Avinash Achar
P. S. Sastry
Publikationsdatum: 15.03.2019
Verlag: Springer London
Erschienen in: Knowledge and Information Systems / Ausgabe 1/2019
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-019-01349-y

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Weitere Artikel der Ausgabe 1/2019

Filtering Bayesian optimization approach in weakly specified search space

High-throughput and scalable protein function identification with Hadoop and Map-only pattern of the MapReduce processing model

Overview of the crowdsourcing process

Temporal popularity prediction of locations for geographical placement of retail stores

Dynamic windowing mechanism to combine sentiment and N-gram analysis in detecting events from social media

Personalized itinerary recommendation with time constraints using GPS datasets