Skip to main content
Erschienen in: Data Mining and Knowledge Discovery 5/2015

01.09.2015

Ranking episodes using a partition model

verfasst von: Nikolaj Tatti

Erschienen in: Data Mining and Knowledge Discovery | Ausgabe 5/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

One of the biggest setbacks in traditional frequent pattern mining is that overwhelmingly many of the discovered patterns are redundant. A prototypical example of such redundancy is a freerider pattern where the pattern contains a true pattern and some additional noise events. A technique for filtering freerider patterns that has proved to be efficient in ranking itemsets is to use a partition model where a pattern is divided into two subpatterns and the observed support is compared to the expected support under the assumption that these two subpatterns occur independently. In this paper we develop a partition model for episodes, patterns discovered from sequential data. An episode is essentially a set of events, with possible restrictions on the order of events. Unlike with itemset mining, computing the expected support of an episode requires surprisingly sophisticated methods. In order to construct the model, we partition the episode into two subepisodes. We then model how likely the events in each subepisode occur close to each other. If this probability is high—which is often the case if the subepisode has a high support—then we can expect that when one event from a subepisode occurs, then the remaining events occur also close by. This approach increases the expected support of the episode, and if this increase explains the observed support, then we can deem the episode uninteresting. We demonstrate in our experiments that using the partition model can effectively and efficiently reduce the redundancy in episodes.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
Or more rarely if \(t_i\) are small.
 
2
Consequently, \(F\) contains at least two vertices from \(W\).
 
6
The implementation is available at http://​research.​ics.​aalto.​fi/​dmg/​.
 
Literatur
Zurück zum Zitat Achar A, Laxman S, Viswanathan R, Sastry PS (2012) Discovering injective episodes with general partial orders. Data Min Knowl Discov 25(1):67–108MathSciNetCrossRefMATH Achar A, Laxman S, Viswanathan R, Sastry PS (2012) Discovering injective episodes with general partial orders. Data Min Knowl Discov 25(1):67–108MathSciNetCrossRefMATH
Zurück zum Zitat Gwadera R, Atallah MJ, Szpankowski W (2005a) Markov models for identification of significant episodes. In: Proceedings of the 5th SIAM international conference on data mining (SDM), Newport Beach, CA, pp 404–414 Gwadera R, Atallah MJ, Szpankowski W (2005a) Markov models for identification of significant episodes. In: Proceedings of the 5th SIAM international conference on data mining (SDM), Newport Beach, CA, pp 404–414
Zurück zum Zitat Gwadera R, Atallah MJ, Szpankowski W (2005b) Reliable detection of episodes in event sequences. Knowl Inf Syst 7(4):415–437CrossRef Gwadera R, Atallah MJ, Szpankowski W (2005b) Reliable detection of episodes in event sequences. Knowl Inf Syst 7(4):415–437CrossRef
Zurück zum Zitat Kullback S (1959) Information theory and statistics. Wiley, New YorkMATH Kullback S (1959) Information theory and statistics. Wiley, New YorkMATH
Zurück zum Zitat Lam HT, Mörchen F, Fradkin D, Calders T (2014) Mining compressing sequential patterns. Stat Anal Data Min 7(1):34–52MathSciNetCrossRef Lam HT, Mörchen F, Fradkin D, Calders T (2014) Mining compressing sequential patterns. Stat Anal Data Min 7(1):34–52MathSciNetCrossRef
Zurück zum Zitat Low-Kam C, Raïssi C, Kaytoue M, Pei J (2013) Mining statistically significant sequential patterns. In: Proceedings of the 13th IEEE international conference on data mining (ICDM), Dallas, TX, pp 488–497 Low-Kam C, Raïssi C, Kaytoue M, Pei J (2013) Mining statistically significant sequential patterns. In: Proceedings of the 13th IEEE international conference on data mining (ICDM), Dallas, TX, pp 488–497
Zurück zum Zitat Laxman S, Sastry PS, Unnikrishnan KP (2007) A fast algorithm for finding frequent episodes in event streams. In: Proceedings of the 13th ACM international conference on knowledge discovery and data mining (SIGKDD), San Jose, CA, pp 410–419 Laxman S, Sastry PS, Unnikrishnan KP (2007) A fast algorithm for finding frequent episodes in event streams. In: Proceedings of the 13th ACM international conference on knowledge discovery and data mining (SIGKDD), San Jose, CA, pp 410–419
Zurück zum Zitat Mannila H, Meek C (2000) Global partial orders from sequential data. In: Proceedings of the 6th ACM international conference on knowledge discovery and data mining (SIGKDD), Boston, MA, pp 161–168 Mannila H, Meek C (2000) Global partial orders from sequential data. In: Proceedings of the 6th ACM international conference on knowledge discovery and data mining (SIGKDD), Boston, MA, pp 161–168
Zurück zum Zitat Mannila H, Toivonen H, Verkamo AI (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discov 1(3):259–289CrossRef Mannila H, Toivonen H, Verkamo AI (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discov 1(3):259–289CrossRef
Zurück zum Zitat Pei J, Wang H, Liu J, Wang K, Wang J, Yu PS (2006) Discovering frequent closed partial orders from strings. IEEE Trans Knowl Data Eng 18(11):1467–1481CrossRef Pei J, Wang H, Liu J, Wang K, Wang J, Yu PS (2006) Discovering frequent closed partial orders from strings. IEEE Trans Knowl Data Eng 18(11):1467–1481CrossRef
Zurück zum Zitat Tatti N, Cule B (2011) Mining closed episodes with simultaneous events. In: Proceedings of the 17th ACM international conference on knowledge discovery and data mining (SIGKDD), San Diego, CA, pp 1172–1180 Tatti N, Cule B (2011) Mining closed episodes with simultaneous events. In: Proceedings of the 17th ACM international conference on knowledge discovery and data mining (SIGKDD), San Diego, CA, pp 1172–1180
Zurück zum Zitat Tatti N, Vreeken J (2012) The long and the short of it: summarising event sequences with serial episodes. In: Proceedings of the 18th ACM international conference on knowledge discovery and data mining (SIGKDD), Beijing, China, pp 462–470 Tatti N, Vreeken J (2012) The long and the short of it: summarising event sequences with serial episodes. In: Proceedings of the 18th ACM international conference on knowledge discovery and data mining (SIGKDD), Beijing, China, pp 462–470
Zurück zum Zitat Wang J, Han J (2004) Bide: efficient mining of frequent closed sequences. In: Proceedings of the 20th international conference on data dngineering (ICDE), Boston, MA, pp 79–90 Wang J, Han J (2004) Bide: efficient mining of frequent closed sequences. In: Proceedings of the 20th international conference on data dngineering (ICDE), Boston, MA, pp 79–90
Zurück zum Zitat Webb GI (2007) Discovering significant patterns. Mach Learn 68(1):1–33CrossRef Webb GI (2007) Discovering significant patterns. Mach Learn 68(1):1–33CrossRef
Zurück zum Zitat Webb GI (2010) Self-sufficient itemsets: an approach to screening potentially interesting associations between items. ACM Trans Knowl Discov Data 4(1):3CrossRef Webb GI (2010) Self-sufficient itemsets: an approach to screening potentially interesting associations between items. ACM Trans Knowl Discov Data 4(1):3CrossRef
Metadaten
Titel
Ranking episodes using a partition model
verfasst von
Nikolaj Tatti
Publikationsdatum
01.09.2015
Verlag
Springer US
Erschienen in
Data Mining and Knowledge Discovery / Ausgabe 5/2015
Print ISSN: 1384-5810
Elektronische ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-015-0419-9

Weitere Artikel der Ausgabe 5/2015

Data Mining and Knowledge Discovery 5/2015 Zur Ausgabe

Premium Partner