Skip to main content
Erschienen in: Data Mining and Knowledge Discovery 2/2020

21.01.2020

NegPSpan: efficient extraction of negative sequential patterns with embedding constraints

Erschienen in: Data Mining and Knowledge Discovery | Ausgabe 2/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Sequential pattern mining is concerned with the extraction of frequent or recurrent behaviors, modeled as subsequences, from a sequence dataset. Such patterns inform about which events are frequently observed in sequences, i.e. events that really happen. Sometimes, knowing that some specific event does not happen is more informative than extracting observed events. Negative sequential patterns (NSPs) capture recurrent behaviors by patterns having the form of sequences mentioning both observed events and absence of events. Few approaches have been proposed to mine such NSPs. In addition, the syntax and semantics of NSPs differ in the different methods which makes it difficult to compare them. This article provides a unified framework for the formulation of the syntax and the semantics of NSPs. Then, we introduce a new algorithm, NegPSpan, that extracts NSPs using a prefix-based depth-first scheme, enabling maxgap constraints that other approaches do not take into account. The formal framework highlights the differences between the proposed approach and methods from the literature, especially against the state of the art approach eNSP. Intensive experiments on synthetic and real datasets show that NegPSpan can extract meaningful NSPs and that it can process bigger datasets than eNSP thanks to significantly lower memory requirements and better computation times.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
Called the maximal positive subsequence in PNSP (Hsueh et al. 2008) and NegGSP (Zheng et al. 2009) or the positive element id-set in eNSP.
 
2
Actually, though not clearly stated, it seems that the negative elements of NegGSP patterns consist of items rather than itemsets. In this case, total and partial inclusion are equivalent (see Proposition 6).
 
3
Code, data generator and synthetic benchmark datasets can be tested online and downloaded here: http://​people.​irisa.​fr/​Thomas.​Guyet/​negativepatterns​/​.
 
5
 
6
ATC: Anatomical Therapeutic Chemical Classification System is a drug classification system that classifies the active ingredients of drugs.
 
7
Note that we proved that this path is actually unique.
 
Literatur
Zurück zum Zitat Bosc G, Boulicaut JF, Raïssi C, Kaytoue M (2018) Anytime discovery of a diverse set of patterns with monte carlo tree search. Data Min Knowl Discov 32(3):604–650MathSciNetCrossRef Bosc G, Boulicaut JF, Raïssi C, Kaytoue M (2018) Anytime discovery of a diverse set of patterns with monte carlo tree search. Data Min Knowl Discov 32(3):604–650MathSciNetCrossRef
Zurück zum Zitat Cao L, Yu PS, Kumar V (2015) Nonoccurring behavior analytics: a new area. Intell Syst 30(6):4–11CrossRef Cao L, Yu PS, Kumar V (2015) Nonoccurring behavior analytics: a new area. Intell Syst 30(6):4–11CrossRef
Zurück zum Zitat Cao L, Dong X, Zheng Z (2016) e-NSP: efficient negative sequential pattern mining. Artif Intell 235:156–182MathSciNetCrossRef Cao L, Dong X, Zheng Z (2016) e-NSP: efficient negative sequential pattern mining. Artif Intell 235:156–182MathSciNetCrossRef
Zurück zum Zitat Dauxais Y, Guyet T, Gross-Amblard D, Happe A (2017) Discriminant chronicles mining—application to care pathways analytics. In: Proceedings of 16th conference on artificial intelligence in medicine (AIME), pp 234–244 Dauxais Y, Guyet T, Gross-Amblard D, Happe A (2017) Discriminant chronicles mining—application to care pathways analytics. In: Proceedings of 16th conference on artificial intelligence in medicine (AIME), pp 234–244
Zurück zum Zitat Dong X, Gong Y, Cao L (2018b) F-NSP\(+\): a fast negative sequential patterns mining method with self-adaptive data storage. Pattern Recognit 84:13–27CrossRef Dong X, Gong Y, Cao L (2018b) F-NSP\(+\): a fast negative sequential patterns mining method with self-adaptive data storage. Pattern Recognit 84:13–27CrossRef
Zurück zum Zitat Giannotti F, Nanni M, Pedreschi D (2006) Efficient mining of temporally annotated sequences. In: Proceedings of the SIAM international conference on data mining, pp 348–359 Giannotti F, Nanni M, Pedreschi D (2006) Efficient mining of temporally annotated sequences. In: Proceedings of the SIAM international conference on data mining, pp 348–359
Zurück zum Zitat Gong Y, Xu T, Dong X, Lv G (2017) e-NSPFI: efficient mining negative sequential pattern from both frequent and infrequent positive sequential patterns. Int J Pattern Recognit Artif Intell 31(02):1750002CrossRef Gong Y, Xu T, Dong X, Lv G (2017) e-NSPFI: efficient mining negative sequential pattern from both frequent and infrequent positive sequential patterns. Int J Pattern Recognit Artif Intell 31(02):1750002CrossRef
Zurück zum Zitat Han J, Pei J, Mortazavi-Asl B, Chen Q, Dayal U, Hsu MC (2000) FreeSpan: frequent pattern-projected sequential pattern mining. In: Proceedings of the sixth international conference on knowledge discovery and data mining (SIGKDD), pp 355–359 Han J, Pei J, Mortazavi-Asl B, Chen Q, Dayal U, Hsu MC (2000) FreeSpan: frequent pattern-projected sequential pattern mining. In: Proceedings of the sixth international conference on knowledge discovery and data mining (SIGKDD), pp 355–359
Zurück zum Zitat Hsueh SC, Lin MY, Chen CL (2008) Mining negative sequential patterns for e-commerce recommendations. In: Proceedings of Asia-Pacific services computing conference, pp 1213–1218 Hsueh SC, Lin MY, Chen CL (2008) Mining negative sequential patterns for e-commerce recommendations. In: Proceedings of Asia-Pacific services computing conference, pp 1213–1218
Zurück zum Zitat Kamepalli S, Sekhara R, Kurra R (2014) Frequent negative sequential patterns - a survey. Int J Comput Eng Technol 5(3):115–121 Kamepalli S, Sekhara R, Kurra R (2014) Frequent negative sequential patterns - a survey. Int J Comput Eng Technol 5(3):115–121
Zurück zum Zitat Lin JCW, Fournier-Viger P, Gan W (2016) FHN: an efficient algorithm for mining high-utility itemsets with negative unit profits. Knowl Based Syst 111:283–298CrossRef Lin JCW, Fournier-Viger P, Gan W (2016) FHN: an efficient algorithm for mining high-utility itemsets with negative unit profits. Knowl Based Syst 111:283–298CrossRef
Zurück zum Zitat Liu C, Dong X, Li C, Li Y (2015) SAPNSP: select actionable positive and negative sequential patterns based on a contribution metric. In: Proceedings of the 12th international conference on fuzzy systems and knowledge discovery, pp 811–815 Liu C, Dong X, Li C, Li Y (2015) SAPNSP: select actionable positive and negative sequential patterns based on a contribution metric. In: Proceedings of the 12th international conference on fuzzy systems and knowledge discovery, pp 811–815
Zurück zum Zitat Mallick B, Garg D, Grover PS (2013) CRM customer value based on constrained sequential pattern mining. Int J Comput Appl 64(9):21–29 Mallick B, Garg D, Grover PS (2013) CRM customer value based on constrained sequential pattern mining. Int J Comput Appl 64(9):21–29
Zurück zum Zitat Mooney CH, Roddick JF (2013) Sequential pattern mining—approaches and algorithms. ACM Comput Surv 45(2):1–39CrossRef Mooney CH, Roddick JF (2013) Sequential pattern mining—approaches and algorithms. ACM Comput Surv 45(2):1–39CrossRef
Zurück zum Zitat Moulis G, Lapeyre-Mestre M, Palmaro A, Pugnet G, Montastruc JL, Sailler L (2015) French health insurance databases: what interest for medical research? La Revue de Médecine Interne 36:411–417CrossRef Moulis G, Lapeyre-Mestre M, Palmaro A, Pugnet G, Montastruc JL, Sailler L (2015) French health insurance databases: what interest for medical research? La Revue de Médecine Interne 36:411–417CrossRef
Zurück zum Zitat Negrevergne B, Guns T (2015) Constraint-based sequence mining using constraint programming. In: Michel L (ed) Proceedings of the conference on integration of AI and OR techniques in constraint programming (CPAIOR). Springer International Publishing, Cham, pp 288–305 Negrevergne B, Guns T (2015) Constraint-based sequence mining using constraint programming. In: Michel L (ed) Proceedings of the conference on integration of AI and OR techniques in constraint programming (CPAIOR). Springer International Publishing, Cham, pp 288–305
Zurück zum Zitat Ngai E, Xiu L, Chau D (2009) Application of data mining techniques in customer relationship management: a literature review and classification. Expert Syst Appl 36(2, Part 2):2592–2602CrossRef Ngai E, Xiu L, Chau D (2009) Application of data mining techniques in customer relationship management: a literature review and classification. Expert Syst Appl 36(2, Part 2):2592–2602CrossRef
Zurück zum Zitat Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu MC (2004) Mining sequential patterns by pattern-growth: the PrefixSpan approach. IEEE Trans Knowl Data Eng 16(11):1424–1440CrossRef Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu MC (2004) Mining sequential patterns by pattern-growth: the PrefixSpan approach. IEEE Trans Knowl Data Eng 16(11):1424–1440CrossRef
Zurück zum Zitat Pei J, Han J, Wang W (2007) Constraint-based sequential pattern mining: the pattern-growth methods. J Intell Inf Syst 28(2):133–160CrossRef Pei J, Han J, Wang W (2007) Constraint-based sequential pattern mining: the pattern-growth methods. J Intell Inf Syst 28(2):133–160CrossRef
Zurück zum Zitat Polard E, Nowak E, Happe A, Biraben A, Oger E (2015) Brand name to generic substitution of antiepileptic drugs does not lead to seizure-related hospitalization: a population-based case-crossover study. Pharmacoepidemiol Drug Saf 24:1161–1169CrossRef Polard E, Nowak E, Happe A, Biraben A, Oger E (2015) Brand name to generic substitution of antiepileptic drugs does not lead to seizure-related hospitalization: a population-based case-crossover study. Pharmacoepidemiol Drug Saf 24:1161–1169CrossRef
Zurück zum Zitat Qiu P, Zhao L, Dong X (2017) NegI-NSP: negative sequential pattern mining based on loose constraints. In: Proceedings of the 43rd annual conference of the IEEE industrial electronics society (IECON), pp 3419–3425 Qiu P, Zhao L, Dong X (2017) NegI-NSP: negative sequential pattern mining based on loose constraints. In: Proceedings of the 43rd annual conference of the IEEE industrial electronics society (IECON), pp 3419–3425
Zurück zum Zitat Srikant R, Agrawal R (1996) Mining sequential patterns: Generalizations and performance improvements. In: Proceedings of the international conference on extending database technology (EDBT). Springer, pp 1–17 Srikant R, Agrawal R (1996) Mining sequential patterns: Generalizations and performance improvements. In: Proceedings of the international conference on extending database technology (EDBT). Springer, pp 1–17
Zurück zum Zitat Wang W, Cao L (2019) Negative sequences analysis: a review. ACM Comput Surv 52(2):1–39 Wang W, Cao L (2019) Negative sequences analysis: a review. ACM Comput Surv 52(2):1–39
Zurück zum Zitat Xu T, Dong X, Xu J, Dong X (2017a) Mining high utility sequential patterns with negative item values. Int J Pattern Recognit Artif Intell 31(10):1750035CrossRef Xu T, Dong X, Xu J, Dong X (2017a) Mining high utility sequential patterns with negative item values. Int J Pattern Recognit Artif Intell 31(10):1750035CrossRef
Zurück zum Zitat Xu T, Dong X, Xu J, Gong Y (2017b) E-msNSP: efficient negative sequential patterns mining based on multiple minimum supports. Int J Pattern Recognit Artif Intell 31(02):1750003CrossRef Xu T, Dong X, Xu J, Gong Y (2017b) E-msNSP: efficient negative sequential patterns mining based on multiple minimum supports. Int J Pattern Recognit Artif Intell 31(02):1750003CrossRef
Zurück zum Zitat Xu T, Li T, Dong X (2018) Efficient high utility negative sequential patterns mining in smart campus. IEEE Access 6:23839–23847CrossRef Xu T, Li T, Dong X (2018) Efficient high utility negative sequential patterns mining in smart campus. IEEE Access 6:23839–23847CrossRef
Zurück zum Zitat Zaki MJ (2001) SPADE: an efficient algorithm for mining frequent sequences. Mach Learn 42(1/2):31–60CrossRef Zaki MJ (2001) SPADE: an efficient algorithm for mining frequent sequences. Mach Learn 42(1/2):31–60CrossRef
Zurück zum Zitat Zheng Z, Zhao Y, Zuo Z, Cao L (2009) Negative-GSP: an efficient method for mining negative sequential patterns. In: Proceedings of the Australasian data mining conference, pp 63–67 Zheng Z, Zhao Y, Zuo Z, Cao L (2009) Negative-GSP: an efficient method for mining negative sequential patterns. In: Proceedings of the Australasian data mining conference, pp 63–67
Zurück zum Zitat Zheng Z, Zhao Y, Zuo Z, Cao L (2010) An efficient GA-based algorithm for mining negative sequential patterns. In: Proceedings of the Pacific-Asia conference on knowledge discovery and data mining (PAKDD). Springer, pp 262–273 Zheng Z, Zhao Y, Zuo Z, Cao L (2010) An efficient GA-based algorithm for mining negative sequential patterns. In: Proceedings of the Pacific-Asia conference on knowledge discovery and data mining (PAKDD). Springer, pp 262–273
Metadaten
Titel
NegPSpan: efficient extraction of negative sequential patterns with embedding constraints
Publikationsdatum
21.01.2020
Erschienen in
Data Mining and Knowledge Discovery / Ausgabe 2/2020
Print ISSN: 1384-5810
Elektronische ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-019-00672-w

Weitere Artikel der Ausgabe 2/2020

Data Mining and Knowledge Discovery 2/2020 Zur Ausgabe

Premium Partner