Skip to main content
Top
Published in: Data Mining and Knowledge Discovery 2/2020

21-01-2020

NegPSpan: efficient extraction of negative sequential patterns with embedding constraints

Authors: Thomas Guyet, René Quiniou

Published in: Data Mining and Knowledge Discovery | Issue 2/2020

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Sequential pattern mining is concerned with the extraction of frequent or recurrent behaviors, modeled as subsequences, from a sequence dataset. Such patterns inform about which events are frequently observed in sequences, i.e. events that really happen. Sometimes, knowing that some specific event does not happen is more informative than extracting observed events. Negative sequential patterns (NSPs) capture recurrent behaviors by patterns having the form of sequences mentioning both observed events and absence of events. Few approaches have been proposed to mine such NSPs. In addition, the syntax and semantics of NSPs differ in the different methods which makes it difficult to compare them. This article provides a unified framework for the formulation of the syntax and the semantics of NSPs. Then, we introduce a new algorithm, NegPSpan, that extracts NSPs using a prefix-based depth-first scheme, enabling maxgap constraints that other approaches do not take into account. The formal framework highlights the differences between the proposed approach and methods from the literature, especially against the state of the art approach eNSP. Intensive experiments on synthetic and real datasets show that NegPSpan can extract meaningful NSPs and that it can process bigger datasets than eNSP thanks to significantly lower memory requirements and better computation times.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Footnotes
1
Called the maximal positive subsequence in PNSP (Hsueh et al. 2008) and NegGSP (Zheng et al. 2009) or the positive element id-set in eNSP.
 
2
Actually, though not clearly stated, it seems that the negative elements of NegGSP patterns consist of items rather than itemsets. In this case, total and partial inclusion are equivalent (see Proposition 6).
 
3
Code, data generator and synthetic benchmark datasets can be tested online and downloaded here: http://​people.​irisa.​fr/​Thomas.​Guyet/​negativepatterns​/​.
 
5
 
6
ATC: Anatomical Therapeutic Chemical Classification System is a drug classification system that classifies the active ingredients of drugs.
 
7
Note that we proved that this path is actually unique.
 
Literature
go back to reference Bosc G, Boulicaut JF, Raïssi C, Kaytoue M (2018) Anytime discovery of a diverse set of patterns with monte carlo tree search. Data Min Knowl Discov 32(3):604–650MathSciNetCrossRef Bosc G, Boulicaut JF, Raïssi C, Kaytoue M (2018) Anytime discovery of a diverse set of patterns with monte carlo tree search. Data Min Knowl Discov 32(3):604–650MathSciNetCrossRef
go back to reference Cao L, Yu PS, Kumar V (2015) Nonoccurring behavior analytics: a new area. Intell Syst 30(6):4–11CrossRef Cao L, Yu PS, Kumar V (2015) Nonoccurring behavior analytics: a new area. Intell Syst 30(6):4–11CrossRef
go back to reference Dauxais Y, Guyet T, Gross-Amblard D, Happe A (2017) Discriminant chronicles mining—application to care pathways analytics. In: Proceedings of 16th conference on artificial intelligence in medicine (AIME), pp 234–244 Dauxais Y, Guyet T, Gross-Amblard D, Happe A (2017) Discriminant chronicles mining—application to care pathways analytics. In: Proceedings of 16th conference on artificial intelligence in medicine (AIME), pp 234–244
go back to reference Dong X, Gong Y, Cao L (2018b) F-NSP\(+\): a fast negative sequential patterns mining method with self-adaptive data storage. Pattern Recognit 84:13–27CrossRef Dong X, Gong Y, Cao L (2018b) F-NSP\(+\): a fast negative sequential patterns mining method with self-adaptive data storage. Pattern Recognit 84:13–27CrossRef
go back to reference Giannotti F, Nanni M, Pedreschi D (2006) Efficient mining of temporally annotated sequences. In: Proceedings of the SIAM international conference on data mining, pp 348–359 Giannotti F, Nanni M, Pedreschi D (2006) Efficient mining of temporally annotated sequences. In: Proceedings of the SIAM international conference on data mining, pp 348–359
go back to reference Gong Y, Xu T, Dong X, Lv G (2017) e-NSPFI: efficient mining negative sequential pattern from both frequent and infrequent positive sequential patterns. Int J Pattern Recognit Artif Intell 31(02):1750002CrossRef Gong Y, Xu T, Dong X, Lv G (2017) e-NSPFI: efficient mining negative sequential pattern from both frequent and infrequent positive sequential patterns. Int J Pattern Recognit Artif Intell 31(02):1750002CrossRef
go back to reference Han J, Pei J, Mortazavi-Asl B, Chen Q, Dayal U, Hsu MC (2000) FreeSpan: frequent pattern-projected sequential pattern mining. In: Proceedings of the sixth international conference on knowledge discovery and data mining (SIGKDD), pp 355–359 Han J, Pei J, Mortazavi-Asl B, Chen Q, Dayal U, Hsu MC (2000) FreeSpan: frequent pattern-projected sequential pattern mining. In: Proceedings of the sixth international conference on knowledge discovery and data mining (SIGKDD), pp 355–359
go back to reference Hsueh SC, Lin MY, Chen CL (2008) Mining negative sequential patterns for e-commerce recommendations. In: Proceedings of Asia-Pacific services computing conference, pp 1213–1218 Hsueh SC, Lin MY, Chen CL (2008) Mining negative sequential patterns for e-commerce recommendations. In: Proceedings of Asia-Pacific services computing conference, pp 1213–1218
go back to reference Kamepalli S, Sekhara R, Kurra R (2014) Frequent negative sequential patterns - a survey. Int J Comput Eng Technol 5(3):115–121 Kamepalli S, Sekhara R, Kurra R (2014) Frequent negative sequential patterns - a survey. Int J Comput Eng Technol 5(3):115–121
go back to reference Lin JCW, Fournier-Viger P, Gan W (2016) FHN: an efficient algorithm for mining high-utility itemsets with negative unit profits. Knowl Based Syst 111:283–298CrossRef Lin JCW, Fournier-Viger P, Gan W (2016) FHN: an efficient algorithm for mining high-utility itemsets with negative unit profits. Knowl Based Syst 111:283–298CrossRef
go back to reference Liu C, Dong X, Li C, Li Y (2015) SAPNSP: select actionable positive and negative sequential patterns based on a contribution metric. In: Proceedings of the 12th international conference on fuzzy systems and knowledge discovery, pp 811–815 Liu C, Dong X, Li C, Li Y (2015) SAPNSP: select actionable positive and negative sequential patterns based on a contribution metric. In: Proceedings of the 12th international conference on fuzzy systems and knowledge discovery, pp 811–815
go back to reference Mallick B, Garg D, Grover PS (2013) CRM customer value based on constrained sequential pattern mining. Int J Comput Appl 64(9):21–29 Mallick B, Garg D, Grover PS (2013) CRM customer value based on constrained sequential pattern mining. Int J Comput Appl 64(9):21–29
go back to reference Mooney CH, Roddick JF (2013) Sequential pattern mining—approaches and algorithms. ACM Comput Surv 45(2):1–39CrossRef Mooney CH, Roddick JF (2013) Sequential pattern mining—approaches and algorithms. ACM Comput Surv 45(2):1–39CrossRef
go back to reference Moulis G, Lapeyre-Mestre M, Palmaro A, Pugnet G, Montastruc JL, Sailler L (2015) French health insurance databases: what interest for medical research? La Revue de Médecine Interne 36:411–417CrossRef Moulis G, Lapeyre-Mestre M, Palmaro A, Pugnet G, Montastruc JL, Sailler L (2015) French health insurance databases: what interest for medical research? La Revue de Médecine Interne 36:411–417CrossRef
go back to reference Negrevergne B, Guns T (2015) Constraint-based sequence mining using constraint programming. In: Michel L (ed) Proceedings of the conference on integration of AI and OR techniques in constraint programming (CPAIOR). Springer International Publishing, Cham, pp 288–305 Negrevergne B, Guns T (2015) Constraint-based sequence mining using constraint programming. In: Michel L (ed) Proceedings of the conference on integration of AI and OR techniques in constraint programming (CPAIOR). Springer International Publishing, Cham, pp 288–305
go back to reference Ngai E, Xiu L, Chau D (2009) Application of data mining techniques in customer relationship management: a literature review and classification. Expert Syst Appl 36(2, Part 2):2592–2602CrossRef Ngai E, Xiu L, Chau D (2009) Application of data mining techniques in customer relationship management: a literature review and classification. Expert Syst Appl 36(2, Part 2):2592–2602CrossRef
go back to reference Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu MC (2004) Mining sequential patterns by pattern-growth: the PrefixSpan approach. IEEE Trans Knowl Data Eng 16(11):1424–1440CrossRef Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu MC (2004) Mining sequential patterns by pattern-growth: the PrefixSpan approach. IEEE Trans Knowl Data Eng 16(11):1424–1440CrossRef
go back to reference Pei J, Han J, Wang W (2007) Constraint-based sequential pattern mining: the pattern-growth methods. J Intell Inf Syst 28(2):133–160CrossRef Pei J, Han J, Wang W (2007) Constraint-based sequential pattern mining: the pattern-growth methods. J Intell Inf Syst 28(2):133–160CrossRef
go back to reference Polard E, Nowak E, Happe A, Biraben A, Oger E (2015) Brand name to generic substitution of antiepileptic drugs does not lead to seizure-related hospitalization: a population-based case-crossover study. Pharmacoepidemiol Drug Saf 24:1161–1169CrossRef Polard E, Nowak E, Happe A, Biraben A, Oger E (2015) Brand name to generic substitution of antiepileptic drugs does not lead to seizure-related hospitalization: a population-based case-crossover study. Pharmacoepidemiol Drug Saf 24:1161–1169CrossRef
go back to reference Qiu P, Zhao L, Dong X (2017) NegI-NSP: negative sequential pattern mining based on loose constraints. In: Proceedings of the 43rd annual conference of the IEEE industrial electronics society (IECON), pp 3419–3425 Qiu P, Zhao L, Dong X (2017) NegI-NSP: negative sequential pattern mining based on loose constraints. In: Proceedings of the 43rd annual conference of the IEEE industrial electronics society (IECON), pp 3419–3425
go back to reference Srikant R, Agrawal R (1996) Mining sequential patterns: Generalizations and performance improvements. In: Proceedings of the international conference on extending database technology (EDBT). Springer, pp 1–17 Srikant R, Agrawal R (1996) Mining sequential patterns: Generalizations and performance improvements. In: Proceedings of the international conference on extending database technology (EDBT). Springer, pp 1–17
go back to reference Wang W, Cao L (2019) Negative sequences analysis: a review. ACM Comput Surv 52(2):1–39 Wang W, Cao L (2019) Negative sequences analysis: a review. ACM Comput Surv 52(2):1–39
go back to reference Xu T, Dong X, Xu J, Dong X (2017a) Mining high utility sequential patterns with negative item values. Int J Pattern Recognit Artif Intell 31(10):1750035CrossRef Xu T, Dong X, Xu J, Dong X (2017a) Mining high utility sequential patterns with negative item values. Int J Pattern Recognit Artif Intell 31(10):1750035CrossRef
go back to reference Xu T, Dong X, Xu J, Gong Y (2017b) E-msNSP: efficient negative sequential patterns mining based on multiple minimum supports. Int J Pattern Recognit Artif Intell 31(02):1750003CrossRef Xu T, Dong X, Xu J, Gong Y (2017b) E-msNSP: efficient negative sequential patterns mining based on multiple minimum supports. Int J Pattern Recognit Artif Intell 31(02):1750003CrossRef
go back to reference Xu T, Li T, Dong X (2018) Efficient high utility negative sequential patterns mining in smart campus. IEEE Access 6:23839–23847CrossRef Xu T, Li T, Dong X (2018) Efficient high utility negative sequential patterns mining in smart campus. IEEE Access 6:23839–23847CrossRef
go back to reference Zaki MJ (2001) SPADE: an efficient algorithm for mining frequent sequences. Mach Learn 42(1/2):31–60CrossRef Zaki MJ (2001) SPADE: an efficient algorithm for mining frequent sequences. Mach Learn 42(1/2):31–60CrossRef
go back to reference Zheng Z, Zhao Y, Zuo Z, Cao L (2009) Negative-GSP: an efficient method for mining negative sequential patterns. In: Proceedings of the Australasian data mining conference, pp 63–67 Zheng Z, Zhao Y, Zuo Z, Cao L (2009) Negative-GSP: an efficient method for mining negative sequential patterns. In: Proceedings of the Australasian data mining conference, pp 63–67
go back to reference Zheng Z, Zhao Y, Zuo Z, Cao L (2010) An efficient GA-based algorithm for mining negative sequential patterns. In: Proceedings of the Pacific-Asia conference on knowledge discovery and data mining (PAKDD). Springer, pp 262–273 Zheng Z, Zhao Y, Zuo Z, Cao L (2010) An efficient GA-based algorithm for mining negative sequential patterns. In: Proceedings of the Pacific-Asia conference on knowledge discovery and data mining (PAKDD). Springer, pp 262–273
Metadata
Title
NegPSpan: efficient extraction of negative sequential patterns with embedding constraints
Authors
Thomas Guyet
René Quiniou
Publication date
21-01-2020
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery / Issue 2/2020
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-019-00672-w

Other articles of this Issue 2/2020

Data Mining and Knowledge Discovery 2/2020 Go to the issue

Premium Partner