Skip to main content
Top
Published in: Knowledge and Information Systems 3/2015

01-12-2015 | Regular Paper

Mining sequential patterns for classification

Authors: Dmitriy Fradkin, Fabian Mörchen

Published in: Knowledge and Information Systems | Issue 3/2015

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

While a number of efficient sequential pattern mining algorithms were developed over the years, they can still take a long time and produce a huge number of patterns, many of which are redundant. These properties are especially frustrating when the goal of pattern mining is to find patterns for use as features in classification problems. In this paper, we describe BIDE-Discriminative, a modification of BIDE that uses class information for direct mining of predictive sequential patterns. We then perform an extensive evaluation on nine real-life datasets of the different ways in which the basic BIDE-Discriminative can be used in real multi-class classification problems, including 1-versus-rest and model-based search tree approaches. The results of our experiments show that 1-versus-rest provides an efficient solution with good classification performance.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Agrawal R, Imielinski T, Swami AN (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD international ACM Press, conference on management of data, pp 207–216 Agrawal R, Imielinski T, Swami AN (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD international ACM Press, conference on management of data, pp 207–216
2.
go back to reference Agrawal R, Srikant R (1995) Mining sequential patterns. In: ICDE. IEEE Press, pp 3–14 Agrawal R, Srikant R (1995) Mining sequential patterns. In: ICDE. IEEE Press, pp 3–14
3.
go back to reference Asuncion A, Newman D (n.d.) UCI Machine Learning Repository Asuncion A, Newman D (n.d.) UCI Machine Learning Repository
4.
go back to reference Batal I, Fradkin D, Harrison J, Moerchen F, Hauskrecht M (2012) Mining recent temporal patterns for event detection in multivariate time series data. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 280–288. doi:10.1145/2339530.2339578 Batal I, Fradkin D, Harrison J, Moerchen F, Hauskrecht M (2012) Mining recent temporal patterns for event detection in multivariate time series data. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 280–288. doi:10.​1145/​2339530.​2339578
5.
go back to reference Batal I, Valizadegan H, Cooper GF, Hauskrecht M (2011) A pattern mining approach for classifying multivariate temporal data. In: Proceedings of the 2011 IEEE international conference on bioinformatics and biomedicine, pp 358–365. doi:10.1109/BIBM.2011.39 Batal I, Valizadegan H, Cooper GF, Hauskrecht M (2011) A pattern mining approach for classifying multivariate temporal data. In: Proceedings of the 2011 IEEE international conference on bioinformatics and biomedicine, pp 358–365. doi:10.​1109/​BIBM.​2011.​39
6.
go back to reference Bringmann B, Zimmermann A (2008) One in a million: picking the right patterns. Knowl Inf Syst 18(1):61–81CrossRef Bringmann B, Zimmermann A (2008) One in a million: picking the right patterns. Knowl Inf Syst 18(1):61–81CrossRef
7.
go back to reference Bringmann B, Zimmermann A, Raedt L, Nijssen S (2006) Dont be afraid of simpler patterns. In: Frnkranz J, Scheffer T, Spiliopoulou M (eds) Knowledge discovery in databases: PKDD 2006, vol 4213 of LNCS. Springer, Berlin, pp 55–66. doi:10.1007/11871637_10 Bringmann B, Zimmermann A, Raedt L, Nijssen S (2006) Dont be afraid of simpler patterns. In: Frnkranz J, Scheffer T, Spiliopoulou M (eds) Knowledge discovery in databases: PKDD 2006, vol 4213 of LNCS. Springer, Berlin, pp 55–66. doi:10.​1007/​11871637_​10
8.
go back to reference Buza K, Schmidt-Thieme L (2010) Motif-based classification of time series with bayesian networks and svms. In: Fink A, Lausen B, Seidel W, Ultsch A (eds) Advances in data analysis, data handling and business intelligence. Studies in classification, data analysis, and knowledge organization. Springer, Berlin, pp 105–114. doi:10.1007/978-3-642-01044-6_9 Buza K, Schmidt-Thieme L (2010) Motif-based classification of time series with bayesian networks and svms. In: Fink A, Lausen B, Seidel W, Ultsch A (eds) Advances in data analysis, data handling and business intelligence. Studies in classification, data analysis, and knowledge organization. Springer, Berlin, pp 105–114. doi:10.​1007/​978-3-642-01044-6_​9
9.
go back to reference Carbonell J, Coldstein J (1998) The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of SIGIR, p 335336 Carbonell J, Coldstein J (1998) The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of SIGIR, p 335336
10.
go back to reference Cheng H, Yan X, Han J, Hsu C-W (2007) Discriminative frequent pattern analysis for effective classification. In: Proceedings of the IEEE ICDE Cheng H, Yan X, Han J, Hsu C-W (2007) Discriminative frequent pattern analysis for effective classification. In: Proceedings of the IEEE ICDE
11.
go back to reference Cheng H, Yan X, Han J, Yu PS (2008) Direct discriminative pattern mining for effective classification. In: ICDE, pp 169–178 Cheng H, Yan X, Han J, Yu PS (2008) Direct discriminative pattern mining for effective classification. In: ICDE, pp 169–178
12.
go back to reference Cover TM, Thomas JA (2006) Elements of information theory, 2nd edn. Wiley, New York Cover TM, Thomas JA (2006) Elements of information theory, 2nd edn. Wiley, New York
13.
go back to reference Dong G, Pei J (2007) Sequence data mining. Morgan Kaufmann, BurlingtonMATH Dong G, Pei J (2007) Sequence data mining. Morgan Kaufmann, BurlingtonMATH
14.
go back to reference Fan R-E, Chang K-W, Hsieh C-J, Wang X-R, Lin C-J (2008) Liblinear: a library for large linear classification. J Mach Learn Res 9:1871–1874MATH Fan R-E, Chang K-W, Hsieh C-J, Wang X-R, Lin C-J (2008) Liblinear: a library for large linear classification. J Mach Learn Res 9:1871–1874MATH
15.
go back to reference Fan W, Zhang K, Cheng H, Gao J, Yan X, Han J, Yu PS, Verscheure O (2008) Direct mining of discriminative and essential frequent patterns via model-based search tree. In: KDD, pp 230–238 Fan W, Zhang K, Cheng H, Gao J, Yan X, Han J, Yu PS, Verscheure O (2008) Direct mining of discriminative and essential frequent patterns via model-based search tree. In: KDD, pp 230–238
16.
go back to reference Fern A (2004) Learning models and formulas of a temporal event logic. PhD thesis, Purdue University, West Lafayette, IN, USA Fern A (2004) Learning models and formulas of a temporal event logic. PhD thesis, Purdue University, West Lafayette, IN, USA
17.
go back to reference Fradkin D, Moerchen F (2010) Margin-closed frequent sequential pattern mining. KDD workshop on useful patterns. ACM, New York, NY, USA, pp 45–54 Fradkin D, Moerchen F (2010) Margin-closed frequent sequential pattern mining. KDD workshop on useful patterns. ACM, New York, NY, USA, pp 45–54
18.
go back to reference Grahne G, Zhu J (2003) Efficiently using prefix-trees in mining frequent itemsets. In: ICDM workshop on frequent itemset mining implementations Grahne G, Zhu J (2003) Efficiently using prefix-trees in mining frequent itemsets. In: ICDM workshop on frequent itemset mining implementations
19.
go back to reference Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182MATH Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182MATH
20.
go back to reference Han J, Kamber M (2006) Data mining: concepts and techniques, 2nd edn. Morgan Kaufmann, Burlington Han J, Kamber M (2006) Data mining: concepts and techniques, 2nd edn. Morgan Kaufmann, Burlington
21.
go back to reference Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of the ACM SIGMOD international conference on management of data. ACM Press, pp 1–12 Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of the ACM SIGMOD international conference on management of data. ACM Press, pp 1–12
22.
go back to reference Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: SIGMOD, pp 1–12 Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: SIGMOD, pp 1–12
23.
go back to reference Ifrim G, Bakir GH, Weikum G (2008) Fast logistic regression for text categorization with variable-length n-grams. In: KDD, pp 354–362 Ifrim G, Bakir GH, Weikum G (2008) Fast logistic regression for text categorization with variable-length n-grams. In: KDD, pp 354–362
24.
go back to reference Ifrim G, Wiuf C (2011) Bounded coordinate-descent for biological sequence classification in high dimensional predictor space. In: KDD Ifrim G, Wiuf C (2011) Bounded coordinate-descent for biological sequence classification in high dimensional predictor space. In: KDD
25.
go back to reference Kadous MW (2002) Temporal classification: extending the classification paradigm to multivariate time series. PhD thesis, University of New South Wales Kadous MW (2002) Temporal classification: extending the classification paradigm to multivariate time series. PhD thesis, University of New South Wales
26.
go back to reference Kerr W, Cohen P, Chang Y-H (2008) Learning and playing in wubble world. In: Proceedings of the fourth artificial intelligence and interactive digital entertainment conference, pp 66–71 Kerr W, Cohen P, Chang Y-H (2008) Learning and playing in wubble world. In: Proceedings of the fourth artificial intelligence and interactive digital entertainment conference, pp 66–71
27.
go back to reference Knobbe AJ, Ho EKY (2006) Pattern teams. In: PKDD, pp 577–584 Knobbe AJ, Ho EKY (2006) Pattern teams. In: PKDD, pp 577–584
28.
go back to reference Lee J-G, Han J, Li X, Cheng H (2011) Mining discriminative patterns for classifying trajectories on road networks. IEEE Trans Knowl Data Eng 23(5):713–726CrossRef Lee J-G, Han J, Li X, Cheng H (2011) Mining discriminative patterns for classifying trajectories on road networks. IEEE Trans Knowl Data Eng 23(5):713–726CrossRef
29.
go back to reference Lin J, Keogh E, Lonardi S, Chiu B (2003) A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the 2003 ACM SIGMOD workshop on research issues in data mining and knowledge discovery. ACM Press, pp 2–11. URL:http://citeseer.ist.psu.edu/583097.html Lin J, Keogh E, Lonardi S, Chiu B (2003) A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the 2003 ACM SIGMOD workshop on research issues in data mining and knowledge discovery. ACM Press, pp 2–11. URL:http://​citeseer.​ist.​psu.​edu/​583097.​html
30.
go back to reference Lo D, Cheng H, Cia L (2011) Mining closed discriminative dyadic sequential patterns. In: EDBT Lo D, Cheng H, Cia L (2011) Mining closed discriminative dyadic sequential patterns. In: EDBT
31.
go back to reference Lo D, Han J, Cheng H, Khoo S-C, Sun C (2009) Classification of software behaviros for failure detection: a discriminative pattern mining approach. In: Proceedings of KDD Lo D, Han J, Cheng H, Khoo S-C, Sun C (2009) Classification of software behaviros for failure detection: a discriminative pattern mining approach. In: Proceedings of KDD
32.
go back to reference Lucchese C, Orlando S, Perego R (2006) Fast and memory efficient mining of frequent closed itemsets. IEEE Trans Knowl Data Eng 18(1):21–36CrossRef Lucchese C, Orlando S, Perego R (2006) Fast and memory efficient mining of frequent closed itemsets. IEEE Trans Knowl Data Eng 18(1):21–36CrossRef
33.
go back to reference Mäntyjärvi J, Himberg J, Kangas P, Tuomela U, Huuskonen P (2004) Sensor signal data set for exploring context recognition of mobile devices. In: Proceedings of PERVASIVE. Springer, pp 18–23 Mäntyjärvi J, Himberg J, Kangas P, Tuomela U, Huuskonen P (2004) Sensor signal data set for exploring context recognition of mobile devices. In: Proceedings of PERVASIVE. Springer, pp 18–23
34.
go back to reference Moerchen F, Thies M, Ultsch A (2011) Efficient mining of all margin-closed itemsets with applications in temporal knowledge discovery and classification by compression. Knowl Inf Syst 29:55–80. doi:10.1007/s10115-010-0329-5 Moerchen F, Thies M, Ultsch A (2011) Efficient mining of all margin-closed itemsets with applications in temporal knowledge discovery and classification by compression. Knowl Inf Syst 29:55–80. doi:10.​1007/​s10115-010-0329-5
35.
go back to reference Mörchen F, Ultsch A (2005) Optimizing time series discretization for knowledge discovery. In: Proceedings of the ACM SIGKDD. ACM Press, pp 660–665 Mörchen F, Ultsch A (2005) Optimizing time series discretization for knowledge discovery. In: Proceedings of the ACM SIGKDD. ACM Press, pp 660–665
36.
go back to reference Mörchen F, Ultsch A (2007) Efficient mining of understandable patterns from multivariate interval time series. Data Min Knowl Discov 15(2):181–215. doi:10.1007/s10618-007-0070-1 Mörchen F, Ultsch A (2007) Efficient mining of understandable patterns from multivariate interval time series. Data Min Knowl Discov 15(2):181–215. doi:10.​1007/​s10618-007-0070-1
37.
go back to reference Morishita S, Sese J (2000) Traversing itemset lattice with statistical metric pruning. In: PODS, pp 226–236 Morishita S, Sese J (2000) Traversing itemset lattice with statistical metric pruning. In: PODS, pp 226–236
38.
go back to reference Nijssen S, Kok J (2006) Multi-class correlated pattern mining. In: Bonchi F, Boulicaut J-F (eds) Knowledge discovery in inductive databases, vol 3933 of LNCS. Springer, Berlin, pp 165–187. doi:10.1007/11733492_10 Nijssen S, Kok J (2006) Multi-class correlated pattern mining. In: Bonchi F, Boulicaut J-F (eds) Knowledge discovery in inductive databases, vol 3933 of LNCS. Springer, Berlin, pp 165–187. doi:10.​1007/​11733492_​10
39.
go back to reference Ohara K, Hara M, Takabayashi K, Motoda H, Washio T (2008) Pruning strategies based on the upper bound of information gain for discriminative subgraph mining. In: PKAW’08, pp 50–60 Ohara K, Hara M, Takabayashi K, Motoda H, Washio T (2008) Pruning strategies based on the upper bound of information gain for discriminative subgraph mining. In: PKAW’08, pp 50–60
40.
go back to reference Papaterou P, Kollios G, Sclaroff S, Gunopoulos D (2005) Discovering frequent arrangements of temporal intervals. In: ICDM, pp 354–361 Papaterou P, Kollios G, Sclaroff S, Gunopoulos D (2005) Discovering frequent arrangements of temporal intervals. In: ICDM, pp 354–361
41.
go back to reference Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M-C (2001) PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: Proceedings of the IEEE ICDE. IEEE Press, pp 215–224 Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M-C (2001) PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: Proceedings of the IEEE ICDE. IEEE Press, pp 215–224
42.
go back to reference Sese J, Morishita S (2004) Itemset classified clustering, In: Boulicaut J-F, Esposito F, Giannotti F, Pedreschi D (eds) Knowledge discovery in databases: PKDD 2004, vol 3202 of LNCS. Springer, Berlin, pp 398–409. doi:10.1007/978-3-540-30116-5_37 Sese J, Morishita S (2004) Itemset classified clustering, In: Boulicaut J-F, Esposito F, Giannotti F, Pedreschi D (eds) Knowledge discovery in databases: PKDD 2004, vol 3202 of LNCS. Springer, Berlin, pp 398–409. doi:10.​1007/​978-3-540-30116-5_​37
43.
go back to reference Sipos R, Fradkin D, Moerchen F, Wang Z (2014) Log-based predictive maintenance, In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1867–1876. doi:10.1145/2623330.2623340 Sipos R, Fradkin D, Moerchen F, Wang Z (2014) Log-based predictive maintenance, In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1867–1876. doi:10.​1145/​2623330.​2623340
45.
go back to reference Starner T, Weaver J, Pentland A (1998) Real-time American sign language recognition using desk and wearable computer-based video. IEEE Trans Pattern Anal Mach Intell 20(12):1371–1375. doi:10.1109/34.735811 Starner T, Weaver J, Pentland A (1998) Real-time American sign language recognition using desk and wearable computer-based video. IEEE Trans Pattern Anal Mach Intell 20(12):1371–1375. doi:10.​1109/​34.​735811
46.
go back to reference Wang J, Han J (2004) BIDE: Efficient mining of frequent closed sequences. In: ICDE. IEEE Press, pp 79–90 Wang J, Han J (2004) BIDE: Efficient mining of frequent closed sequences. In: ICDE. IEEE Press, pp 79–90
47.
go back to reference Wang J, Han J, Li C (2007) Frequent closed sequence mining without candidate maintenance. IEEE Trans Knowl Data Eng 19(8):1042–1056MathSciNetCrossRef Wang J, Han J, Li C (2007) Frequent closed sequence mining without candidate maintenance. IEEE Trans Knowl Data Eng 19(8):1042–1056MathSciNetCrossRef
48.
go back to reference Wu S-Y, Chen Y-L (2007) Mining nonambiguous temporal patterns for interval-based events. IEEE Trans Knowl Data Eng 19(6):742–758CrossRef Wu S-Y, Chen Y-L (2007) Mining nonambiguous temporal patterns for interval-based events. IEEE Trans Knowl Data Eng 19(6):742–758CrossRef
49.
go back to reference Xu W, Huang L, Fox A, Patterson D, Jordan M (2008) Mining console logs for large-scale system problem detection. In: Proceedings of the 3rd workshop on tackling computer systems problems with machine learning techniques Xu W, Huang L, Fox A, Patterson D, Jordan M (2008) Mining console logs for large-scale system problem detection. In: Proceedings of the 3rd workshop on tackling computer systems problems with machine learning techniques
50.
go back to reference Yan X, Han J (2002) gspan: Graph-based substructure pattern mining. In: ICDM Yan X, Han J (2002) gspan: Graph-based substructure pattern mining. In: ICDM
51.
go back to reference Yang Y, Pedersen J (1997) A comparative study on feature selection in text categorization. In: ICML, pp 412–420 Yang Y, Pedersen J (1997) A comparative study on feature selection in text categorization. In: ICML, pp 412–420
52.
go back to reference Zaki M (2001) Spade: an efficient algorithm for mining frequent sequences. Mach Learn 42:31–60CrossRefMATH Zaki M (2001) Spade: an efficient algorithm for mining frequent sequences. Mach Learn 42:31–60CrossRefMATH
53.
go back to reference Zaki MJ, Hsiao C-J (2002) CHARM: an efficient algorithm for closed itemset mining. In: Proceedings of the 2nd SIAM international conference on data mining (SDM), SIAM, pp 457–473 Zaki MJ, Hsiao C-J (2002) CHARM: an efficient algorithm for closed itemset mining. In: Proceedings of the 2nd SIAM international conference on data mining (SDM), SIAM, pp 457–473
Metadata
Title
Mining sequential patterns for classification
Authors
Dmitriy Fradkin
Fabian Mörchen
Publication date
01-12-2015
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 3/2015
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-014-0817-0

Other articles of this Issue 3/2015

Knowledge and Information Systems 3/2015 Go to the issue

Premium Partner