Skip to main content
Erschienen in: Knowledge and Information Systems 3/2015

01.12.2015 | Regular Paper

Mining sequential patterns for classification

verfasst von: Dmitriy Fradkin, Fabian Mörchen

Erschienen in: Knowledge and Information Systems | Ausgabe 3/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

While a number of efficient sequential pattern mining algorithms were developed over the years, they can still take a long time and produce a huge number of patterns, many of which are redundant. These properties are especially frustrating when the goal of pattern mining is to find patterns for use as features in classification problems. In this paper, we describe BIDE-Discriminative, a modification of BIDE that uses class information for direct mining of predictive sequential patterns. We then perform an extensive evaluation on nine real-life datasets of the different ways in which the basic BIDE-Discriminative can be used in real multi-class classification problems, including 1-versus-rest and model-based search tree approaches. The results of our experiments show that 1-versus-rest provides an efficient solution with good classification performance.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Agrawal R, Imielinski T, Swami AN (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD international ACM Press, conference on management of data, pp 207–216 Agrawal R, Imielinski T, Swami AN (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD international ACM Press, conference on management of data, pp 207–216
2.
Zurück zum Zitat Agrawal R, Srikant R (1995) Mining sequential patterns. In: ICDE. IEEE Press, pp 3–14 Agrawal R, Srikant R (1995) Mining sequential patterns. In: ICDE. IEEE Press, pp 3–14
3.
Zurück zum Zitat Asuncion A, Newman D (n.d.) UCI Machine Learning Repository Asuncion A, Newman D (n.d.) UCI Machine Learning Repository
4.
Zurück zum Zitat Batal I, Fradkin D, Harrison J, Moerchen F, Hauskrecht M (2012) Mining recent temporal patterns for event detection in multivariate time series data. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 280–288. doi:10.1145/2339530.2339578 Batal I, Fradkin D, Harrison J, Moerchen F, Hauskrecht M (2012) Mining recent temporal patterns for event detection in multivariate time series data. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, pp 280–288. doi:10.​1145/​2339530.​2339578
5.
Zurück zum Zitat Batal I, Valizadegan H, Cooper GF, Hauskrecht M (2011) A pattern mining approach for classifying multivariate temporal data. In: Proceedings of the 2011 IEEE international conference on bioinformatics and biomedicine, pp 358–365. doi:10.1109/BIBM.2011.39 Batal I, Valizadegan H, Cooper GF, Hauskrecht M (2011) A pattern mining approach for classifying multivariate temporal data. In: Proceedings of the 2011 IEEE international conference on bioinformatics and biomedicine, pp 358–365. doi:10.​1109/​BIBM.​2011.​39
6.
Zurück zum Zitat Bringmann B, Zimmermann A (2008) One in a million: picking the right patterns. Knowl Inf Syst 18(1):61–81CrossRef Bringmann B, Zimmermann A (2008) One in a million: picking the right patterns. Knowl Inf Syst 18(1):61–81CrossRef
7.
Zurück zum Zitat Bringmann B, Zimmermann A, Raedt L, Nijssen S (2006) Dont be afraid of simpler patterns. In: Frnkranz J, Scheffer T, Spiliopoulou M (eds) Knowledge discovery in databases: PKDD 2006, vol 4213 of LNCS. Springer, Berlin, pp 55–66. doi:10.1007/11871637_10 Bringmann B, Zimmermann A, Raedt L, Nijssen S (2006) Dont be afraid of simpler patterns. In: Frnkranz J, Scheffer T, Spiliopoulou M (eds) Knowledge discovery in databases: PKDD 2006, vol 4213 of LNCS. Springer, Berlin, pp 55–66. doi:10.​1007/​11871637_​10
8.
Zurück zum Zitat Buza K, Schmidt-Thieme L (2010) Motif-based classification of time series with bayesian networks and svms. In: Fink A, Lausen B, Seidel W, Ultsch A (eds) Advances in data analysis, data handling and business intelligence. Studies in classification, data analysis, and knowledge organization. Springer, Berlin, pp 105–114. doi:10.1007/978-3-642-01044-6_9 Buza K, Schmidt-Thieme L (2010) Motif-based classification of time series with bayesian networks and svms. In: Fink A, Lausen B, Seidel W, Ultsch A (eds) Advances in data analysis, data handling and business intelligence. Studies in classification, data analysis, and knowledge organization. Springer, Berlin, pp 105–114. doi:10.​1007/​978-3-642-01044-6_​9
9.
Zurück zum Zitat Carbonell J, Coldstein J (1998) The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of SIGIR, p 335336 Carbonell J, Coldstein J (1998) The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of SIGIR, p 335336
10.
Zurück zum Zitat Cheng H, Yan X, Han J, Hsu C-W (2007) Discriminative frequent pattern analysis for effective classification. In: Proceedings of the IEEE ICDE Cheng H, Yan X, Han J, Hsu C-W (2007) Discriminative frequent pattern analysis for effective classification. In: Proceedings of the IEEE ICDE
11.
Zurück zum Zitat Cheng H, Yan X, Han J, Yu PS (2008) Direct discriminative pattern mining for effective classification. In: ICDE, pp 169–178 Cheng H, Yan X, Han J, Yu PS (2008) Direct discriminative pattern mining for effective classification. In: ICDE, pp 169–178
12.
Zurück zum Zitat Cover TM, Thomas JA (2006) Elements of information theory, 2nd edn. Wiley, New York Cover TM, Thomas JA (2006) Elements of information theory, 2nd edn. Wiley, New York
13.
Zurück zum Zitat Dong G, Pei J (2007) Sequence data mining. Morgan Kaufmann, BurlingtonMATH Dong G, Pei J (2007) Sequence data mining. Morgan Kaufmann, BurlingtonMATH
14.
Zurück zum Zitat Fan R-E, Chang K-W, Hsieh C-J, Wang X-R, Lin C-J (2008) Liblinear: a library for large linear classification. J Mach Learn Res 9:1871–1874MATH Fan R-E, Chang K-W, Hsieh C-J, Wang X-R, Lin C-J (2008) Liblinear: a library for large linear classification. J Mach Learn Res 9:1871–1874MATH
15.
Zurück zum Zitat Fan W, Zhang K, Cheng H, Gao J, Yan X, Han J, Yu PS, Verscheure O (2008) Direct mining of discriminative and essential frequent patterns via model-based search tree. In: KDD, pp 230–238 Fan W, Zhang K, Cheng H, Gao J, Yan X, Han J, Yu PS, Verscheure O (2008) Direct mining of discriminative and essential frequent patterns via model-based search tree. In: KDD, pp 230–238
16.
Zurück zum Zitat Fern A (2004) Learning models and formulas of a temporal event logic. PhD thesis, Purdue University, West Lafayette, IN, USA Fern A (2004) Learning models and formulas of a temporal event logic. PhD thesis, Purdue University, West Lafayette, IN, USA
17.
Zurück zum Zitat Fradkin D, Moerchen F (2010) Margin-closed frequent sequential pattern mining. KDD workshop on useful patterns. ACM, New York, NY, USA, pp 45–54 Fradkin D, Moerchen F (2010) Margin-closed frequent sequential pattern mining. KDD workshop on useful patterns. ACM, New York, NY, USA, pp 45–54
18.
Zurück zum Zitat Grahne G, Zhu J (2003) Efficiently using prefix-trees in mining frequent itemsets. In: ICDM workshop on frequent itemset mining implementations Grahne G, Zhu J (2003) Efficiently using prefix-trees in mining frequent itemsets. In: ICDM workshop on frequent itemset mining implementations
19.
Zurück zum Zitat Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182MATH Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182MATH
20.
Zurück zum Zitat Han J, Kamber M (2006) Data mining: concepts and techniques, 2nd edn. Morgan Kaufmann, Burlington Han J, Kamber M (2006) Data mining: concepts and techniques, 2nd edn. Morgan Kaufmann, Burlington
21.
Zurück zum Zitat Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of the ACM SIGMOD international conference on management of data. ACM Press, pp 1–12 Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of the ACM SIGMOD international conference on management of data. ACM Press, pp 1–12
22.
Zurück zum Zitat Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: SIGMOD, pp 1–12 Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: SIGMOD, pp 1–12
23.
Zurück zum Zitat Ifrim G, Bakir GH, Weikum G (2008) Fast logistic regression for text categorization with variable-length n-grams. In: KDD, pp 354–362 Ifrim G, Bakir GH, Weikum G (2008) Fast logistic regression for text categorization with variable-length n-grams. In: KDD, pp 354–362
24.
Zurück zum Zitat Ifrim G, Wiuf C (2011) Bounded coordinate-descent for biological sequence classification in high dimensional predictor space. In: KDD Ifrim G, Wiuf C (2011) Bounded coordinate-descent for biological sequence classification in high dimensional predictor space. In: KDD
25.
Zurück zum Zitat Kadous MW (2002) Temporal classification: extending the classification paradigm to multivariate time series. PhD thesis, University of New South Wales Kadous MW (2002) Temporal classification: extending the classification paradigm to multivariate time series. PhD thesis, University of New South Wales
26.
Zurück zum Zitat Kerr W, Cohen P, Chang Y-H (2008) Learning and playing in wubble world. In: Proceedings of the fourth artificial intelligence and interactive digital entertainment conference, pp 66–71 Kerr W, Cohen P, Chang Y-H (2008) Learning and playing in wubble world. In: Proceedings of the fourth artificial intelligence and interactive digital entertainment conference, pp 66–71
27.
Zurück zum Zitat Knobbe AJ, Ho EKY (2006) Pattern teams. In: PKDD, pp 577–584 Knobbe AJ, Ho EKY (2006) Pattern teams. In: PKDD, pp 577–584
28.
Zurück zum Zitat Lee J-G, Han J, Li X, Cheng H (2011) Mining discriminative patterns for classifying trajectories on road networks. IEEE Trans Knowl Data Eng 23(5):713–726CrossRef Lee J-G, Han J, Li X, Cheng H (2011) Mining discriminative patterns for classifying trajectories on road networks. IEEE Trans Knowl Data Eng 23(5):713–726CrossRef
29.
Zurück zum Zitat Lin J, Keogh E, Lonardi S, Chiu B (2003) A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the 2003 ACM SIGMOD workshop on research issues in data mining and knowledge discovery. ACM Press, pp 2–11. URL:http://citeseer.ist.psu.edu/583097.html Lin J, Keogh E, Lonardi S, Chiu B (2003) A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the 2003 ACM SIGMOD workshop on research issues in data mining and knowledge discovery. ACM Press, pp 2–11. URL:http://​citeseer.​ist.​psu.​edu/​583097.​html
30.
Zurück zum Zitat Lo D, Cheng H, Cia L (2011) Mining closed discriminative dyadic sequential patterns. In: EDBT Lo D, Cheng H, Cia L (2011) Mining closed discriminative dyadic sequential patterns. In: EDBT
31.
Zurück zum Zitat Lo D, Han J, Cheng H, Khoo S-C, Sun C (2009) Classification of software behaviros for failure detection: a discriminative pattern mining approach. In: Proceedings of KDD Lo D, Han J, Cheng H, Khoo S-C, Sun C (2009) Classification of software behaviros for failure detection: a discriminative pattern mining approach. In: Proceedings of KDD
32.
Zurück zum Zitat Lucchese C, Orlando S, Perego R (2006) Fast and memory efficient mining of frequent closed itemsets. IEEE Trans Knowl Data Eng 18(1):21–36CrossRef Lucchese C, Orlando S, Perego R (2006) Fast and memory efficient mining of frequent closed itemsets. IEEE Trans Knowl Data Eng 18(1):21–36CrossRef
33.
Zurück zum Zitat Mäntyjärvi J, Himberg J, Kangas P, Tuomela U, Huuskonen P (2004) Sensor signal data set for exploring context recognition of mobile devices. In: Proceedings of PERVASIVE. Springer, pp 18–23 Mäntyjärvi J, Himberg J, Kangas P, Tuomela U, Huuskonen P (2004) Sensor signal data set for exploring context recognition of mobile devices. In: Proceedings of PERVASIVE. Springer, pp 18–23
34.
Zurück zum Zitat Moerchen F, Thies M, Ultsch A (2011) Efficient mining of all margin-closed itemsets with applications in temporal knowledge discovery and classification by compression. Knowl Inf Syst 29:55–80. doi:10.1007/s10115-010-0329-5 Moerchen F, Thies M, Ultsch A (2011) Efficient mining of all margin-closed itemsets with applications in temporal knowledge discovery and classification by compression. Knowl Inf Syst 29:55–80. doi:10.​1007/​s10115-010-0329-5
35.
Zurück zum Zitat Mörchen F, Ultsch A (2005) Optimizing time series discretization for knowledge discovery. In: Proceedings of the ACM SIGKDD. ACM Press, pp 660–665 Mörchen F, Ultsch A (2005) Optimizing time series discretization for knowledge discovery. In: Proceedings of the ACM SIGKDD. ACM Press, pp 660–665
36.
Zurück zum Zitat Mörchen F, Ultsch A (2007) Efficient mining of understandable patterns from multivariate interval time series. Data Min Knowl Discov 15(2):181–215. doi:10.1007/s10618-007-0070-1 Mörchen F, Ultsch A (2007) Efficient mining of understandable patterns from multivariate interval time series. Data Min Knowl Discov 15(2):181–215. doi:10.​1007/​s10618-007-0070-1
37.
Zurück zum Zitat Morishita S, Sese J (2000) Traversing itemset lattice with statistical metric pruning. In: PODS, pp 226–236 Morishita S, Sese J (2000) Traversing itemset lattice with statistical metric pruning. In: PODS, pp 226–236
38.
Zurück zum Zitat Nijssen S, Kok J (2006) Multi-class correlated pattern mining. In: Bonchi F, Boulicaut J-F (eds) Knowledge discovery in inductive databases, vol 3933 of LNCS. Springer, Berlin, pp 165–187. doi:10.1007/11733492_10 Nijssen S, Kok J (2006) Multi-class correlated pattern mining. In: Bonchi F, Boulicaut J-F (eds) Knowledge discovery in inductive databases, vol 3933 of LNCS. Springer, Berlin, pp 165–187. doi:10.​1007/​11733492_​10
39.
Zurück zum Zitat Ohara K, Hara M, Takabayashi K, Motoda H, Washio T (2008) Pruning strategies based on the upper bound of information gain for discriminative subgraph mining. In: PKAW’08, pp 50–60 Ohara K, Hara M, Takabayashi K, Motoda H, Washio T (2008) Pruning strategies based on the upper bound of information gain for discriminative subgraph mining. In: PKAW’08, pp 50–60
40.
Zurück zum Zitat Papaterou P, Kollios G, Sclaroff S, Gunopoulos D (2005) Discovering frequent arrangements of temporal intervals. In: ICDM, pp 354–361 Papaterou P, Kollios G, Sclaroff S, Gunopoulos D (2005) Discovering frequent arrangements of temporal intervals. In: ICDM, pp 354–361
41.
Zurück zum Zitat Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M-C (2001) PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: Proceedings of the IEEE ICDE. IEEE Press, pp 215–224 Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M-C (2001) PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: Proceedings of the IEEE ICDE. IEEE Press, pp 215–224
42.
Zurück zum Zitat Sese J, Morishita S (2004) Itemset classified clustering, In: Boulicaut J-F, Esposito F, Giannotti F, Pedreschi D (eds) Knowledge discovery in databases: PKDD 2004, vol 3202 of LNCS. Springer, Berlin, pp 398–409. doi:10.1007/978-3-540-30116-5_37 Sese J, Morishita S (2004) Itemset classified clustering, In: Boulicaut J-F, Esposito F, Giannotti F, Pedreschi D (eds) Knowledge discovery in databases: PKDD 2004, vol 3202 of LNCS. Springer, Berlin, pp 398–409. doi:10.​1007/​978-3-540-30116-5_​37
43.
Zurück zum Zitat Sipos R, Fradkin D, Moerchen F, Wang Z (2014) Log-based predictive maintenance, In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1867–1876. doi:10.1145/2623330.2623340 Sipos R, Fradkin D, Moerchen F, Wang Z (2014) Log-based predictive maintenance, In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1867–1876. doi:10.​1145/​2623330.​2623340
45.
Zurück zum Zitat Starner T, Weaver J, Pentland A (1998) Real-time American sign language recognition using desk and wearable computer-based video. IEEE Trans Pattern Anal Mach Intell 20(12):1371–1375. doi:10.1109/34.735811 Starner T, Weaver J, Pentland A (1998) Real-time American sign language recognition using desk and wearable computer-based video. IEEE Trans Pattern Anal Mach Intell 20(12):1371–1375. doi:10.​1109/​34.​735811
46.
Zurück zum Zitat Wang J, Han J (2004) BIDE: Efficient mining of frequent closed sequences. In: ICDE. IEEE Press, pp 79–90 Wang J, Han J (2004) BIDE: Efficient mining of frequent closed sequences. In: ICDE. IEEE Press, pp 79–90
47.
Zurück zum Zitat Wang J, Han J, Li C (2007) Frequent closed sequence mining without candidate maintenance. IEEE Trans Knowl Data Eng 19(8):1042–1056MathSciNetCrossRef Wang J, Han J, Li C (2007) Frequent closed sequence mining without candidate maintenance. IEEE Trans Knowl Data Eng 19(8):1042–1056MathSciNetCrossRef
48.
Zurück zum Zitat Wu S-Y, Chen Y-L (2007) Mining nonambiguous temporal patterns for interval-based events. IEEE Trans Knowl Data Eng 19(6):742–758CrossRef Wu S-Y, Chen Y-L (2007) Mining nonambiguous temporal patterns for interval-based events. IEEE Trans Knowl Data Eng 19(6):742–758CrossRef
49.
Zurück zum Zitat Xu W, Huang L, Fox A, Patterson D, Jordan M (2008) Mining console logs for large-scale system problem detection. In: Proceedings of the 3rd workshop on tackling computer systems problems with machine learning techniques Xu W, Huang L, Fox A, Patterson D, Jordan M (2008) Mining console logs for large-scale system problem detection. In: Proceedings of the 3rd workshop on tackling computer systems problems with machine learning techniques
50.
Zurück zum Zitat Yan X, Han J (2002) gspan: Graph-based substructure pattern mining. In: ICDM Yan X, Han J (2002) gspan: Graph-based substructure pattern mining. In: ICDM
51.
Zurück zum Zitat Yang Y, Pedersen J (1997) A comparative study on feature selection in text categorization. In: ICML, pp 412–420 Yang Y, Pedersen J (1997) A comparative study on feature selection in text categorization. In: ICML, pp 412–420
52.
Zurück zum Zitat Zaki M (2001) Spade: an efficient algorithm for mining frequent sequences. Mach Learn 42:31–60CrossRefMATH Zaki M (2001) Spade: an efficient algorithm for mining frequent sequences. Mach Learn 42:31–60CrossRefMATH
53.
Zurück zum Zitat Zaki MJ, Hsiao C-J (2002) CHARM: an efficient algorithm for closed itemset mining. In: Proceedings of the 2nd SIAM international conference on data mining (SDM), SIAM, pp 457–473 Zaki MJ, Hsiao C-J (2002) CHARM: an efficient algorithm for closed itemset mining. In: Proceedings of the 2nd SIAM international conference on data mining (SDM), SIAM, pp 457–473
Metadaten
Titel
Mining sequential patterns for classification
verfasst von
Dmitriy Fradkin
Fabian Mörchen
Publikationsdatum
01.12.2015
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 3/2015
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-014-0817-0

Weitere Artikel der Ausgabe 3/2015

Knowledge and Information Systems 3/2015 Zur Ausgabe