Published in: Knowledge and Information Systems 1/2016

01.01.2016 | Regular Paper

An efficient pattern mining approach for event detection in multivariate temporal data

Authors: Iyad Batal, Gregory F. Cooper, Dmitriy Fradkin, James Harrison Jr., Fabian Moerchen, Milos Hauskrecht

Abstract

This work proposes a pattern mining approach to learn event detection models from complex multivariate temporal data, such as electronic health records. We present recent temporal pattern mining, a novel approach for efficiently finding predictive patterns for event detection problems. This approach first converts the time series data into time-interval sequences of temporal abstractions. It then constructs more complex time-interval patterns backward in time using temporal operators. We also present the minimal predictive recent temporal patterns framework for selecting a small set of predictive and non-spurious patterns. We apply our methods to predict adverse medical events in real-world clinical data. The results demonstrate the benefits of our methods in learning accurate event detection models, which is a key step toward developing intelligent patient monitoring and decision support systems.
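
The first step named in the abstract, converting time series into time-interval sequences of temporal abstractions, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `StateInterval`, `to_state_intervals` and `build_sequence` are hypothetical, the rule that merges consecutive observations with the same abstract value into one interval is an assumption, and so is the catch-all label `"other"`. The glucose and BUN cut-offs are taken from the legend of Table 4 on this page, and the ordering (start time, then end time, then variable name) follows the footnotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateInterval:
    variable: str   # clinical variable, e.g. "Gluc"
    value: str      # abstract value, e.g. "H"
    start: float    # E.s
    end: float      # E.e (equal to start for a single time point)


def abstract_gluc(x):
    # Cut-offs from the Table 4 legend; "other" is an assumed catch-all.
    return "VH" if x > 243 else "H" if x > 191 else "other"


def abstract_bun(x):
    # Cut-offs from the Table 4 legend; "other" is an assumed catch-all.
    return "VH" if x > 49 else "H" if x > 34 else "other"


def to_state_intervals(variable, samples, abstract):
    """Turn time-sorted (time, value) samples into state intervals.

    Assumption: consecutive samples with the same abstract value are
    merged into one interval ending at the last such sample.
    """
    intervals = []
    for t, x in samples:
        label = abstract(x)
        if intervals and intervals[-1].value == label:
            last = intervals[-1]
            intervals[-1] = StateInterval(variable, label, last.start, t)
        else:
            intervals.append(StateInterval(variable, label, t, t))
    return intervals


def build_sequence(series, abstractions):
    """Merge the intervals of all variables into one ordered sequence."""
    merged = []
    for variable, samples in series.items():
        merged.extend(to_state_intervals(variable, samples, abstractions[variable]))
    # Order by start time, then end time, then variable name.
    return sorted(merged, key=lambda e: (e.start, e.end, e.variable))


# Toy, made-up measurements (time, value); not data from the paper.
series = {
    "Gluc": [(0, 150), (2, 200), (5, 210), (7, 260)],
    "BUN": [(2, 40), (5, 50)],
}
sequence = build_sequence(series, {"Gluc": abstract_gluc, "BUN": abstract_bun})
for e in sequence:
    print(f"{e.variable}={e.value} [{e.start}, {e.end}]")
```

In this toy input the two glucose readings at times 2 and 5 both map to `H` and become one interval, while each isolated reading becomes a zero-duration interval, that is, a time point.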

Footnotes
1
Sequential pattern mining is a special case of time-interval pattern mining, in which all intervals are simply time points with zero durations.
 
2
If \(E.s = E.e\), state interval \(E\) corresponds to a time point.
 
3
If two state intervals have the same start time, we sort them by their end time. If they also have the same end time, we sort them by lexical order of their variable names (as proposed by [21]).
 
4
This section contains materials that have been published in [6].
 
5
It is more efficient to mine patterns that cover more than \(n\) instances in one of the classes compared to mining patterns that cover more than \(n\) instances in the entire database (the former is always a subset of the latter).
 
6
The observations of the clinical variables are irregular in time because they are measured asynchronously at different time moments.
 
7
We apply statistical significance testing with k-fold cross-validation. In this setting, the testing sets are independent of each other, but the training sets are not. Even though this does not perfectly fit the iid assumption, the significance results are still of great help in comparing different learning methods [27].
 
8
As discussed in Sect. 4.2, we mine frequent patterns for the positives and negatives separately using the local minimum supports.
 
9
Most of the highest-scoring MPRTPs predict the RENAL category because it is the easiest prediction task. To diversify the patterns, we show the top three predictive MPRTPs for RENAL and the top two MPRTPs for the other categories.
Table 4. Diabetes dataset: the top MPRTPs with their precision and recall

| MPRTP | Precision | Recall |
|---|---|---|
| \(P_1\): BUN=VH \(\Rightarrow\) Dx=RENAL | 0.97 | 0.17 |
| \(P_2\): Creat=N before Creat=H \(\Rightarrow\) Dx=RENAL | 0.96 | 0.21 |
| \(P_3\): BUN=H co-occurs Creat=H \(\Rightarrow\) Dx=RENAL | 0.95 | 0.21 |
| \(P_4\): Gluc=H before Gluc=VH \(\Rightarrow\) Dx=METAB | 0.79 | 0.24 |
| \(P_5\): Dx=CARDI co-occurs (Gluc=N before Gluc=H) \(\Rightarrow\) Dx=CEREB | 0.71 | 0.22 |

Abbreviations: Dx, diagnosis code (one of the eight major categories described in Sect. 6.1.1); BUN, blood urea nitrogen; Creat, creatinine; Gluc, blood glucose.
Value abstractions: BUN=VH: \(>49\) mg/dl; BUN=H: \(>34\) mg/dl; Creat=H: \(>1.8\) mg/dl; Creat=N: [0.8–1.8] mg/dl; Gluc=VH: \(>243\) mg/dl; Gluc=H: \(>191\) mg/dl.
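
For readers unfamiliar with how precision and recall apply to a rule of the form pattern \(\Rightarrow\) label, the sketch below uses the standard definitions: precision is the share of instances containing the pattern that carry the target label, and recall is the share of target-label instances that contain the pattern. The function name `rule_precision_recall` and the toy data are hypothetical and are not taken from the paper, and this page does not confirm that the authors computed the reported values in exactly this way.

```python
def rule_precision_recall(matches, labels, target):
    """Precision and recall of the rule `pattern => target`.

    matches[i] is True when the pattern occurs in instance i;
    labels[i] is the class label of instance i.
    """
    covered = sum(1 for m in matches if m)
    positives = sum(1 for y in labels if y == target)
    true_pos = sum(1 for m, y in zip(matches, labels) if m and y == target)
    precision = true_pos / covered if covered else 0.0
    recall = true_pos / positives if positives else 0.0
    return precision, recall


# Toy, made-up example: six instances, pattern present in three of them.
matches = [True, True, False, True, False, False]
labels = ["RENAL", "RENAL", "RENAL", "METAB", "METAB", "RENAL"]
precision, recall = rule_precision_recall(matches, labels, "RENAL")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A rule such as \(P_1\) in the table, with high precision and low recall, fires rarely but is almost always right when it does.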
 
References
1. Abramowitz M, Stegun IA (1964) Handbook of mathematical functions with formulas, graphs, and mathematical tables
2. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the international conference on very large data bases (VLDB)
3. Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proceedings of the international conference on data engineering (ICDE)
4. Allen F (1984) Towards a general theory of action and time. Artif Intell 23:123–154
5. Batal I, Cooper G, Hauskrecht M (2012) A Bayesian scoring technique for mining predictive and non-spurious rules. In: Proceedings of the European conference on principles of data mining and knowledge discovery (PKDD)
6. Batal I, Fradkin D, Harrison J, Moerchen F, Hauskrecht M (2012) Mining recent temporal patterns for event detection in multivariate time series data. In: Proceedings of the international conference on knowledge discovery and data mining (SIGKDD)
7. Batal I, Hauskrecht M (2009) A supervised time series feature extraction technique using DCT and DWT. In: International conference on machine learning and applications (ICMLA)
8. Batal I, Valizadegan H, Cooper GF, Hauskrecht M (2011) A pattern mining approach for classifying multivariate temporal data. In: Proceedings of the IEEE international conference on bioinformatics and biomedicine (BIBM)
9. Batal I, Valizadegan H, Cooper GF, Hauskrecht M (2013) A temporal pattern mining approach for classifying electronic health record data. ACM Trans Intell Syst Technol 4(4). doi:10.1145/2508037.2508044
10. Blasiak S, Rangwala H (2011) A hidden Markov model variant for sequence classification. In: Proceedings of the international joint conferences on artificial intelligence (IJCAI)
11. Chandola V, Eilertson E, Ertoz L, Simon G, Kumar V (2006) Data mining for cyber security. In: Data warehousing and data mining techniques for computer security. Springer, Berlin
12. Cheng H, Yan X, Han J, wei Hsu C (2007) Discriminative frequent pattern analysis for effective classification. In: Proceedings of the international conference on data engineering (ICDE)
13. Deshpande M, Kuramochi M, Wale N, Karypis G (2005) Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans Knowl Data Eng 17:1036–1050
14. Exarchos TP, Tsipouras MG, Papaloukas C, Fotiadis DI (2008) A two-stage methodology for sequence classification based on sequential pattern mining and optimization. Data Knowl Eng 66:467–487
15. Geng L, Hamilton HJ (2006) Interestingness measures for data mining: a survey. ACM Comput Surv 38(3)
16. Guttormsson SE, Marks RJ, El-Sharkawi MA, Kerszenbaum I (1999) Elliptical novelty grouping for on-line short-turn detection of excited running rotors. IEEE Trans Energy Convers 14(1):16–22
17. Hauskrecht M, Batal I, Valko M, Visweswaram S, Cooper G, Clermont G (2012) Outlier detection for patient monitoring and alerting. J Biomed Inform 46(1):47–55
18. Hauskrecht M, Valko M, Batal I, Clermont G, Visweswaram S, Cooper G (2010) Conditional outlier detection for clinical alerting. In: Proceedings of the American Medical Informatics Association (AMIA)
19. Heckerman D, Geiger D, Chickering DM (1995) Learning Bayesian networks: the combination of knowledge and statistical data. Mach Learn 20:197–243
20. Höppner F (2001) Discovery of temporal patterns. Learning rules about the qualitative behaviour of time series. In: Proceedings of the European conference on principles of data mining and knowledge discovery (PKDD)
21. Höppner F (2003) Knowledge discovery from sequential data, PhD thesis. Technical University Braunschweig, Germany
22. Kam P-S, Fu AW-C (2000) Discovering temporal patterns for interval-based events. In: Proceedings of the international conference on data warehousing and knowledge discovery (DaWaK)
23. Kavsek B, Lavrač N (2006) APRIORI-SD: adapting association rule learning to subgroup discovery. Appl Artif Intell 20(7):543–583
24. Keogh E, Chu S, Hart D, Pazzani M (1993) Segmenting time series: a survey and novel approach. In: Data mining in time series databases. World Scientific, pp 1–22
25. Li L, Prakash BA, Faloutsos C (2010) Parsimonious linear fingerprinting for time series. PVLDB 3:385–396
26. Li W, Han J, Pei J (2001) CMAR: accurate and efficient classification based on multiple class-association rules. In: Proceedings of the international conference on data mining (ICDM)
27. Mitchell TM (1997) Machine learning. McGraw-Hill Inc., New York
28. Moerchen F (2006a) Algorithms for time series knowledge mining. In: Proceedings of the international conference on knowledge discovery and data mining (SIGKDD)
29. Moerchen F (2006b) Time series knowledge mining, PhD thesis. Philipps-University Marburg
30. Moskovitch R, Shahar Y (2009) Medical temporal-knowledge discovery via temporal abstraction. In: Proceedings of the American Medical Informatics Association (AMIA)
31. Papadimitriou S, Sun J, Faloutsos C (2005) Streaming pattern discovery in multiple time-series. In: Proceedings of the international conference on very large data bases (VLDB)
32. Papapetrou P, Kollios G, Sclaroff S, Gunopulos D (2005) Discovering frequent arrangements of temporal intervals. In: Proceedings of the international conference on data mining (ICDM)
33. Patel D, Hsu W, Lee ML (2008a) Mining relationships among interval-based events for classification. In: Proceedings of the international conference on management of data (SIGMOD)
34. Patel D, Hsu W, Lee ML (2008b) Mining relationships among interval-based events for classification. In: Proceedings of the international conference on management of data (SIGMOD)
35. Pei J, Han J, Mortazavi-asl B, Pinto H, Chen Q, Dayal U, Hsu MC (2001) PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: Proceedings of the international conference on data engineering (ICDE)
36. Pei J, Han J, Wang W (2007) Constraint-based sequential pattern mining: the pattern-growth methods. J Intell Inf Syst 28:133–160
37. Pendelton R, Wheeler M, Rodgers G (2006) Argatroban dosing of patients with heparin induced thrombocytopenia and an elevated aPTT due to antiphospholipid antibody syndrome. Ann Pharmacother 40:972–976
38. Ratanamahatana C, Keogh EJ (2005) Three myths about dynamic time warping data mining. In: Proceedings of the SIAM international conference on data mining (SDM)
39. Sacchi L, Larizza C, Combi C, Bellazzi R (2007) Data mining with temporal abstractions: learning rules from time series. Data Min Knowl Discov 15(2):217–247
40. Shahar Y (1997) A framework for knowledge-based temporal abstraction. Artif Intell 90:79–133
41. Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Proceedings of the international conference on extending database technology (EDBT)
42. Srivastava A, Kundu A, Sural S, Majumdar AK (2008) Credit card fraud detection using hidden Markov model. IEEE Trans Dependable Secure Comput 5(1):37–48
43. Vail DL, Veloso MM, Lafferty JD (2007) Conditional random fields for activity recognition. In: Proceedings of the international joint conference on autonomous agents and multiagent systems (AAMAS)
44. Warkentin T (2000) Heparin-induced thrombocytopenia: pathogenesis and management. Br J Haematol 121:535–555
45. Webb GI (2007) Discovering significant patterns. Mach Learn 68(1):1–33
46. Weng X, Shen J (2008) Classification of multivariate time series using two-dimensional singular value decomposition. Knowl Based Syst 21(7):535–539
47. Winarko E, Roddick JF (2007) ARMADA—an algorithm for discovering richer relative temporal association rules from interval-based data. Data Knowl Eng 63:76–90
48. Wu S-Y, Chen Y-L (2007) Mining nonambiguous temporal patterns for interval-based events. IEEE Trans Knowl Data Eng 19:742–758
49. Xin D, Cheng H, Yan X, Han J (2006) Extracting redundancy-aware top-k patterns. In: Proceedings of the international conference on knowledge discovery and data mining (SIGKDD)
50. Yan X, Han J, Afshar R (2003) CloSpan: mining closed sequential patterns in large datasets. In: Proceedings of the SIAM international conference on data mining (SDM)
51. Yang K, Shahabi C (2004) A PCA-based similarity measure for multivariate time series. In: Proceedings of the international workshop on multimedia databases
52. Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12:372–390
53. Zaki MJ (2001) SPADE: an efficient algorithm for mining frequent sequences. Mach Learn 42:31–60
Metadata
Title: An efficient pattern mining approach for event detection in multivariate temporal data
Authors: Iyad Batal, Gregory F. Cooper, Dmitriy Fradkin, James Harrison Jr., Fabian Moerchen, Milos Hauskrecht
Publication date: 01.01.2016
Publisher: Springer London
Published in: Knowledge and Information Systems, Issue 1/2016
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-015-0819-6
