Skip to main content
Erschienen in:
Buchtitelbild

2017 | OriginalPaper | Buchkapitel

BeatLex: Summarizing and Forecasting Time Series with Patterns

verfasst von : Bryan Hooi, Shenghua Liu, Asim Smailagic, Christos Faloutsos

Erschienen in: Machine Learning and Knowledge Discovery in Databases

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Given time-series data such as electrocardiogram (ECG) readings, or motion capture data, how can we succintly summarize the data in a way that robustly identifies patterns that appear repeatedly? How can we then use such a summary to identify anomalies such as abnormal heartbeats, and also forecast future values of the time series? Our main idea is a vocabulary-based approach, which automatically learns a set of common patterns, or ‘beat patterns,’ which are used as building blocks to describe the time series in an intuitive and interpretable way. Our summarization algorithm, BeatLex (Beat Lexicons for Summarization) is: (1) fast and online, requiring linear time in the data size and bounded memory; (2) effective, outperforming competing algorithms in labelling accuracy by \(5.3 \) times, and forecasting accuracy by \(1.8 \) times; (3) principled and parameter-free, as it is based on the Minimum Description Length principle of summarizing the data by compressing it using as few bits as possible, and automatically tunes all its parameters; (4) general: it applies to any domain of time series data, and can make use of multidimensional (i.e. coevolving) time series.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
Here, \(\log ^*\) is the universal code length for positive integers [19].
 
2
We use \(C_F=8\), following [15].
 
Literatur
1.
Zurück zum Zitat Basu, S., Meckesheimer, M.: Automatic outlier detection for time series: an application to sensor data. Knowl. Inf. Syst. 11(2), 137–154 (2007)CrossRef Basu, S., Meckesheimer, M.: Automatic outlier detection for time series: an application to sensor data. Knowl. Inf. Syst. 11(2), 137–154 (2007)CrossRef
2.
Zurück zum Zitat Baum, L.E., Petrie, T.: Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Stat. 37(6), 1554–1563 (1966)MathSciNetCrossRefMATH Baum, L.E., Petrie, T.: Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Stat. 37(6), 1554–1563 (1966)MathSciNetCrossRefMATH
3.
Zurück zum Zitat Box, G.E., Jenkins, G.M.: Time Series Analysis: Forecasting and Control, revised edn. Holden-Day Inc., San Francisco (1976)MATH Box, G.E., Jenkins, G.M.: Time Series Analysis: Forecasting and Control, revised edn. Holden-Day Inc., San Francisco (1976)MATH
4.
Zurück zum Zitat Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, Hoboken (2012)MATH Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, Hoboken (2012)MATH
5.
Zurück zum Zitat De Livera, A.M., Hyndman, R.J., Snyder, R.D.: Forecasting time series with complex seasonal patterns using exponential smoothing. J. Am. Stat. Assoc. 106(496), 1513–1527 (2011)MathSciNetCrossRefMATH De Livera, A.M., Hyndman, R.J., Snyder, R.D.: Forecasting time series with complex seasonal patterns using exponential smoothing. J. Am. Stat. Assoc. 106(496), 1513–1527 (2011)MathSciNetCrossRefMATH
6.
Zurück zum Zitat Keogh, E.: Exact indexing of dynamic time warping. In: Proceedings of the 28th International Conference on Very Large Data Bases, pp. 406–417. VLDB Endowment (2002) Keogh, E.: Exact indexing of dynamic time warping. In: Proceedings of the 28th International Conference on Very Large Data Bases, pp. 406–417. VLDB Endowment (2002)
7.
Zurück zum Zitat Keogh, E., Chu, S., Hart, D., Pazzani, M.: An online algorithm for segmenting time series. In: Proceedings IEEE International Conference on Data Mining, ICDM 2001, pp. 289–296. IEEE (2001) Keogh, E., Chu, S., Hart, D., Pazzani, M.: An online algorithm for segmenting time series. In: Proceedings IEEE International Conference on Data Mining, ICDM 2001, pp. 289–296. IEEE (2001)
8.
Zurück zum Zitat Keogh, E., Lin, J., Fu, A.: Hot sax: efficiently finding the most unusual time series subsequence. In: Fifth IEEE International Conference on Data Mining, pp. 8–pp. IEEE (2005) Keogh, E., Lin, J., Fu, A.: Hot sax: efficiently finding the most unusual time series subsequence. In: Fifth IEEE International Conference on Data Mining, pp. 8–pp. IEEE (2005)
9.
Zurück zum Zitat Keogh, E., Lin, J., Truppel, W.: Clustering of time series subsequences is meaningless: implications for previous and future research. In: Third IEEE International Conference on Data Mining, ICDM 2003, pp. 115–122. IEEE (2003) Keogh, E., Lin, J., Truppel, W.: Clustering of time series subsequences is meaningless: implications for previous and future research. In: Third IEEE International Conference on Data Mining, ICDM 2003, pp. 115–122. IEEE (2003)
10.
Zurück zum Zitat Letchner, J., Re, C., Balazinska, M., Philipose, M.: Access methods for markovian streams. In: IEEE 25th International Conference on Data Engineering, ICDE 2009, pp. 246–257. IEEE (2009) Letchner, J., Re, C., Balazinska, M., Philipose, M.: Access methods for markovian streams. In: IEEE 25th International Conference on Data Engineering, ICDE 2009, pp. 246–257. IEEE (2009)
11.
Zurück zum Zitat Li, L., McCann, J., Pollard, N.S., Faloutsos, C.: Dynammo: mining and summarization of coevolving sequences with missing values. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 507–516. ACM (2009) Li, L., McCann, J., Pollard, N.S., Faloutsos, C.: Dynammo: mining and summarization of coevolving sequences with missing values. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 507–516. ACM (2009)
12.
Zurück zum Zitat Mathioudakis, M., Koudas, N., Marbach, P.: Early online identification of attention gathering items in social media. In: Proceedings of the Third ACM International Conference on Web Search and Data Mining, pp. 301–310. ACM (2010) Mathioudakis, M., Koudas, N., Marbach, P.: Early online identification of attention gathering items in social media. In: Proceedings of the Third ACM International Conference on Web Search and Data Mining, pp. 301–310. ACM (2010)
13.
Zurück zum Zitat Matsubara, Y., Sakurai, Y.: Regime shifts in streams: real-time forecasting of co-evolving time sequences. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1045–1054. ACM (2016) Matsubara, Y., Sakurai, Y.: Regime shifts in streams: real-time forecasting of co-evolving time sequences. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1045–1054. ACM (2016)
14.
Zurück zum Zitat Matsubara, Y., Sakurai, Y., Faloutsos, C.: Autoplait: automatic mining of co-evolving time sequences. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 193–204. ACM (2014) Matsubara, Y., Sakurai, Y., Faloutsos, C.: Autoplait: automatic mining of co-evolving time sequences. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 193–204. ACM (2014)
15.
Zurück zum Zitat Matsubara, Y., Sakurai, Y., Faloutsos, C.: The web as a jungle: non-linear dynamical systems for co-evolving online activities. In: Proceedings of the 24th International Conference on World Wide Web, pp. 721–731. ACM (2015) Matsubara, Y., Sakurai, Y., Faloutsos, C.: The web as a jungle: non-linear dynamical systems for co-evolving online activities. In: Proceedings of the 24th International Conference on World Wide Web, pp. 721–731. ACM (2015)
16.
Zurück zum Zitat Moody, G.B., Mark, R.G.: The impact of the MIT-BIH arrhythmia database. IEEE Eng. Med. Biol. Mag. 20(3), 45–50 (2001)CrossRef Moody, G.B., Mark, R.G.: The impact of the MIT-BIH arrhythmia database. IEEE Eng. Med. Biol. Mag. 20(3), 45–50 (2001)CrossRef
17.
Zurück zum Zitat Rakthanmanon, T., Keogh, E.J., Lonardi, S., Evans, S.: MDL-based time series clustering. Knowl. Inf. Syst. 33(2), 371–399 (2012)CrossRef Rakthanmanon, T., Keogh, E.J., Lonardi, S., Evans, S.: MDL-based time series clustering. Knowl. Inf. Syst. 33(2), 371–399 (2012)CrossRef
18.
Zurück zum Zitat Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971)CrossRef Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336), 846–850 (1971)CrossRef
19.
Zurück zum Zitat Rissanen, J.: A universal prior for integers and estimation by minimum description length. Ann. Stat. 11(2), 416–431 (1983)MathSciNetCrossRefMATH Rissanen, J.: A universal prior for integers and estimation by minimum description length. Ann. Stat. 11(2), 416–431 (1983)MathSciNetCrossRefMATH
20.
Zurück zum Zitat Sakoe, H., Chiba, S.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Sig. Process. 26(1), 43–49 (1978)CrossRefMATH Sakoe, H., Chiba, S.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Sig. Process. 26(1), 43–49 (1978)CrossRefMATH
21.
Zurück zum Zitat Shieh, J., Keogh, E.: iSAX: indexing and mining terabyte sized time series. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 623–631. ACM (2008) Shieh, J., Keogh, E.: iSAX: indexing and mining terabyte sized time series. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 623–631. ACM (2008)
22.
Zurück zum Zitat Strehl, A., Ghosh, J.: Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3(Dec), 583–617 (2002)MathSciNetMATH Strehl, A., Ghosh, J.: Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3(Dec), 583–617 (2002)MathSciNetMATH
23.
Zurück zum Zitat Sun, H., Lui, J.C., Yau, D.K.: Distributed mechanism in detecting and defending against the low-rate TCP attack. Comput. Netw. 50(13), 2312–2330 (2006)CrossRefMATH Sun, H., Lui, J.C., Yau, D.K.: Distributed mechanism in detecting and defending against the low-rate TCP attack. Comput. Netw. 50(13), 2312–2330 (2006)CrossRefMATH
24.
Zurück zum Zitat Wagner, G.S.: Marriott’s Practical Electrocardiography. Lippincott Williams & Wilkins, Philadelphia (2001) Wagner, G.S.: Marriott’s Practical Electrocardiography. Lippincott Williams & Wilkins, Philadelphia (2001)
25.
Zurück zum Zitat Wang, P., Wang, H., Wang, W.: Finding semantics in time series. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, pp. 385–396. ACM (2011) Wang, P., Wang, H., Wang, W.: Finding semantics in time series. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, pp. 385–396. ACM (2011)
26.
Zurück zum Zitat Wang, X., Mueen, A., Ding, H., Trajcevski, G., Scheuermann, P., Keogh, E.: Experimental comparison of representation methods and distance measures for time series data. Data Min. Knowl. Discov. 26(2), 1–35 (2013)MathSciNetCrossRef Wang, X., Mueen, A., Ding, H., Trajcevski, G., Scheuermann, P., Keogh, E.: Experimental comparison of representation methods and distance measures for time series data. Data Min. Knowl. Discov. 26(2), 1–35 (2013)MathSciNetCrossRef
27.
Zurück zum Zitat Yankov, D., Keogh, E., Rebbapragada, U.: Disk aware discord discovery: finding unusual time series in terabyte sized datasets. In: Seventh IEEE International Conference on Data Mining, ICDM 2007, pp. 381–390. IEEE (2007) Yankov, D., Keogh, E., Rebbapragada, U.: Disk aware discord discovery: finding unusual time series in terabyte sized datasets. In: Seventh IEEE International Conference on Data Mining, ICDM 2007, pp. 381–390. IEEE (2007)
Metadaten
Titel
BeatLex: Summarizing and Forecasting Time Series with Patterns
verfasst von
Bryan Hooi
Shenghua Liu
Asim Smailagic
Christos Faloutsos
Copyright-Jahr
2017
DOI
https://doi.org/10.1007/978-3-319-71246-8_1