Published in: Data Mining and Knowledge Discovery 6/2018

31.07.2018

Exact variable-length anomaly detection algorithm for univariate and multivariate time series

Authors: Xing Wang, Jessica Lin, Nital Patel, Martin Braun


Abstract

The problem of anomaly detection in time series has received a lot of attention in the past two decades. However, existing techniques either cannot locate where the anomalies occur within an anomalous time series, or they require users to provide the length of potential anomalies. To address these limitations, we propose a self-learning online anomaly detection algorithm that automatically identifies anomalous time series, as well as the exact locations where the anomalies occur in the detected time series. In addition, anomalies in multivariate time series are difficult to detect because of the following challenges. First, anomalies may occur in only a subset of dimensions (variables). Second, the locations and lengths of anomalous subsequences may differ across dimensions. Third, some anomalies may look normal in each individual dimension but appear anomalous when the dimensions are considered in combination. To mitigate these problems, we introduce a multivariate anomaly detection algorithm that detects anomalies and identifies the dimensions and locations of the anomalous subsequences. We evaluate our approaches on several real-world datasets, including two CPU manufacturing datasets from Intel. We demonstrate that our approach can successfully detect the correct anomalies without requiring any prior knowledge about the data.
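To make the fixed-length limitation concrete, the following is a minimal sketch of brute-force time series discord search, one common fixed-length technique. It is not the authors' variable-length algorithm. A discord is the subsequence of length `m` whose distance to its nearest non-overlapping subsequence is largest, and `m` has to be chosen up front. The function name `brute_force_discord` and the synthetic sine data are illustrative assumptions.

```python
import math
from typing import List, Tuple


def brute_force_discord(series: List[float], m: int) -> Tuple[int, float]:
    """Return (start index, nearest-neighbor distance) of the length-m discord.

    A discord is the length-m subsequence whose Euclidean distance to its
    nearest non-overlapping subsequence is largest. The caller must fix m
    in advance, which is the limitation described in the abstract.
    """
    if m < 1 or len(series) < 2 * m:
        raise ValueError("need m >= 1 and at least 2*m points")
    n = len(series) - m + 1  # number of candidate windows
    best_idx, best_dist = -1, -1.0
    for i in range(n):
        window = series[i:i + m]
        nn_dist = math.inf
        for j in range(n):
            # Skip trivial matches: windows that overlap window i.
            if abs(i - j) >= m:
                nn_dist = min(nn_dist, math.dist(window, series[j:j + m]))
        # When len(series) is close to 2*m, some windows have no
        # non-overlapping partner; skip them.
        if nn_dist != math.inf and nn_dist > best_dist:
            best_idx, best_dist = i, nn_dist
    return best_idx, best_dist


if __name__ == "__main__":
    # Hypothetical data: a period-20 sine wave with a 10-point anomaly.
    data = [math.sin(2 * math.pi * t / 20) for t in range(200)]
    data[100:110] = [-1.0] * 10
    start, dist = brute_force_discord(data, m=20)
    print(f"discord starts at {start}, nearest-neighbor distance {dist:.3f}")
```

Because the returned window always has length `m`, an anomaly that is much shorter or longer than `m` is at best approximated by it. This is the gap that a variable-length method, which does not need the anomaly length as input, is meant to close.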


Footnotes
1
We use the whole time series to demonstrate the idea, but our approach also works when the points of a time series arrive in a streaming fashion.
 
Metadata
Title
Exact variable-length anomaly detection algorithm for univariate and multivariate time series
Authors
Xing Wang
Jessica Lin
Nital Patel
Martin Braun
Publication date
31.07.2018
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery / Issue 6/2018
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-018-0569-7
