Published in: Data Mining and Knowledge Discovery 6/2016

1 November 2016

Exemplar learning for extremely efficient anomaly detection in real-valued time series

Authors: Michael Jones, Daniel Nikovski, Makoto Imamura, Takahisa Hirata


Abstract

We investigate algorithms for efficiently detecting anomalies in real-valued one-dimensional time series. Past work has shown that one of the most effective anomaly detectors is a simple brute force algorithm whose anomaly score is the Euclidean distance between each subsequence of a testing time series and its nearest neighbor among the subsequences of a training time series. We investigate a very efficient implementation of this method and show that it is still too slow for most real-world applications. Next, we present a new method based on summarizing the training time series with a small set of exemplars. The exemplars we use are feature vectors that capture both the high-frequency and low-frequency information in sets of similar subsequences of the time series. We show that this exemplar-based method is much faster than both the efficient brute force method and a prediction-based method, and that it also handles a wider range of anomalies. We compare our algorithm across a large variety of publicly available time series and encourage others to do the same. Our exemplar-based algorithm is able to process in minutes time series that would take other methods days to process.
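
The brute force baseline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' efficient implementation: the function names, the window length `w`, and the synthetic sine-wave data are all assumptions made for the example. Each test subsequence is scored by its Euclidean distance to the nearest training subsequence.

```python
import numpy as np

def subsequences(series, w):
    """Return every length-w subsequence of a 1-D series as a row of a 2-D array."""
    series = np.asarray(series, dtype=float)
    return np.array([series[i:i + w] for i in range(len(series) - w + 1)])

def brute_force_scores(train, test, w):
    """Anomaly score per test subsequence: distance to its nearest training subsequence."""
    train_windows = subsequences(train, w)
    test_windows = subsequences(test, w)
    scores = np.empty(len(test_windows))
    for i, query in enumerate(test_windows):
        # Euclidean distance from this test window to every training window.
        distances = np.sqrt(((train_windows - query) ** 2).sum(axis=1))
        scores[i] = distances.min()
    return scores

# Hypothetical data: a clean sine wave for training, and a copy with one spike for testing.
train = np.sin(2 * np.pi * np.arange(200) / 20)
test = train[:100].copy()
test[50] += 3.0  # injected anomaly (illustrative only)

scores = brute_force_scores(train, test, w=10)
print("most anomalous window starts at index", int(np.argmax(scores)))
```

The cost of this sketch grows with the product of the number of training and test subsequences, which is consistent with the abstract's point that even an efficient brute force implementation can be too slow in practice.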

Footnotes
1
We did test the z-normalized BFED algorithm and, as expected, found it to be less accurate for anomaly detection. Over all the testing time series used in Sect. 6, the z-normalized BFED algorithm has a detection rate of 31/45 with no false positives, which is worse than both the unnormalized BFED algorithm and our exemplar approach.
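
The footnote does not say why z-normalization was expected to reduce accuracy. One plausible reason, offered here as an assumption rather than as the authors' explanation, is that z-normalizing each subsequence removes its offset and scale, so two windows that differ only in level or amplitude become indistinguishable. The helper `z_normalize` and the example windows below are hypothetical.

```python
import numpy as np

def z_normalize(window, eps=1e-12):
    """Shift a window to zero mean and scale it to unit standard deviation."""
    window = np.asarray(window, dtype=float)
    std = window.std()
    if std < eps:
        # A constant window has no shape information; map it to all zeros.
        return np.zeros_like(window)
    return (window - window.mean()) / std

# Two windows with the same shape but a different amplitude and offset.
base = np.sin(2 * np.pi * np.arange(20) / 20)
scaled = 3.0 * base + 5.0

raw_distance = np.linalg.norm(base - scaled)
znorm_distance = np.linalg.norm(z_normalize(base) - z_normalize(scaled))

print("raw distance:", raw_distance)
print("z-normalized distance:", znorm_distance)
```

In this sketch the raw distance is large while the z-normalized distance is zero up to floating-point error, so an anomaly that consists only of a change in level or amplitude would be invisible after normalization.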
Metadata
Title
Exemplar learning for extremely efficient anomaly detection in real-valued time series
Authors
Michael Jones
Daniel Nikovski
Makoto Imamura
Takahisa Hirata
Publication date
1 November 2016
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery / Issue 6/2016
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-015-0449-3