Published in: Knowledge and Information Systems 8/2020

09-03-2020 | Regular Paper

Online anomaly search in time series: significant online discords

Authors: Paolo Avogadro, Luca Palonca, Matteo Alessandro Dominoni

Abstract

The aim of this work is to obtain a useful definition of anomaly for the online analysis of time series. The idea is to develop an anomaly concept that remains sustainable for long-lived and frequently updated data streams. As a solution, we adapt the discord concept, which has been used successfully for anomaly detection in time series. An online approach implies processing a data stream frequently in order to provide timely anomaly alerts. This requires a modification, because discord search, in its original definition, is not exactly decomposable. A statistical approach that rates the significance of the discords found in each analysis yields a solution in which the number of false positives is minimized. The new online anomalies are called significant online discords (sods). As a novel feature, sod search determines the number of anomalies in the time series under investigation. The search for sods has been implemented, and its properties have been validated with synthetic and real data. As a result, we found that sods can be considered a useful new tool for anomaly detection in fast streaming time series or Big Data contexts.
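
The discord concept that the abstract adapts can be illustrated with a small sketch. This is not the sod search described in the paper; it is a brute-force search for the classic top discord, assuming the commonly used definition (the subsequence whose distance to its nearest non-overlapping neighbour is largest). The function name `find_discord`, the window length `m`, and the choice of plain, non-normalised Euclidean distance are assumptions made only for illustration.

```python
import math


def find_discord(series, m):
    """Return (index, distance) of the top discord of length m.

    Illustrative brute force, O(n^2 * m): for every subsequence, find the
    distance to its nearest neighbour that does not overlap it, and keep
    the subsequence for which that distance is largest.
    """
    n = len(series) - m + 1  # number of subsequences of length m
    best_idx, best_dist = -1, -1.0
    for i in range(n):
        nearest = math.inf
        for j in range(n):
            if abs(i - j) < m:  # skip overlapping (trivial) matches
                continue
            d = math.dist(series[i:i + m], series[j:j + m])
            nearest = min(nearest, d)
        # Subsequences with no non-overlapping neighbour are ignored.
        if nearest != math.inf and nearest > best_dist:
            best_idx, best_dist = i, nearest
    if best_idx < 0:
        raise ValueError("series too short for the chosen window length")
    return best_idx, best_dist


# Hypothetical usage: a periodic signal with one injected spike.
signal = [0, 1, 0, -1] * 10
signal[20] = 5
index, distance = find_discord(signal, 4)
```

In this toy signal every window that avoids the spike has an exact repeat elsewhere, so its nearest-neighbour distance is zero, and the reported index falls among the windows that cover position 20. Running such a search independently on successive chunks of a stream does not, in general, reproduce the discords of the whole series, which is the decomposability issue the abstract refers to.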

Metadata
Title: Online anomaly search in time series: significant online discords
Authors: Paolo Avogadro, Luca Palonca, Matteo Alessandro Dominoni
Publication date: 09-03-2020
Publisher: Springer London
Published in: Knowledge and Information Systems / Issue 8/2020
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-020-01453-4
