Published in: Cluster Computing 3/2021

29 January 2021

Online sequential extreme studentized deviate tests for anomaly detection in streaming data with varying patterns

Written by: Minho Ryu, Geonseok Lee, Kichun Lee

Abstract

In the new era of big data, numerous information and technology systems can store huge amounts of streaming data in real time, for example, in server-access logs on web application servers. The importance of anomaly detection in large volumes of streaming data from such systems is rapidly increasing. One of the biggest challenges in the detection task is to carry out real-time contextual anomaly detection in streaming data with varying patterns that are visually detectable but unsuitable for a parametric model. Most anomaly detection algorithms have weaknesses in dealing with streaming time-series data containing such patterns. In this paper, we propose a novel method for online contextual anomaly detection in streaming time-series data using generalized extreme studentized deviate (GESD) tests. The GESD test is relatively accurate and efficient because it performs statistical hypothesis testing, but it is unable to handle streaming time-series data. Thus, focusing on streaming time-series data, we propose an online version of the test capable of detecting outliers under varying patterns. We perform extensive experiments with simulated data, synthetic data, and real online traffic data from Yahoo Webscope, showing a clear advantage of the proposed method, particularly for analyzing streaming data with varying patterns.
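For orientation, the following is a minimal sketch of the classical batch GESD test that the abstract builds on. It is not the online variant proposed in the paper, whose details are not given here. The function names (`t_pdf`, `t_cdf`, `t_ppf`, `gesd`) and the default `alpha=0.05` are illustrative assumptions. The Student's t quantile is approximated with Simpson's rule and bisection only to keep the sketch free of dependencies; in practice a library routine such as `scipy.stats.t.ppf` would normally be used instead.

```python
import math
import statistics


def t_pdf(x, df):
    # Density of Student's t distribution with df degrees of freedom.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    # CDF via Simpson's rule on [0, x]; the density is symmetric about 0.
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_ppf(p, df):
    # Quantile by bisection; this sketch assumes 0.5 < p < 1.
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def gesd(values, max_outliers, alpha=0.05):
    # Batch GESD: returns the indices (into values) flagged as outliers.
    n = len(values)
    if n - max_outliers - 1 < 1:
        raise ValueError("too few observations for max_outliers")
    remaining = list(enumerate(values))
    removed, n_out = [], 0
    for i in range(1, max_outliers + 1):
        xs = [v for _, v in remaining]
        mean, sd = statistics.fmean(xs), statistics.stdev(xs)
        if sd == 0:
            break
        # Test statistic R_i: the largest studentized deviation.
        pos = max(range(len(xs)), key=lambda j: abs(xs[j] - mean))
        r_i = abs(xs[pos] - mean) / sd
        # Critical value lambda_i from the t distribution.
        p = 1 - alpha / (2 * (n - i + 1))
        t = t_ppf(p, n - i - 1)
        lam = (n - i) * t / math.sqrt((n - i - 1 + t * t) * (n - i + 1))
        removed.append(remaining.pop(pos)[0])
        if r_i > lam:
            n_out = i  # outlier count is the largest i with R_i > lambda_i
    return removed[:n_out]


data = [10.0, 10.2, 9.8, 10.1, 9.9, 10.3, 9.7, 10.0, 10.1, 9.9, 50.0, 10.2]
print(gesd(data, max_outliers=2))
```

Note that the batch test needs the whole sample and an upper bound on the number of outliers up front, which is the limitation for streaming data that the abstract points out.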

Metadata
Title
Online sequential extreme studentized deviate tests for anomaly detection in streaming data with varying patterns
Written by
Minho Ryu
Geonseok Lee
Kichun Lee
Publication date
29 January 2021
Publisher
Springer US
Published in
Cluster Computing / Issue 3/2021
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-021-03236-0
