Skip to main content
Erschienen in: The VLDB Journal 4/2023

14.01.2023 | Regular Paper

A meta-level analysis of online anomaly detectors

verfasst von: Antonios Ntroumpogiannis, Michail Giannoulis, Nikolaos Myrtakis, Vassilis Christophides, Eric Simon, Ioannis Tsamardinos

Erschienen in: The VLDB Journal | Ausgabe 4/2023

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Real-time detection of anomalies in streaming data is receiving increasing attention as it allows us to raise alerts, predict faults, and detect intrusions or threats across industries. Yet, little attention has been given to compare the effectiveness and efficiency of anomaly detectors for streaming data (i.e., of online algorithms). In this paper, we present a qualitative, synthetic overview of major online detectors from different algorithmic families (i.e., distance, density, tree or projection based) and highlight their main ideas for constructing, updating and testing detection models. Then, we provide a thorough analysis of the results of a quantitative experimental evaluation of online detection algorithms along with their offline counterparts. The behavior of the detectors is correlated with the characteristics of different datasets (i.e., meta-features), thereby providing a meta-level analysis of their performance. Our study addresses several missing insights from the literature such as (a) how reliable are detectors against a random classifier and what dataset characteristics make them perform randomly; (b) to what extent online detectors approximate the performance of offline counterparts; (c) which sketch strategy and update primitives of detectors are best to detect anomalies visible only within a feature subspace of a dataset; (d) what are the trade-offs between the effectiveness and the efficiency of detectors belonging to different algorithmic families; (e) which specific characteristics of datasets yield an online algorithm to outperform all others.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
In this paper, we use the terms outlier, novelty and anomaly detection interchangeably
 
2
We call “sample” an element (i.e., observation, measurement) of a data stream.
 
3
An hyper-parameter cannot be estimated from the data.
 
4
The recently proposed distance-based detector NETS [73] consumes less resources than MCOD but is not reporting any improvement in terms of effectiveness.
 
5
nearest neighbors are distinguished according to maximum and average distance.
 
6
\(2^{h+1} - 1\) is the number of nodes in a perfect binary tree.
 
8
\(2 n - 1\) is the number of nodes of a full binary tree with n leaves.
 
9
We consider the coordinates of each point as (F2, F1).
 
10
in our implementation we consider the simple forgetting mechanism rather than the time-decaying mechanism.
 
11
Authors report that the value \(\gamma =0.01\) is optimal for the most datasets.
 
21
After removing null values and categorical features not treated by our anomaly detectors.
 
23
There are also some cases of unknown anomalies they may appear in any of those categories.
 
24
k value is automatically computed during training.
 
25
When p is stored in a leaf of size larger than one, the value of !t(p) is adjusted to c(size).
 
26
In most experimental studies [15, 24, 25], detectors were executed using the default values of their hyper-parameters as “recommended by their authors.”
 
27
RS does not require to compute the gradient of the problem to be optimized and hence be used on functions that are not continuous or differentiable [31].
 
Literatur
1.
Zurück zum Zitat Aggarwal, C.: An Introduction to Outlier Analysis, pp. 1–40 (2013) Aggarwal, C.: An Introduction to Outlier Analysis, pp. 1–40 (2013)
2.
Zurück zum Zitat Aggarwal, C., Hinneburg, A., Keim, A.: On the surprising behavior of distance metrics in high dimensional spaces ICDT (2001) Aggarwal, C., Hinneburg, A., Keim, A.: On the surprising behavior of distance metrics in high dimensional spaces ICDT (2001)
3.
Zurück zum Zitat Aggarwal, C., Sathe, S.: Theoretical foundations and algorithms for outlier ensembles. SIGKDD Explor. 17(1), 74 (2015)CrossRef Aggarwal, C., Sathe, S.: Theoretical foundations and algorithms for outlier ensembles. SIGKDD Explor. 17(1), 74 (2015)CrossRef
4.
Zurück zum Zitat Aggarwal, C., Sathe, S.: Outlier Ensembles-An Introduction. Springer, Berlin (2017)CrossRef Aggarwal, C., Sathe, S.: Outlier Ensembles-An Introduction. Springer, Berlin (2017)CrossRef
5.
Zurück zum Zitat Akidau, T., Bradshaw, R., Chambers, C., Chernyak, S., Fernández-Moctezuma, R., Lax, R., McVeety, S., Mills, D., Perry, F., Schmidt, E., Whittle, S.: The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. PVLDB 8(12), 68 (2015) Akidau, T., Bradshaw, R., Chambers, C., Chernyak, S., Fernández-Moctezuma, R., Lax, R., McVeety, S., Mills, D., Perry, F., Schmidt, E., Whittle, S.: The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. PVLDB 8(12), 68 (2015)
6.
Zurück zum Zitat Alcobaça, E., Siqueira, F., Rivolli, A., Garcia, L., Oliva, J., de Carvalho, A.: Mfe: towards reproducible meta-feature extraction. JMLR 21(111), 1–5 (2020) Alcobaça, E., Siqueira, F., Rivolli, A., Garcia, L., Oliva, J., de Carvalho, A.: Mfe: towards reproducible meta-feature extraction. JMLR 21(111), 1–5 (2020)
7.
Zurück zum Zitat Bailis, P., Gan, E., Madden, S., Narayanan, D., Rong, K., Suri, S.: Macrobase: prioritizing attention in fast data. In: SIGMOD (2017) Bailis, P., Gan, E., Madden, S., Narayanan, D., Rong, K., Suri, S.: Macrobase: prioritizing attention in fast data. In: SIGMOD (2017)
8.
9.
Zurück zum Zitat Bergmeir, C., Benítez, M.: On the use of cross-validation for time series predictor evaluation. Inf. Sci. 7, 191 (2012) Bergmeir, C., Benítez, M.: On the use of cross-validation for time series predictor evaluation. Inf. Sci. 7, 191 (2012)
10.
Zurück zum Zitat Birge, L., Rozenholc, Y.: How many bins should be put in a regular histogram. In: ESAIM: Probability and Statistics, pp. 24–45 (2006) Birge, L., Rozenholc, Y.: How many bins should be put in a regular histogram. In: ESAIM: Probability and Statistics, pp. 24–45 (2006)
11.
Zurück zum Zitat Blázquez-García, A., Conde, A., Mori, U., Lozano, J.: A review on outlier/anomaly detection in time series data. ACM Comput. Surv. 54(3), 98 (2021) Blázquez-García, A., Conde, A., Mori, U., Lozano, J.: A review on outlier/anomaly detection in time series data. ACM Comput. Surv. 54(3), 98 (2021)
12.
Zurück zum Zitat Braei, M., Wagner, S.: Anomaly detection in univariate time-series: a survey on the state-of-the-art. CoRR 00433, 2020 (2004) Braei, M., Wagner, S.: Anomaly detection in univariate time-series: a survey on the state-of-the-art. CoRR 00433, 2020 (2004)
13.
Zurück zum Zitat Branco, P., Torgo, L., Ribeiro, R.: A survey of predictive modelling under imbalanced distributions. CoRR, 1505.01658 (2015) Branco, P., Torgo, L., Ribeiro, R.: A survey of predictive modelling under imbalanced distributions. CoRR, 1505.01658 (2015)
14.
Zurück zum Zitat Breunig, M., Kriegel, H., Ng, R., Sander, J.: Lof: identifying density-based local outliers. SIGMOD Rec. 29(2), 799 (2000)CrossRef Breunig, M., Kriegel, H., Ng, R., Sander, J.: Lof: identifying density-based local outliers. SIGMOD Rec. 29(2), 799 (2000)CrossRef
15.
Zurück zum Zitat Campos, G., Zimek, A., Sander, J., Campello, R., Micenková, B., Schubert, E., Assent, I., Houle, M.: On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min. Knowl. Discov. 30(4), 891–927 (2016)MathSciNetCrossRef Campos, G., Zimek, A., Sander, J., Campello, R., Micenková, B., Schubert, E., Assent, I., Houle, M.: On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min. Knowl. Discov. 30(4), 891–927 (2016)MathSciNetCrossRef
16.
Zurück zum Zitat Cao, L., Yang, D., Wang, Q., Yu, Y., Wang, J., Rundensteiner, E.: Scalable distance-based outlier detection over high-volume data streams (2014) Cao, L., Yang, D., Wang, Q., Yu, Y., Wang, J., Rundensteiner, E.: Scalable distance-based outlier detection over high-volume data streams (2014)
17.
Zurück zum Zitat Carbone, P., Fragkoulis, M., Kalavri, V., Katsifodimos, A.: Beyond analytics: the evolution of stream processing systems. In: SIGMOD (2020) Carbone, P., Fragkoulis, M., Kalavri, V., Katsifodimos, A.: Beyond analytics: the evolution of stream processing systems. In: SIGMOD (2020)
18.
Zurück zum Zitat Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. 41(3), 96 (2009)CrossRef Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. 41(3), 96 (2009)CrossRef
19.
Zurück zum Zitat Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection for discrete sequences: a survey. IEEE TKDE 24(5), 119 (2012) Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection for discrete sequences: a survey. IEEE TKDE 24(5), 119 (2012)
20.
Zurück zum Zitat Choudhary, D., Arun Kejariwal, A., Orsini, F.: On the runtime-efficacy trade-off of anomaly detection techniques for real-time streaming data. CoRR, arxiv:1710.04735 (2017) Choudhary, D., Arun Kejariwal, A., Orsini, F.: On the runtime-efficacy trade-off of anomaly detection techniques for real-time streaming data. CoRR, arxiv:​1710.​04735 (2017)
21.
Zurück zum Zitat Cook, A., Mısırlı, G., Fan, Z.: Anomaly detection for IoT time-series data: a survey. IEEE IoT J. 7(7), 88 (2020) Cook, A., Mısırlı, G., Fan, Z.: Anomaly detection for IoT time-series data: a survey. IEEE IoT J. 7(7), 88 (2020)
22.
Zurück zum Zitat Davis, J., Goadrich, M.: The relationship between precision-recall and ROC curves. In: (ICML’06), pp. 233–240 (2006) Davis, J., Goadrich, M.: The relationship between precision-recall and ROC curves. In: (ICML’06), pp. 233–240 (2006)
23.
Zurück zum Zitat Demšar, J.: Statistical comparisons of classifiers over multiple data sets. In: JMLR, 7, December (2006) Demšar, J.: Statistical comparisons of classifiers over multiple data sets. In: JMLR, 7, December (2006)
24.
Zurück zum Zitat Domingues, R., Filippone, M., Michiardi, P., Zouaoui, J.: A comparative evaluation of outlier detection algorithms. Pattern Recogn. 74(C), 406–421 (2018)MATHCrossRef Domingues, R., Filippone, M., Michiardi, P., Zouaoui, J.: A comparative evaluation of outlier detection algorithms. Pattern Recogn. 74(C), 406–421 (2018)MATHCrossRef
25.
Zurück zum Zitat Domingues, R., Filippone, M., Michiardi, P., Zouaoui, J.: A comparative evaluation of outlier detection algorithms: experiments and analyses. Pattern Recognit. 74, 478 (2018)MATHCrossRef Domingues, R., Filippone, M., Michiardi, P., Zouaoui, J.: A comparative evaluation of outlier detection algorithms: experiments and analyses. Pattern Recognit. 74, 478 (2018)MATHCrossRef
26.
Zurück zum Zitat Dua, D., Graff, C.: Uci Machine Learning Repository. University of California, School of Information and Computer Sciences, Irvine (2017) Dua, D., Graff, C.: Uci Machine Learning Repository. University of California, School of Information and Computer Sciences, Irvine (2017)
27.
Zurück zum Zitat Dudani, S.: The distance-weighted k-nearest-neighbor rule. IEEE Trans. SMC 6(4), 325–327 (1976) Dudani, S.: The distance-weighted k-nearest-neighbor rule. IEEE Trans. SMC 6(4), 325–327 (1976)
28.
Zurück zum Zitat Emmott, A., Das, S., Dietterich, T., Fern, A., Wong, W.: A meta-analysis of the anomaly detection problem. CoRR, arxiv:1503.01158 (2015) Emmott, A., Das, S., Dietterich, T., Fern, A., Wong, W.: A meta-analysis of the anomaly detection problem. CoRR, arxiv:​1503.​01158 (2015)
29.
Zurück zum Zitat Goix, N., Drougard, N., Brault, R., Chiapino, M.: One class splitting criteria for random forests. In: ACML (2017) Goix, N., Drougard, N., Brault, R., Chiapino, M.: One class splitting criteria for random forests. In: ACML (2017)
30.
Zurück zum Zitat Goldstein, M., Uchida, S.: A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. PLoS One 11(4), 887 (2016)CrossRef Goldstein, M., Uchida, S.: A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. PLoS One 11(4), 887 (2016)CrossRef
31.
Zurück zum Zitat Granichin, O.N., Volkovich, Z., Toledano-Kitai, D.: Randomized Algorithms in Automatic Control and Data Mining, vol. 67. Springer, Berlin (2015) Granichin, O.N., Volkovich, Z., Toledano-Kitai, D.: Randomized Algorithms in Automatic Control and Data Mining, vol. 67. Springer, Berlin (2015)
32.
Zurück zum Zitat Guha, S., Mishra, N., Roy, G., Schrijvers, O.: Robust random cut forest based anomaly detection on streams. In: ICML’16, pp. 2712–2721 (2016) Guha, S., Mishra, N., Roy, G., Schrijvers, O.: Robust random cut forest based anomaly detection on streams. In: ICML’16, pp. 2712–2721 (2016)
33.
Zurück zum Zitat Gupta, M., Gao, J., Aggarwal, C., Han, J.: Outlier detection for temporal data: a survey. IEEE TKDE 26(9), 83 (2014)MATH Gupta, M., Gao, J., Aggarwal, C., Han, J.: Outlier detection for temporal data: a survey. IEEE TKDE 26(9), 83 (2014)MATH
34.
Zurück zum Zitat Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer, Berlin (2009)MATHCrossRef Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer, Berlin (2009)MATHCrossRef
35.
Zurück zum Zitat Herbold, S.: Autorank: a python package for automated ranking of classifiers. J. Open Source Softw. 3, 2173 (2020)CrossRef Herbold, S.: Autorank: a python package for automated ranking of classifiers. J. Open Source Softw. 3, 2173 (2020)CrossRef
36.
Zurück zum Zitat Hodge, V., Austin, J.: A survey of outlier detection methodologies. Artif. Intell. Rev. 22(2), 69 (2004)MATHCrossRef Hodge, V., Austin, J.: A survey of outlier detection methodologies. Artif. Intell. Rev. 22(2), 69 (2004)MATHCrossRef
37.
Zurück zum Zitat Jacob, V., Song, F., Stiegler, A., Rad, B., Diao, Y., Tatbul, N.: Exathlon: a benchmark for explainable anomaly detection over time series. PVLDB 14(11), 58 (2021) Jacob, V., Song, F., Stiegler, A., Rad, B., Diao, Y., Tatbul, N.: Exathlon: a benchmark for explainable anomaly detection over time series. PVLDB 14(11), 58 (2021)
38.
Zurück zum Zitat Keller, F., Müller, E., Böhm, K.: (2012) Hics: high contrast subspaces for density-based outlier ranking. In: ICDE, pp. 1037–1048 Keller, F., Müller, E., Böhm, K.: (2012) Hics: high contrast subspaces for density-based outlier ranking. In: ICDE, pp. 1037–1048
39.
Zurück zum Zitat Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: IJCAI (1995) Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: IJCAI (1995)
40.
Zurück zum Zitat Kontaki, M., Gounaris, A., Papadopoulos, A., Tsichlas, K., Manolopoulos, Y.: Continuous monitoring of distance-based outliers over data streams. In: ICDE (2011) Kontaki, M., Gounaris, A., Papadopoulos, A., Tsichlas, K., Manolopoulos, Y.: Continuous monitoring of distance-based outliers over data streams. In: ICDE (2011)
41.
Zurück zum Zitat Li, J., Maier, D., Tufte, K., Papadimos, V., Tucker, P.: Semantics and evaluation techniques for window aggregates in data streams. In: SIGMOD (2005) Li, J., Maier, D., Tufte, K., Papadimos, V., Tucker, P.: Semantics and evaluation techniques for window aggregates in data streams. In: SIGMOD (2005)
42.
Zurück zum Zitat Lindner, G., Studer, R.: Ast: Support for algorithm selection with a CBR approach. In: Principles of Data Mining and Knowledge Discovery, pp. 418–423 (1999) Lindner, G., Studer, R.: Ast: Support for algorithm selection with a CBR approach. In: Principles of Data Mining and Knowledge Discovery, pp. 418–423 (1999)
43.
Zurück zum Zitat Liu, T., Ting, K. Ming, Zhou, Z.: Isolation forest. In: ICDM, pp. 413–422 (2008) Liu, T., Ting, K. Ming, Zhou, Z.: Isolation forest. In: ICDM, pp. 413–422 (2008)
44.
Zurück zum Zitat Lobo, J., Jiménez-Valverde, A., Real, R.: AUC: a misleading measure of the performance of predictive distribution models. Global Ecol. Biogeogr. 17(2), 9008 (2008)CrossRef Lobo, J., Jiménez-Valverde, A., Real, R.: AUC: a misleading measure of the performance of predictive distribution models. Global Ecol. Biogeogr. 17(2), 9008 (2008)CrossRef
45.
Zurück zum Zitat Manzoor, E., Lamba, H., Akoglu, L.: Xstream: Outlier detection in feature-evolving data streams KDD. (2018) Manzoor, E., Lamba, H., Akoglu, L.: Xstream: Outlier detection in feature-evolving data streams KDD. (2018)
46.
Zurück zum Zitat Na, Gyoung S., Kim, Donghyun, Yu., Hwanjo: Dilof: Effective and memory efficient local outlier detection in data streams. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1993–2002 (2018) Na, Gyoung S., Kim, Donghyun, Yu., Hwanjo: Dilof: Effective and memory efficient local outlier detection in data streams. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1993–2002 (2018)
47.
Zurück zum Zitat Orair, G., Teixeira, C., Meira, W., Wang, Y., Parthasarathy, S.: Distance-based outlier detection: consolidation and renewed bearing. PVLDB 3(2), 788 (2010) Orair, G., Teixeira, C., Meira, W., Wang, Y., Parthasarathy, S.: Distance-based outlier detection: consolidation and renewed bearing. PVLDB 3(2), 788 (2010)
48.
Zurück zum Zitat Pang, G., Shen, C., Cao, L., Hengel, A.: Deep learning for anomaly detection: a review. ACM Comput. Surv. 54(2), 89 (2021) Pang, G., Shen, C., Cao, L., Hengel, A.: Deep learning for anomaly detection: a review. ACM Comput. Surv. 54(2), 89 (2021)
50.
Zurück zum Zitat Qin, X., Cao, L., Rundensteiner, E.A., Madden, S.: Scalable kernel density estimation-based local outlier detection over large data streams. In: EDBT (2019) Qin, X., Cao, L., Rundensteiner, E.A., Madden, S.: Scalable kernel density estimation-based local outlier detection over large data streams. In: EDBT (2019)
51.
Zurück zum Zitat Ramaswamy, S., Rastogi, R., Shim, K.: Efficient algorithms for mining outliers from large data sets, pp. 427–438 (2000) Ramaswamy, S., Rastogi, R., Shim, K.: Efficient algorithms for mining outliers from large data sets, pp. 427–438 (2000)
52.
Zurück zum Zitat Rastrigin, L.A.: The convergence of the random search method in the extremal control of a many parameter system. Autom. Remote Control 4, 1337–1342 (1963) Rastrigin, L.A.: The convergence of the random search method in the extremal control of a many parameter system. Autom. Remote Control 4, 1337–1342 (1963)
53.
Zurück zum Zitat Rogers, J., Gunn, S.: Identifying feature relevance using a random forest. In: SLSFS, pp. 173–184, Bohinj, Slovenia (2005) Rogers, J., Gunn, S.: Identifying feature relevance using a random forest. In: SLSFS, pp. 173–184, Bohinj, Slovenia (2005)
54.
Zurück zum Zitat Roy, S.N.: On a Heuristic method of test construction and its use in multivariate analysis. Ann. Math. Stat. 6, 220–238 (1953)MathSciNetMATHCrossRef Roy, S.N.: On a Heuristic method of test construction and its use in multivariate analysis. Ann. Math. Stat. 6, 220–238 (1953)MathSciNetMATHCrossRef
55.
Zurück zum Zitat Sadik, S., Gruenwald, L.: Research issues in outlier detection for data streams. SIGKDD Explor. Newsl. 15(1), 78 (2014)CrossRef Sadik, S., Gruenwald, L.: Research issues in outlier detection for data streams. SIGKDD Explor. Newsl. 15(1), 78 (2014)CrossRef
56.
Zurück zum Zitat Saito, T., Rehmsmeier, M.: The precision-recall plot is more informative than ROC when evaluating binary classifiers on imbalanced datasets. PLoS One 10(3), 708 (2015)CrossRef Saito, T., Rehmsmeier, M.: The precision-recall plot is more informative than ROC when evaluating binary classifiers on imbalanced datasets. PLoS One 10(3), 708 (2015)CrossRef
57.
Zurück zum Zitat Sathe, S., Aggarwal, C.: Subspace histograms for outlier detection in linear time. KAIS 56(3), 68 (2018) Sathe, S., Aggarwal, C.: Subspace histograms for outlier detection in linear time. KAIS 56(3), 68 (2018)
58.
Zurück zum Zitat Silva, J., Faria, E., Barros, R., Hruschka, E., de Carvalho, A., Gama, J.: Data stream clustering: a survey. ACM Comput. Surv. 46(1), 9114 (2013)MATHCrossRef Silva, J., Faria, E., Barros, R., Hruschka, E., de Carvalho, A., Gama, J.: Data stream clustering: a survey. ACM Comput. Surv. 46(1), 9114 (2013)MATHCrossRef
59.
Zurück zum Zitat Somol, P., Grim, J., Filip, J., Pudil, P.: On stopping rules in dependency-aware feature ranking. In: CIARP (2013) Somol, P., Grim, J., Filip, J., Pudil, P.: On stopping rules in dependency-aware feature ranking. In: CIARP (2013)
60.
Zurück zum Zitat Tan, C., Ting, M., Liu, T.: Fast anomaly detection for streaming data. In: IJCAI (2011) Tan, C., Ting, M., Liu, T.: Fast anomaly detection for streaming data. In: IJCAI (2011)
61.
Zurück zum Zitat Tatbul, N., Lee, T.J., Zdonik, S., Alam, M., Gottschlich, J.: Precision and recall for time series. In: NIPS (2018) Tatbul, N., Lee, T.J., Zdonik, S., Alam, M., Gottschlich, J.: Precision and recall for time series. In: NIPS (2018)
62.
Zurück zum Zitat Ting, K.M., Washio, T., Wells, J.R., Aryal, S.: Defying the gravity of learning curve: a characteristic of nearest neighbour anomaly detectors. Mach. Learn. 5, 55–91 (2017)MathSciNetMATHCrossRef Ting, K.M., Washio, T., Wells, J.R., Aryal, S.: Defying the gravity of learning curve: a characteristic of nearest neighbour anomaly detectors. Mach. Learn. 5, 55–91 (2017)MathSciNetMATHCrossRef
63.
Zurück zum Zitat Tran, L., Fan, L., Shahabi, C.: Distance-based outlier detection in data streams. PVLDB 9(12), 96 (2016) Tran, L., Fan, L., Shahabi, C.: Distance-based outlier detection in data streams. PVLDB 9(12), 96 (2016)
64.
Zurück zum Zitat Tran, L., Mun, M., Shahabi, C.: Real-time distance-based outlier detection in data streams. PVLDB 14(2), 7006 (2020) Tran, L., Mun, M., Shahabi, C.: Real-time distance-based outlier detection in data streams. PVLDB 14(2), 7006 (2020)
65.
Zurück zum Zitat van Stein, B., van Leeuwen, M., Bäck, T.: Local subspace-based outlier detection using global neighbourhoods. CoRR, arxiv:1611.00183 (2016) van Stein, B., van Leeuwen, M., Bäck, T.: Local subspace-based outlier detection using global neighbourhoods. CoRR, arxiv:​1611.​00183 (2016)
66.
Zurück zum Zitat Vanschoren, J.: Meta-Learning, pp. 35–61 (2019) Vanschoren, J.: Meta-Learning, pp. 35–61 (2019)
67.
Zurück zum Zitat Vanschoren, J., van Rijn, J., Bischl, B., Torgo, L.: OpenML: networked science in machine learning. SIGKDD Explor. 15(2), 96 (2013) Vanschoren, J., van Rijn, J., Bischl, B., Torgo, L.: OpenML: networked science in machine learning. SIGKDD Explor. 15(2), 96 (2013)
68.
Zurück zum Zitat Wang, H., Bah, J., Hammad, M.: Progress in outlier detection techniques: a survey. IEEE Access 7, 998 (2019) Wang, H., Bah, J., Hammad, M.: Progress in outlier detection techniques: a survey. IEEE Access 7, 998 (2019)
69.
Zurück zum Zitat Wu, R., Keogh, E.: Current time series anomaly detection benchmarks are flawed and are creating the illusion of progress. In: IEEE TKDE (2021) Wu, R., Keogh, E.: Current time series anomaly detection benchmarks are flawed and are creating the illusion of progress. In: IEEE TKDE (2021)
70.
Zurück zum Zitat Xia, S., Xiong, Z., Luo, Y., WeiXu, Z.G.: Effectiveness of the Euclidean distance in high dimensional spaces. Optik 4, 5614–5619 (2015) Xia, S., Xiong, Z., Luo, Y., WeiXu, Z.G.: Effectiveness of the Euclidean distance in high dimensional spaces. Optik 4, 5614–5619 (2015)
71.
Zurück zum Zitat Yang, J., Rahardja, S., Fränti, P.: Outlier detection: how to threshold outlier scores? In: AIIPCC (2019) Yang, J., Rahardja, S., Fränti, P.: Outlier detection: how to threshold outlier scores? In: AIIPCC (2019)
72.
Zurück zum Zitat Yoon, S., Lee, J., Lee, B.: Ultrafast local outlier detection from a data stream with stationary region skipping. In: KDD (2020) Yoon, S., Lee, J., Lee, B.: Ultrafast local outlier detection from a data stream with stationary region skipping. In: KDD (2020)
73.
Zurück zum Zitat Yoon, S., Lee, J., Lee, B.S.: Nets: extremely fast outlier detection from a data stream via set-based processing. PVLDB 12(11), 998 (2019) Yoon, S., Lee, J., Lee, B.S.: Nets: extremely fast outlier detection from a data stream via set-based processing. PVLDB 12(11), 998 (2019)
74.
Zurück zum Zitat Zhang, E., Zhang, Y.I.: Average precision. In: Encyclopedia of Database Systems (2009) Zhang, E., Zhang, Y.I.: Average precision. In: Encyclopedia of Database Systems (2009)
75.
Zurück zum Zitat Zhao, Y., Rossi, A., Akoglu, L.: Automating outlier detection via meta-learning. CoRR 2009, 10606 (2020) Zhao, Y., Rossi, A., Akoglu, L.: Automating outlier detection via meta-learning. CoRR 2009, 10606 (2020)
76.
Zurück zum Zitat Zimek, A., Filzmoser, P.: There and back again: outlier detection between statistical reasoning and data mining algorithms. Int. Rev. Data Min. Knowl. Discov. 8(6), 66 (2018) Zimek, A., Filzmoser, P.: There and back again: outlier detection between statistical reasoning and data mining algorithms. Int. Rev. Data Min. Knowl. Discov. 8(6), 66 (2018)
77.
Zurück zum Zitat Zimek, A., Gaudet, M., Campello, R., Sander, J.: Subsampling for efficient and effective unsupervised outlier detection ensembles. In: KDD (2013) Zimek, A., Gaudet, M., Campello, R., Sander, J.: Subsampling for efficient and effective unsupervised outlier detection ensembles. In: KDD (2013)
78.
Zurück zum Zitat Zimek, A., Schubert, E., Kriegel, H.: A survey on unsupervised outlier detection in high-dimensional numerical data. Stat. Anal. Data Mini. 5(5), 997 (2012) Zimek, A., Schubert, E., Kriegel, H.: A survey on unsupervised outlier detection in high-dimensional numerical data. Stat. Anal. Data Mini. 5(5), 997 (2012)
Metadaten
Titel
A meta-level analysis of online anomaly detectors
verfasst von
Antonios Ntroumpogiannis
Michail Giannoulis
Nikolaos Myrtakis
Vassilis Christophides
Eric Simon
Ioannis Tsamardinos
Publikationsdatum
14.01.2023
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 4/2023
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-022-00773-x

Weitere Artikel der Ausgabe 4/2023

The VLDB Journal 4/2023 Zur Ausgabe