Open Access 01.06.2025 | Original Article

Robustness of unsupervised methods for image surface-anomaly detection

Authors: Jakob Božič, Matic Fučka, Vitjan Zavrtanik, Danijel Skočaj

Published in: Pattern Analysis and Applications | Issue 2/2025

Abstract

Surface-anomaly detection is of crucial importance in quality control systems, where detecting defects on manufactured components is essential for ensuring product longevity and safety. This article examines the robustness of unsupervised methods for image surface-anomaly detection, which rely exclusively on normal samples. It addresses the practical challenge of anomalous samples in the training set, a scenario often overlooked in idealized evaluations. The study proposes a robustness measure to quantify the drop in performance when anomalous samples are introduced, enabling a more realistic assessment of these methods. Through a comprehensive evaluation of seven well-established methods on four diverse datasets, the article offers insights into the strengths and weaknesses of different anomaly detection paradigms. It highlights the superior robustness of methods that use pretrained features, even when a substantial portion of the training set is contaminated with anomalies. The findings underscore the importance of considering robustness in the development and evaluation of unsupervised anomaly detection methods in order to account for false negatives in the training set, thus paving the way for more reliable quality control systems.

1 Introduction

Surface-anomaly detection aims to detect image regions that significantly differ from the expected appearance. This is a common task in quality control systems [36], where products need to be visually inspected to detect irregularities on the surface of manufactured components. Such defects diminish the lifespan of the manufactured component or pose safety risks to the end user. For this reason, surface-anomaly detection [6, 20, 27, 32] has become a critical aspect of ensuring high-quality products.
In recent years, many deep-learning-based solutions have been developed to tackle this problem. Fully supervised methods [10, 30] were among the first to be successfully applied to automate surface-anomaly detection. However, they require a considerable amount of pixel-level labeled data for training, which can be challenging to obtain, either due to the rarity of anomalous samples or due to the costly labeling process. Unsupervised methods [9, 12, 14, 21, 23, 38, 39] address these limitations by building models using normal samples only and have therefore seen significant growth in development and usage. All unsupervised methods rest on the assumption that anomalous samples can be perfectly identified during data collection; in practice, however, this is rarely the case. Anomalous samples can remain in the training set either due to human error, as some anomalies can be very hard to see, or due to a vague definition of what constitutes a defect, which can occur when small imperfections are considered acceptable, whereas slightly larger ones already count as anomalies.
Fig. 1
An underlying assumption of all unsupervised methods for surface-anomaly detection is that the training set contains only normal images. We investigate how much the performance of unsupervised methods drops in more realistic scenarios, when anomalous samples are present in the training set
In this paper we investigate how the performance of unsupervised methods changes when the assumption of a clean training set is violated and anomalous samples are present in the training set, thus reflecting a more realistic scenario (Fig. 1). We propose a robustness measure for unsupervised surface-anomaly detection, which describes how much the performance of a method drops when anomalous samples are present in the training set, and which should be reported alongside the standard performance metrics when a new method is proposed. Furthermore, we present a detailed analysis of seven well-established methods, representing different anomaly detection paradigms, based on the results obtained on four diverse datasets.
The remainder of this paper is structured as follows: In Sect. 2, we review the related work, followed by a description of the analyzed anomaly detection methods in Sect. 3. In Sect. 4, we present the datasets, and in Sect. 5, we describe our proposed robustness measure and the experimental setup. Section 6 presents a detailed evaluation of the results, followed by the conclusion in Sect. 7.

2 Related work

In the last couple of years, many unsupervised methods have been developed for surface-anomaly detection. In this section, we outline a number of them; the seven selected methods are described in more detail in the next section.
Reconstructive approaches using autoencoders (AE) [4, 13] and generative adversarial networks (GAN) [13, 28] were among the earliest, relying on the poor reconstruction of anomalous regions, since such regions were absent from the training set. Anomaly scores were then assigned by comparing the input and the reconstructed image, requiring further post-processing. Reconstruction methods have also shown promising results for anomaly detection in medical images [16, 29] and road anomaly detection [31]. Zhang and Deng [41] combined SVDD (Support Vector Data Description) with autoencoders to simultaneously preserve the data structure and map the normal samples into a minimal-volume hypersphere. Another solution, proposed by Zavrtanik et al. [39], involved masking parts of the image and reconstructing them using information from neighboring patches. With the impressive performance of Transformers in natural language processing, several methods [35] tried to harness their power for anomaly detection. For instance, InTra [19] masked a percentage of image patches and used transformer blocks to reconstruct them, with residual connections from the transformer encoder blocks to various points in the architecture. More recently, diffusion models have emerged as the state of the art in image generation. One of the earliest approaches to anomaly detection using diffusion models, AnoDDPM [33], replaced the standard Gaussian noise added to the images with Simplex noise [18].
In the past few years, various embedding-based approaches [5, 26, 35] have gained significant attention. Christiansen et al. [7] used an ImageNet-pretrained VGG for feature extraction and modeled normality with different models, such as k-Nearest Neighbors, the Multivariate Gaussian distribution (MVG), and Gaussian Mixture Models, to detect anomalies in agricultural field images. Rippel et al. [21] used ImageNet-pretrained EfficientNet features and an MVG with the Mahalanobis distance for image-level surface-anomaly detection, while Defard et al. [9] extended their work to also allow accurate anomaly localization. Normalizing flows have also been shown to be effective for surface-anomaly detection. Rudolph et al. [24] used AlexNet features at different scales and coupling flows to estimate the likelihood of query images. Gudovskiy et al. [14] combined positional encodings and pooled features as inputs to a normalizing flow to obtain precise anomaly localization. Rudolph et al. [25] designed a cross-scale normalizing flow that maintains the image structure to obtain an interpretable latent space and precise localization of anomalous regions. FastFlow [37] built on the idea of CFLOW-AD by improving the architecture of the normalizing flow. Roth et al. [23] used a coreset-subsampled memory bank of pretrained features, both to model the normal data and to perform anomaly detection and localization.
In addition, discriminative approaches [8, 11, 22] to surface-anomaly detection have been proposed. DRÆM [38] introduced the idea of generating synthetic anomalies by overlaying a normal image with content from another dataset according to a binary mask obtained by thresholding Perlin noise. DRÆM also extended the reconstruction architecture with a discriminative subnetwork. Li et al. [15] proposed to fine-tune a pretrained network on an augmentation-type prediction proxy task. MemSeg [34] improved upon this approach by also taking into consideration a binary map obtained through Otsu's method [17] to generate more realistic defects. DSR [40] further advanced the generation of synthetic anomalies by injecting them into the latent space.

3 Anomaly detection methods

To analyze anomaly detection robustness, we evaluated seven well-established, high-performing methods that represent the different anomaly detection paradigms in use today. The methods fall into different categories based on how they represent the data, how they build the model of normality, and how they assign the final anomaly score. We selected one reconstructive approach, four embedding-based approaches using various normality models, and two discriminative approaches. We summarize the key aspects of all analyzed methods in Table 1 and describe them briefly below.
Table 1
Overview of the analyzed methods, highlighting their key features

| Method      | Type | Normality model  | Anomaly measure      | Anomaly localization |
|-------------|------|------------------|----------------------|----------------------|
| RIAD        | R    | Reconstructive   | SSIM + MSGMS + L2    | ✓                    |
| Gaussian AD | EB   | MVG              | Mahalanobis distance |                      |
| PaDiM       | EB   | MVG              | Mahalanobis distance | ✓                    |
| CutPaste    | D    | MVG              | Mahalanobis distance | ✓                    |
| PatchCore   | EB   | Memory bank      | L2 distance          | ✓                    |
| CFLOW-AD    | EB   | Normalizing flow | (log) likelihood     | ✓                    |
| DRÆM        | D    | Reconstructive   | Implicit             | ✓                    |

In the method type column, R stands for reconstructive, EB for embedding-based, and D for discriminative approach
RIAD
RIAD [39] (Reconstruction by inpainting for visual anomaly detection) learns a representation of normal images by training an encoder-decoder network to perform reconstruction through an iterative inpainting process. During training, an image is divided into complementary sets of rectangular patches. The pixels of individual sets are set to 0 and the network is trained to inpaint the missing patches. The entire image is reconstructed by inpainting all sets individually. The anomaly score is defined as the reconstruction error between the input image and its reconstruction. The image-level anomaly score is the maximum value in the smoothed reconstruction-error map.
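To make the inpainting procedure concrete, here is a minimal sketch; it is our illustration rather than the authors' code, and `inpaint_net` is a placeholder for the trained encoder-decoder.

```python
import numpy as np

def riad_reconstruct(image, inpaint_net, patch=16, k=4, seed=0):
    """Reconstruct `image` by inpainting k complementary sets of
    (patch x patch) regions, one set at a time. Assumes the image
    height and width are divisible by `patch`."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    # Randomly assign every patch position to one of k disjoint sets.
    sets = rng.integers(0, k, size=(h // patch, w // patch))
    recon = np.zeros_like(image, dtype=np.float64)
    for s in range(k):
        mask = np.kron(sets == s, np.ones((patch, patch))).astype(bool)
        masked = image.astype(np.float64).copy()
        masked[mask] = 0.0                    # drop one set of patches
        out = inpaint_net(masked)             # network inpaints the holes
        recon[mask] = out[mask]               # keep only inpainted pixels
    # Per-pixel anomaly map: reconstruction error (plain L2 shown here;
    # RIAD combines it with SSIM- and gradient-based similarities).
    error = (image.astype(np.float64) - recon) ** 2
    return recon, error
```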
Gaussian AD
Gaussian AD [21] uses an ImageNet pretrained network as a feature extractor. It models normality by fitting a Multivariate Gaussian distribution (MVG) over the extracted features of training samples. At inference, features are extracted from an image and the anomaly score is defined as the Mahalanobis distance to the estimated MVG. Global average pooling is performed on the features before MVG estimation so the method can only output image-level scores.
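The normality model itself is simple; a minimal sketch, assuming the globally pooled features have already been extracted with a pretrained backbone:

```python
import numpy as np

def fit_mvg(train_feats):
    """Fit an MVG to average-pooled features of normal training images.
    train_feats: (n_images, d) array from a pretrained backbone."""
    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False)
    cov += 1e-3 * np.eye(cov.shape[0])     # regularize for invertibility
    return mu, np.linalg.inv(cov)

def mahalanobis_score(feat, mu, cov_inv):
    """Image-level anomaly score: Mahalanobis distance to the fitted MVG."""
    diff = feat - mu
    return float(np.sqrt(diff @ cov_inv @ diff))
```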
PaDiM
PaDiM [9] is based on Gaussian AD, extending the method to consider spatial information during feature extraction and subsequently during MVG estimation. An MVG is estimated for every feature-map position, and the Mahalanobis distance is used to estimate the anomaly score at each position. The image-level score is defined as the maximum value in the score map. To reduce the dimensionality of the MVG, the features are reduced to a random subset of their dimensions.
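A per-position variant of the same idea can be sketched as follows, assuming pretrained (and already dimensionality-reduced) features are given as an (n, h, w, d) array:

```python
import numpy as np

def fit_padim(train_feats, eps=1e-2):
    """Fit one MVG per feature-map position. train_feats: (n, h, w, d)
    features of n normal training images."""
    n, h, w, d = train_feats.shape
    x = train_feats.reshape(n, h * w, d)
    mu = x.mean(axis=0)                              # (h*w, d)
    cov_inv = np.empty((h * w, d, d))
    for p in range(h * w):
        cov = np.cov(x[:, p, :], rowvar=False) + eps * np.eye(d)
        cov_inv[p] = np.linalg.inv(cov)
    return mu, cov_inv

def padim_score_map(feat, mu, cov_inv):
    """Mahalanobis distance at every position for one image's (h, w, d)
    features; the image-level score is the maximum over the map."""
    h, w, d = feat.shape
    diff = feat.reshape(-1, d) - mu                  # (h*w, d)
    m = np.sqrt(np.einsum('pi,pij,pj->p', diff, cov_inv, diff))
    return m.reshape(h, w)
```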
CutPaste
CutPaste [15] does not use an ImageNet-pretrained network for feature extraction; instead, it trains the network to classify images containing simulated anomalies. Simulated anomalies are generated by cutting and pasting rectangular image patches across the image or by overlaying the image with rectangular color patches. Training samples can also contain no anomalies. The network is trained on a proxy task of three-way classification, distinguishing between the three classes of training samples. The network is then used as a feature extractor and normality is modeled using an MVG. Out of the box, CutPaste cannot output a defect localization map, as only image-level features are used; however, localization can be obtained by evaluating each image patch separately.
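The cut-and-paste augmentation is easy to sketch; the color-patch variant and the three-way labeling are omitted, and the patch-size ranges below are our assumptions, not the paper's exact values:

```python
import numpy as np

def cutpaste(image, rng, min_frac=0.02, max_frac=0.15):
    """CutPaste-style augmentation: copy a random rectangle from one
    random location in the image to another."""
    h, w = image.shape[:2]
    area = rng.uniform(min_frac, max_frac) * h * w
    aspect = rng.uniform(0.3, 3.3)                   # patch aspect ratio
    ph = max(1, min(int(np.sqrt(area / aspect)), h - 1))
    pw = max(1, min(int(np.sqrt(area * aspect)), w - 1))
    y0, x0 = rng.integers(0, h - ph), rng.integers(0, w - pw)  # source
    y1, x1 = rng.integers(0, h - ph), rng.integers(0, w - pw)  # target
    out = image.copy()
    out[y1:y1 + ph, x1:x1 + pw] = image[y0:y0 + ph, x0:x0 + pw]
    return out
```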
PatchCore
PatchCore [23] uses an ImageNet pretrained network as a feature extractor. The extracted features are preprocessed to obtain locally aware patch features by also considering neighboring features. The normality model is obtained by extracting a coreset from all the features in the training set, retaining only the most representative features. To assign an anomaly score to a feature map location, both the distance to the nearest feature in the coreset and the density of the region in which the feature resides are considered. The image-level anomaly score is obtained by taking the maximum value in the anomaly map.
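Both steps can be sketched compactly; this uses plain farthest-point sampling and omits PatchCore's projection-based speed-up and the density reweighting of the final score:

```python
import numpy as np

def greedy_coreset(feats, m, rng):
    """Farthest-point sampling over patch features: a simplified stand-in
    for PatchCore's coreset reduction."""
    idx = [int(rng.integers(len(feats)))]
    dist = np.linalg.norm(feats - feats[idx[0]], axis=1)
    for _ in range(m - 1):
        i = int(dist.argmax())               # farthest from current coreset
        idx.append(i)
        dist = np.minimum(dist, np.linalg.norm(feats - feats[i], axis=1))
    return feats[idx]

def patch_anomaly_scores(query, memory_bank):
    """L2 distance of each query patch feature to its nearest memory item;
    the image-level score would be scores.max()."""
    d = np.linalg.norm(query[:, None, :] - memory_bank[None, :, :], axis=-1)
    return d.min(axis=1)
```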
CFLOW-AD
CFLOW-AD [14] uses a normalizing flows framework for efficient anomaly detection and localization. It uses an ImageNet pretrained network to extract image features. Extracted multi-scale features are combined with positional encodings to introduce spatial information in the model. Normality of the extracted features is then modeled with conditional normalizing flows, allowing explicit likelihood estimation, which also serves as a final image anomaly score.
DRÆM
DRÆM [38] uses simulated image anomalies to train a discriminative anomaly detection model. It contains a reconstructive and a discriminative subnetwork. First, diverse anomalies with pixel-perfect anomaly masks are generated. The reconstructive network employs an encoder-decoder architecture and is trained to restore the regions containing simulated anomalies to their original appearance. The discriminative network then takes the original image and the output of the reconstructive network as an input and learns to directly output a defect localization map. The discriminative network is trained using the simulated anomaly images and masks. The final anomaly score is given by the maximum value in the smoothed defect localization map.
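The anomaly simulation can be sketched as follows; note that blobby uniform noise stands in for the Perlin noise used in the paper, and the blending factor `beta` is an assumption:

```python
import numpy as np

def simulate_anomaly(image, texture, rng, cell=8, thresh=0.7, beta=0.5):
    """Simulated anomaly in the spirit of DRÆM: a binary mask obtained by
    thresholding low-frequency noise selects where an out-of-distribution
    `texture` image (same shape as `image`) is blended in. Returns the
    augmented image and the pixel-perfect mask, which serves as the
    training target for the discriminative subnetwork."""
    h, w = image.shape[:2]
    coarse = rng.random((h // cell + 1, w // cell + 1))
    noise = np.kron(coarse, np.ones((cell, cell)))[:h, :w]
    mask = (noise > thresh).astype(np.float64)
    m = mask[..., None] if image.ndim == 3 else mask
    aug = (1 - m) * image + m * (beta * image + (1 - beta) * texture)
    return aug, mask
```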

4 Datasets

We used four diverse datasets to evaluate the robustness of the unsupervised methods. The datasets were selected due to their diversity and the greater availability of both normal and anomalous images, which allowed us to design identical experimental frameworks for all four datasets, thus enabling a better comparison of results across datasets. As shown in Fig. 2, the images from KolektorSDD2 and DAGM depict the surface of a textured object, whereas the images from SensumSODF and BSData show complete objects.
Fig. 2
Examples of images with anomalies highlighted in red from all four datasets. The top row shows samples from KolektorSDD2 [6], the second row shows samples from SensumSODF [20] and DAGM [32], and the final row depicts two samples from BSData [27]
Fig. 3
Left: Heatmaps displaying spatial distributions of anomalies for all four datasets. Right: Histograms of sizes of anomalies, normalized by the total size of the area in which anomalies may appear
KolektorSDD2
The first dataset, KolektorSDD2 [6], contains normal and anomalous images from a real-world industrial anomaly detection problem. The images show the textured surface of the products during the manufacturing process. Different types of defects are present in the dataset, varying in shape, size, color, number, and location.
SensumSODF
The SensumSODF dataset [20] contains two subsets of images of solid oral dosage forms. We selected only the softgel subset due to the limited availability of anomalous samples in the other subset. The defects vary both in appearance and size and are all located on the pill itself, not on the background.
DAGM
The DAGM [32] dataset contains 10 classes of artificially generated surfaces and defects. We limited our evaluation to a single class from the dataset, class 10. All defects within the same class are of identical appearance, varying in size, location, and rotation.
BSData
The BSData dataset [27] contains images of ball screw drives. The defects are of similar appearance, varying in size, and are limited to a relatively small region in the image.
The left part of Fig. 3 shows the spatial distributions of defects for the individual datasets. It can be observed that the defects in the BSData dataset are concentrated in a relatively small area of the image, whereas for the other datasets, defects are spread over a much larger area, especially in the images from the KolektorSDD2 dataset. The right part of Fig. 3 displays the distributions of the defect sizes, normalized by the total area that the defects may occupy. Although the defective regions in the BSData dataset are the smallest in absolute size, they appear only in a small area, effectively increasing their importance, which also holds true for the defects in the SensumSODF dataset. The datasets are therefore quite diverse and enable a fair comparison of different methods in rather general settings.

5 Experimental methodology

5.1 Robustness measure

We propose to measure the robustness of unsupervised methods by injecting anomalous images into the training sets, which are usually assumed to be anomaly-free. We inject different percentages of anomalous images, and the robustness is computed as the average performance of a method across all evaluated percentages of anomalous images, compared to the baseline performance. More specifically, Anomaly Detection Robustness (\(\textit{ADR}\)) is defined as:
$$\begin{aligned} \textit{ADR} = \frac{1}{|P|} \, \frac{\sum_{p_i \in P} S_{p_i}}{S_0}; \quad P = \{1, 5, 15, 25\}, \end{aligned}$$
(1)
where \(S_0\) is the baseline score on the clean training set, \(S_{p_i}\) is the average score when there are \(p_i\%\) of anomalous images in the training set, and \(|P|\) is the total number of different \(p_i \in P\), i.e., \(|P|=4\).
\(\textit{ADR}\) thus measures the average relative drop in performance compared to the baseline performance \(S_0\) over all percentages of anomalous samples \(p_i\) in the training set. A value of 1.0 indicates that, on average, the performance of the method is the same regardless of how many anomalous samples there are in the training set, while a value of 0.9 would indicate an average drop in performance of \(10\%\). Any performance metric can be used; we selected AUC (Area Under the Curve), since it is threshold-independent and the most commonly used metric in the unsupervised anomaly detection literature. This makes it well suited for comparing methods that produce anomaly scores on different scales without requiring the selection of arbitrary thresholds, and it supports fair and consistent evaluation across datasets.
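Equation (1) translates directly into code; a minimal sketch with an illustrative input:

```python
def adr(scores_by_pct, baseline):
    """Anomaly Detection Robustness, Eq. (1): average score over all
    tested contamination levels, relative to the clean baseline S_0.
    scores_by_pct maps p_i (in %) to the average score S_{p_i}."""
    P = (1, 5, 15, 25)
    return sum(scores_by_pct[p] for p in P) / (len(P) * baseline)

# Example: AUCs that barely drop with contamination yield ADR close to 1.
print(adr({1: 0.95, 5: 0.94, 15: 0.93, 25: 0.92}, baseline=0.95))  # ~0.984
```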

5.2 Experimental setup

Based on the proposed methodology, we constructed training sets containing 0%, 1%, 5%, 15%, and 25% anomalous images for each of the four datasets. We followed the same experimental framework for all datasets and all methods, constructing training and test sets of the same size. Training sets consisted of 400 images, of which either 0 (baseline), 4 (1%), 20 (5%), 60 (15%), or 100 (25%) were anomalous. Test sets contained 200 images, of which 100 were normal and 100 anomalous.
To increase the significance of the results, we constructed five different training subsets for each percentage of anomalous images \(p_i\). We first randomly selected the required number of normal and anomalous samples for the test set and then randomly selected five training subsets from the leftover samples. Due to the random sampling and the smaller number of images in some datasets, there is some overlap between the training subsets. The final results were obtained by evaluating all five training subsets and taking the average; this also allowed us to report the standard errors of the obtained results. For all methods, we used identical hyperparameters across all datasets and all percentages of anomalous images.
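A sketch of this sampling protocol, under our reading of the text (the `normal_ids`/`anomalous_ids` arrays are hypothetical image identifiers, assumed large enough for the requested draws):

```python
import numpy as np

def make_splits(normal_ids, anomalous_ids, rng, n_train=400,
                n_test_each=100, pcts=(0, 1, 5, 15, 25), n_repeats=5):
    """Fix one test set, then independently draw five training subsets per
    contamination level from the remaining images (subsets may overlap)."""
    test_n = rng.choice(normal_ids, n_test_each, replace=False)
    test_a = rng.choice(anomalous_ids, n_test_each, replace=False)
    pool_n = np.setdiff1d(normal_ids, test_n)
    pool_a = np.setdiff1d(anomalous_ids, test_a)
    splits = {}
    for p in pcts:
        k = n_train * p // 100               # anomalous images per subset
        splits[p] = [np.concatenate([
            rng.choice(pool_n, n_train - k, replace=False),
            rng.choice(pool_a, k, replace=False)])
            for _ in range(n_repeats)]
    return (test_n, test_a), splits
```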

6 Results

Based on the proposed methodology we performed a thorough experimental evaluation. We first report some general conclusions about the robustness of the evaluated methods and then discuss in detail the results of the individual methods and identify the reasons for the drop in performance.

6.1 Overall results

For every dataset, we calculated the drop in performance of all seven methods for different percentages of added anomalous samples in the training set; Fig. 4 plots the results averaged across all four datasets. The figure clearly shows that the performance of the analyzed methods drops as the pollution of the training set increases; however, these drops vary significantly between the methods. Based on the obtained results, we calculated \(\textit{ADR}\), as defined in (1), for every method on every dataset. Figure 5 shows both the \(\textit{ADR}\) and the baseline performance in terms of AUC for all seven methods, averaged across all four datasets. Additionally, Table 2 shows the \(\textit{ADR}\) for each dataset and each method separately.
We can observe that the achieved baseline AUC differs between methods and roughly corresponds to the results reported in the literature, although the size of the clean dataset is limited by our adapted experimental setup, as described in Sect. 5.2. More interesting is the robustness the methods are able to achieve. The main takeaway of the experiments is that all analyzed methods are fairly robust to a small percentage of anomalous samples in the training set, and some of them retain high performance even when the percentage increases to \(25\%\). Furthermore, methods that employ pretrained features to represent the data are generally more robust than those that learn the representations from scratch.
The three most robust methods (PaDiM, Gaussian AD, CFLOW-AD) use pretrained features, whereas the two least robust ones (DRÆM, CutPaste) learn them from scratch in a discriminative manner. The fourth most robust method, RIAD, learns representations from scratch in a reconstructive manner. Like the most robust methods, PatchCore uses pretrained features; however, it models normality with a coreset-reduced memory bank, which is highly sensitive to anomalous samples in the training data.
Fig. 4
Performance of all seven methods for various numbers of anomalous images in the training set, averaged across all four datasets
Table 2
Summary of the robustness of all unsupervised methods on all datasets

| Method \ Dataset | KolektorSDD2  | SensumSODF    | DAGM          | BSData        | Average       |
|------------------|---------------|---------------|---------------|---------------|---------------|
| RIAD             | 0.952         | 0.992         | 0.870         | 0.962         | 0.944 ± 0.045 |
| Gaussian AD      | 0.989         | 0.986         | 0.965         | **0.978**     | 0.980 ± 0.009 |
| PaDiM            | **0.999**     | **0.996**     | **1.000**     | 0.951         | 0.986 ± 0.021 |
| CutPaste         | 0.888         | 0.925         | 0.837         | 0.866         | 0.879 ± 0.032 |
| PatchCore        | 0.995         | 0.954         | 0.938         | 0.782         | 0.917 ± 0.081 |
| CFLOW-AD         | 0.995         | 0.986         | 0.891         | 0.966         | 0.960 ± 0.041 |
| DRÆM             | 0.876         | 0.950         | 0.694         | 0.941         | 0.865 ± 0.103 |
| Average          | 0.956 ± 0.049 | 0.970 ± 0.025 | 0.885 ± 0.094 | 0.921 ± 0.066 | /             |

The last column shows the robustness of each method averaged over all datasets, whereas the last row shows the average robustness of all methods on that dataset. The most robust method for each dataset is shown in bold
Fig. 5
Scatter plot displaying the robustness \(\textit{ADR}\) on the x-axis and the baseline AUC performance on the y-axis for all seven unsupervised methods, averaged across all four datasets. Blue crosses mark the average performance achieved on each of the four datasets, averaged across all seven methods

6.2 Detailed analysis

PaDiM
PaDiM is the most robust of all seven methods, achieving an \(\textit{ADR}\) of 0.986. We investigate how the components of the Multivariate Gaussian (MVG), which is used to model normality, change as \(p_i\) increases. We observe the components of both the mean vectors and the covariance matrices; in particular, we track the mean value of all components of the mean vectors across all patch locations and the mean of the absolute values of all components of the covariance matrices across all patch locations. Additionally, we average these values across the five randomly sampled training sets. Table 3 shows how the means and covariances change for the methods that use the MVG normality model. We can observe that for PaDiM both remain almost identical, which is probably the key factor behind the high robustness of the method. We hypothesize that this is mostly due to the fact that in all four datasets the anomalies are relatively small in size and each affects very few patch locations. The largest change in the average MVG model, which occurs on the BSData dataset, corresponds to the largest drop in performance; this correspondence also holds for the other datasets. In Fig. 6 we also visualize the anomaly score densities on the BSData dataset. While the scores for normal images tend to stay the same, the anomaly scores for the anomalous images move closer to those of the normal images the more the training set is polluted with anomalous data.
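The statistics tracked in Table 3 can be computed as follows; this is a sketch of our reading of the procedure, with `mu` and `cov` being the per-position MVG parameters (e.g., as produced by `fit_padim` above):

```python
import numpy as np

def mvg_summary(mu, cov):
    """Summary statistics of a per-position MVG model:
    mu:  (n_positions, d)    per-position mean vectors,
    cov: (n_positions, d, d) per-position covariance matrices.
    Returns the mean over all mean-vector components and the mean of the
    absolute values of all covariance components; in the paper these are
    additionally averaged over the five random training subsets."""
    return mu.mean(), np.abs(cov).mean()
```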
Table 3
Comparison of changes in the components of the MVG model for the three methods that model normality with it

| Method      | Statistic  | Dataset      | Baseline | 1%             | 5%             | 15%            | 25%             |
|-------------|------------|--------------|----------|----------------|----------------|----------------|-----------------|
| PaDiM       | Mean       | KolektorSDD2 | 5.52     | 5.52 (0.00)    | 5.52 (0.00)    | 5.52 (0.00)    | 5.53 (0.01)     |
|             |            | SensumSODF   | 5.88     | 5.88 (0.00)    | 5.88 (0.00)    | 5.89 (0.00)    | 5.89 (0.01)     |
|             |            | BSData       | 6.01     | 6.01 (0.00)    | 6.01 (0.00)    | 6.02 (0.01)    | 6.03 (0.02)     |
|             |            | DAGM         | 6.13     | 6.13 (0.00)    | 6.13 (0.00)    | 6.13 (0.00)    | 6.13 (0.00)     |
|             | Covariance | KolektorSDD2 | 0.02     | 0.02 (0.00%)   | 0.02 (0.01%)   | 0.02 (0.02%)   | 0.02 (0.03%)    |
|             |            | SensumSODF   | 0.02     | 0.02 (0.01%)   | 0.02 (0.02%)   | 0.02 (0.08%)   | 0.02 (0.14%)    |
|             |            | BSData       | 0.02     | 0.02 (0.01%)   | 0.02 (0.05%)   | 0.02 (0.16%)   | 0.03 (0.27%)    |
|             |            | DAGM         | 0.03     | 0.03 (-0.00%)  | 0.03 (0.00%)   | 0.03 (0.00%)   | 0.03 (0.01%)    |
| Gaussian AD | Mean       | KolektorSDD2 | -5.15    | -5.18 (0.03)   | -5.28 (0.13)   | -5.45 (0.30)   | -5.63 (0.48)    |
|             |            | SensumSODF   | -15.99   | -15.93 (0.06)  | -15.85 (0.14)  | -15.44 (0.55)  | -15.11 (0.88)   |
|             |            | BSData       | -14.40   | -14.43 (0.03)  | -14.57 (0.17)  | -14.83 (0.44)  | -15.15 (0.76)   |
|             |            | DAGM         | -61.56   | -61.57 (0.00)  | -61.54 (0.03)  | -61.41 (0.15)  | -61.30 (0.26)   |
|             | Covariance | KolektorSDD2 | 9.64     | 9.63 (-0.12%)  | 9.68 (0.41%)   | 9.83 (1.98%)   | 10.08 (4.59%)   |
|             |            | SensumSODF   | 27.04    | 26.90 (-0.51%) | 27.20 (0.61%)  | 27.65 (2.28%)  | 28.05 (3.74%)   |
|             |            | BSData       | 10.36    | 10.82 (4.38%)  | 12.90 (24.48%) | 18.48 (78.33%) | 22.83 (120.26%) |
|             |            | DAGM         | 3.49     | 3.51 (0.34%)   | 3.52 (0.63%)   | 3.61 (3.28%)   | 3.64 (4.07%)    |
| CutPaste    | Mean       | KolektorSDD2 | -3.70    | -3.71 (0.01)   | -3.73 (0.03)   | -3.73 (0.03)   | -3.59 (0.11)    |
|             |            | SensumSODF   | -3.21    | -3.33 (0.12)   | -3.27 (0.05)   | -3.25 (0.03)   | -3.13 (0.08)    |
|             |            | BSData       | 2.42     | 2.41 (0.01)    | 2.40 (0.02)    | 2.39 (0.02)    | 2.29 (0.13)     |
|             |            | DAGM         | 0.51     | 0.80 (0.30)    | 0.56 (0.05)    | 0.78 (0.27)    | 0.28 (0.22)     |
|             | Covariance | KolektorSDD2 | 0.08     | 0.08 (0.00%)   | 0.08 (-0.01%)  | 0.08 (-0.02%)  | 0.08 (0.00%)    |
|             |            | SensumSODF   | 0.24     | 0.22 (-0.07%)  | 0.23 (-0.03%)  | 0.23 (-0.04%)  | 0.22 (-0.07%)   |
|             |            | BSData       | 0.08     | 0.08 (0.01%)   | 0.08 (0.03%)   | 0.09 (0.10%)   | 0.10 (0.19%)    |
|             |            | DAGM         | 0.06     | 0.06 (0.01%)   | 0.06 (0.01%)   | 0.06 (-0.01%)  | 0.06 (-0.01%)   |

Differences in the mean value of the mean vectors and increases in the mean absolute value of the covariance matrix are shown for PaDiM, Gaussian AD, and CutPaste. For PaDiM, the values are additionally averaged over all patch locations and multiplied by 100 to improve readability
Fig. 6
Anomaly score densities for PaDiM on the BSData dataset (left) and for CFLOW-AD on the DAGM dataset (right). The numbers on the left indicate how much of the training set is anomalous; the green line represents the density for normal images, while the red line represents the density for anomalous images. For PaDiM, the average values for normal images barely change, while the average anomaly scores for anomalous images regress towards those of the normal ones. For CFLOW-AD, the densities remain separated when the training set is not significantly corrupted, but start to overlap by a significant margin as the corruption increases
Gaussian AD
Gaussian AD is the second most robust method, achieving an \(\textit{ADR}\) of 0.980. As for PaDiM, we investigate how the components of the MVG change when anomalous samples are introduced into the training set. The second section of Table 3 shows the changes; we can observe that the covariances increase significantly more than for PaDiM, which leads to lower anomaly scores for anomalous images and higher anomaly scores for normal images, thus reducing the overall AUC. The model changes the most on BSData, where the mean absolute value of all covariances more than doubles. The performance of the method drops the most on the DAGM dataset, even though the model does not change as much, which can be attributed to the synthetic nature of the dataset.
Table 4
Means and standard deviations of the scores for normal and anomalous samples for CFLOW-AD on the DAGM dataset. The distance between the means of normal and anomalous samples shrinks as \(p_i\) increases

|           | Baseline    | 1%          | 5%          | 15%         | 25%         |
|-----------|-------------|-------------|-------------|-------------|-------------|
| Normal    | 0.31 ± 0.14 | 0.32 ± 0.14 | 0.31 ± 0.14 | 0.34 ± 0.14 | 0.35 ± 0.14 |
| Anomalous | 0.82 ± 0.18 | 0.77 ± 0.19 | 0.66 ± 0.21 | 0.56 ± 0.19 | 0.53 ± 0.18 |
| AUC       | 0.97        | 0.96        | 0.91        | 0.82        | 0.78        |
CFLOW-AD
CFLOW-AD achieves an \(\textit{ADR}\) of 0.960, making it the third most robust method. Like PaDiM and Gaussian AD, the method uses pretrained image representations; however, it models the distribution of normal data with normalizing flows, which are significantly more complex and expressive than the MVG model and therefore also less robust. Table 4 shows that the distance between the mean scores of normal and anomalous samples shrinks as \(p_i\) increases, leading to a drop in AUC. As with PaDiM, we also visualize the anomaly score densities for CFLOW-AD on the DAGM dataset in Fig. 6. While the model still performs well when the number of anomalous images is relatively small, the performance quickly deteriorates as the number of anomalous images in the training set grows.
RIAD
With an \(\textit{ADR}\) of 0.944, RIAD is the fourth most robust method. We hypothesize that the main reason for its relatively high robustness lies in the small size of the anomalies and their diversity, which make it impossible for the reconstruction network to reconstruct them precisely enough to drastically reduce their anomaly scores compared to non-anomalous regions. Nevertheless, the reconstruction of anomalous regions improves as \(p_i\) grows, which still leads to a noticeable drop in performance.
PatchCore
PatchCore achieves an \(\textit{ADR}\) of 0.917. While the method, like the most robust ones, uses a pretrained network for feature extraction, its coreset-reduced memory-bank representation of normality is significantly more sensitive to the presence of anomalous features. The greedy coreset algorithm ensures that features from anomalous image regions are included in the final memory bank, as they lie furthest from the normal features and are therefore selected by the algorithm. This leads to a drop in performance, as it both increases the distances of normal patch features to their representative nearest neighbors in the memory bank and decreases the same distances for anomalous patch features. The issue is somewhat alleviated by also considering the density of memory-bank feature-space regions when calculating anomaly scores. In Table 5 we show the mean \(l_2\) distance between all memory-bank components for all datasets, which clearly increases as the percentage of anomalous samples in the training set grows.
Table 5
Comparison of the mean \(l_2\) distance between all components of the PatchCore memory bank

| Dataset      | Baseline | 1%          | 5%          | 15%          | 25%          |
|--------------|----------|-------------|-------------|--------------|--------------|
| KolektorSDD2 | 3.39     | 3.40 (0.5%) | 3.55 (4.6%) | 3.94 (16.2%) | 4.15 (22.6%) |
| SensumSODF   | 4.46     | 4.50 (0.8%) | 4.58 (2.7%) | 4.67 (4.7%)  | 4.72 (5.9%)  |
| BSData       | 3.77     | 3.78 (0.1%) | 3.79 (0.5%) | 3.82 (1.2%)  | 3.83 (1.6%)  |
| DAGM         | 3.45     | 3.46 (0.3%) | 3.50 (1.3%) | 3.56 (3.1%)  | 3.59 (4.2%)  |

The relative increase is shown in brackets
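For reference, the statistic in Table 5 can be computed with the standard Gram-matrix identity; a sketch assuming the memory bank is an (n, d) array:

```python
import numpy as np

def mean_pairwise_l2(memory_bank):
    """Mean L2 distance between all pairs of memory-bank features."""
    sq = (memory_bank ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * memory_bank @ memory_bank.T
    dist = np.sqrt(np.maximum(d2, 0.0))      # clamp tiny negatives
    n = len(memory_bank)
    return dist.sum() / (n * (n - 1))        # exclude self-distances
```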
CutPaste
With an \(\textit{ADR}\) of 0.879, CutPaste is the second least robust method, despite sharing the model of normality with the two most robust ones. The third section of Table 3 shows the changes in the MVG model of normality. Interestingly, despite the significantly lower \(\textit{ADR}\) than Gaussian AD, the MVG model changes significantly less than for Gaussian AD. We hypothesize that the reason for the smaller changes lies in the design of the proxy task: the network is pretrained to distinguish between various types of artificial defects. During pretraining, images are either left as they are or augmented with artificial defects. The same image can thus appear in different target classes depending on whether it was artificially corrupted, which guides the training towards features that are highly sensitive to artificial defects and indifferent to real ones. The MVG model fitted to these features then changes significantly less as \(p_i\) increases; however, it is also less capable of separating normal and anomalous samples.
DRÆM
The least robust method is DRÆM, achieving an \(\textit{ADR}\) of 0.865. The reconstruction subnetwork is trained to remove anomalous pixels from the image by reconstructing them as anomaly-free, similarly to RIAD, which is one of the more robust methods. However, the discriminative subnetwork is then explicitly trained to identify anomalous pixels from the image-reconstruction input pair. Due to its greater learning capacity and discriminative nature, the discriminative subnetwork tends to learn to recognize the anomalous regions present in the training set as non-anomalous, leading to a significant drop in performance.

6.3 Training on anomalous samples only

We also investigated what happens when the percentage of anomalous samples in the training set is pushed to the limit. For a fair comparison, we evaluated how the AUC on the test set changes when training on either 100 normal samples or 100 anomalous samples, to account for the decrease in performance that occurs when fewer training samples are used. Both the normal and anomalous training samples, as well as the test set, were the same as in the other experiments. The results are shown in Table 6. As in the other experiments, PaDiM retains the highest performance when trained on anomalous samples only. On three of the four datasets, PaDiM experiences a very small drop in performance, less than 2% compared to training on normal samples. Additionally, on the two datasets with the most diverse anomalies, KolektorSDD2 and SensumSODF, some other methods achieve performance comparable to training on normal samples. The highest drop in performance is observed on BSData; as can be seen in Fig. 3, the defects in this dataset are concentrated in a small image area and cover a relatively large part of it. In contrast, in the other datasets the number of corrupted pixels remains relatively small, although corrupted pixels are present in all training images, allowing most of the methods to still perform reasonably well.
Table 6
Performance of the different methods when trained on anomalous samples only

| Method      | KolektorSDD2          | SensumSODF            | DAGM                  | BSData                |
|-------------|-----------------------|-----------------------|-----------------------|-----------------------|
| RIAD        | 0.871 → 0.723 (0.147) | 0.815 → 0.791 (0.024) | 0.825 → 0.601 (0.224) | 0.930 → 0.640 (0.290) |
| Gaussian AD | 0.927 → 0.844 (0.082) | 0.868 → 0.769 (0.100) | 0.981 → 0.602 (0.379) | 0.971 → 0.276 (0.695) |
| PaDiM       | 0.853 → 0.841 (0.012) | 0.871 → 0.855 (0.016) | 0.996 → 0.993 (0.003) | 0.936 → 0.515 (0.421) |
| CutPaste    | 0.736 → 0.587 (0.149) | 0.776 → 0.602 (0.174) | 0.975 → 0.622 (0.352) | 0.907 → 0.356 (0.551) |
| PatchCore   | 0.971 → 0.919 (0.052) | 0.856 → 0.777 (0.079) | 1.000 → 0.871 (0.129) | 0.946 → 0.201 (0.746) |
| CFLOW-AD    | 0.954 → 0.936 (0.018) | 0.832 → 0.800 (0.032) | 0.997 → 0.915 (0.081) | 0.948 → 0.588 (0.360) |
| DRÆM        | 0.911 → 0.738 (0.172) | 0.804 → 0.736 (0.068) | 0.911 → 0.488 (0.423) | 0.911 → 0.368 (0.543) |

Each entry shows the AUC when the method is trained on 100 normal samples versus 100 anomalous samples, with the decrease in AUC in brackets

6.4 Robustness on different datasets

We further analyzed the characteristics of the individual datasets by observing the average ADR that the methods achieve on them. Table 2 and Fig. 5 additionally show the mean ADR of all methods on the individual datasets. The dataset on which the methods achieved the highest mean ADR, SensumSODF, contains anomalies that are visually the most similar to the normal appearance, leading to a less significant distribution shift; however, this also corresponds with the lowest mean baseline performance, as such anomalies are harder to detect in the first place. KolektorSDD2, with the second highest mean ADR, contains very diverse anomalies, again introducing a smaller distribution shift. BSData, the dataset with the second lowest mean ADR, contains anomalies that are less diverse in appearance and, more importantly, occupy a relatively small area in the image, leading to an anomaly distribution that is easier to learn; consequently, during testing, anomalous samples are harder to identify and separate from normal ones. The methods tend to be least robust on DAGM, which contains artificially generated images and anomalies. Since the anomalies were artificially generated, they have identical appearance, varying only in size and location. Unsupervised methods are therefore capable of learning that distribution, making it significantly harder to separate normal from anomalous samples.

7 Conclusion

In this paper, we discuss an important characteristic of unsupervised surface-anomaly detection methods that is usually not considered in the evaluation of proposed approaches. We argue that robustness to false negatives in the training set plays an important role in applying such methods in the real world, since in practice it is not always possible to prevent a fraction of anomalous samples from being present in the training set. We defined the anomaly detection robustness as the average performance of a method on datasets contaminated with several different percentages of anomalous images, compared to the baseline performance.
We performed an extensive evaluation of seven recently proposed unsupervised methods on four diverse datasets. The methods are based on different anomaly detection paradigms, ranging from reconstruction-based methods to normalizing flows. By analyzing the experimental results, we highlight the characteristics of the individual methods and of the general paradigms they represent.
The main conclusion is that all analyzed methods are fairly robust to a small percentage of anomalous samples in the training, which is good news, since it is difficult to expect that the training sets are completely clean. The methods that employ pretrained features retain a high performance even when as much as a quarter of training images are anomalous.
Even in anomalous samples, the actual anomalies occupy only a small portion of the image. They are therefore still in the minority compared with the regular appearance, and the learned distribution over the pretrained features is not harmed significantly. Consequently, these methods are quite robust. On the other hand, this also means that such methods would require more training data to model problem domains where a fraction of normal samples exhibit unusual appearance that is far from the main distribution. Collecting such datasets and analyzing the robustness of different methods on them remains a topic of our future research endeavors.
To conclude, we would like to point out that anomaly detection robustness is an important characteristic of unsupervised anomaly detection methods and that it should be considered as one of the standard metrics for the evaluation of newly proposed approaches.

Acknowledgements

This work was in part supported by the ARRS research project L2-3169 (MV4.0) and research programme P2-0214.

Declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Akcay S, Atapour-Abarghouei A, Breckon TP (2019) GANomaly: semi-supervised anomaly detection via adversarial training. In: Computer Vision – ACCV 2018: 14th Asian Conference on Computer Vision, Perth, Australia, December 2–6, 2018, Revised Selected Papers, Part III. Springer, Cham, pp 622–637
2. Akçay S, Atapour-Abarghouei A, Breckon TP (2019) Skip-GANomaly: skip connected and adversarially trained encoder-decoder anomaly detection. In: 2019 International Joint Conference on Neural Networks (IJCNN), IEEE, pp 1–8
3. Baur C, Wiestler B, Albarqouni S et al (2019) Deep autoencoding models for unsupervised anomaly segmentation in brain MR images. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 4th International Workshop, BrainLes 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Revised Selected Papers, Part I. Springer, Cham, pp 161–169
4. Bergmann P, Löwe S, Fauser M et al (2019) Improving unsupervised defect segmentation by applying structural similarity to autoencoders. In: Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2019), Volume 5: VISAPP, Prague, Czech Republic. SciTePress, pp 372–380
5. Bigdeli E, Mohammadi M, Raahemi B et al (2017) A fast and noise resilient cluster-based anomaly detection. Pattern Anal Appl 20:183–199
6. Božič J, Tabernik D, Skočaj D (2021) Mixed supervision for surface-defect detection: from weakly to fully supervised learning. Comput Ind 129:103459
7. Christiansen P, Nielsen LN, Steen KA et al (2016) DeepAnomaly: combining background subtraction and deep learning for detecting obstacles and anomalies in an agricultural field. Sensors 16(11):1904
8. Dai S, Wu Y, Li X et al (2024) Generating and reweighting dense contrastive patterns for unsupervised anomaly detection. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 1454–1462
9. Defard T, Setkov A, Loesch A et al (2021) PaDiM: a patch distribution modeling framework for anomaly detection and localization. In: International Conference on Pattern Recognition, Springer, pp 475–489
10. Ding C, Pang G, Shen C (2022) Catching both gray and black swans: open-set supervised anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
11. Fučka M, Zavrtanik V, Skočaj D (2025) TransFusion – a transparency-based diffusion model for anomaly detection. In: European Conference on Computer Vision, Springer, pp 91–108
12. Gaidhane VH, Hote YV, Singh V (2018) An efficient similarity measure approach for PCB surface defect detection. Pattern Anal Appl 21:277–289
13. Gong D, Liu L, Le V et al (2019) Memorizing normality to detect anomaly: memory-augmented deep autoencoder for unsupervised anomaly detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 1705–1714
14. Gudovskiy D, Ishizaka S, Kozuka K (2022) CFLOW-AD: real-time unsupervised anomaly detection with localization via conditional normalizing flows. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp 98–107
15. Li CL, Sohn K, Yoon J et al (2021) CutPaste: self-supervised learning for anomaly detection and localization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 9664–9674
16. Narin A, Kaya C, Pamuk Z (2021) Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks. Pattern Anal Appl 24:1207–1220
17. Otsu N et al (1975) A threshold selection method from gray-level histograms. Automatica 11(285–296):23–27
18.
19. Pirnay J, Chai K (2022) Inpainting transformer for anomaly detection. In: Sclaroff S, Distante C, Leo M et al (eds) Image Analysis and Processing – ICIAP 2022. Springer International Publishing, Cham, pp 394–406
20. Rački D, Tomaževič D, Skočaj D (2022) Detection of surface defects on pharmaceutical solid oral dosage forms with convolutional neural networks. Neural Comput Appl 34(1):631–650
21. Rippel O, Mertens P, Merhof D (2021) Modeling the distribution of normal data in pre-trained deep features for anomaly detection. In: 2020 25th International Conference on Pattern Recognition (ICPR), IEEE, pp 6726–6733
22. Rolih B, Fučka M, Skočaj D (2024) SuperSimpleNet: unifying unsupervised and supervised learning for fast and reliable surface defect detection. In: International Conference on Pattern Recognition
23. Roth K, Pemula L, Zepeda J et al (2022) Towards total recall in industrial anomaly detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 14318–14328
24. Rudolph M, Wandt B, Rosenhahn B (2021) Same same but DifferNet: semi-supervised defect detection with normalizing flows. In: Winter Conference on Applications of Computer Vision (WACV)
25. Rudolph M, Wehrbein T, Rosenhahn B et al (2022) Fully convolutional cross-scale-flows for image-based defect detection. In: WACV, pp 1829–1838
26. Rudolph M, Wehrbein T, Rosenhahn B et al (2023) Asymmetric student-teacher networks for industrial anomaly detection. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp 2592–2602
28. Schlegl T, Seeböck P, Waldstein SM et al (2017) Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In: International Conference on Information Processing in Medical Imaging, Springer, pp 146–157
29. Shvetsova N, Bakker B, Fedulova I et al (2021) Anomaly detection in medical imaging with deep perceptual autoencoders. IEEE Access 9:118571–118583
30. Tabernik D, Šela S, Skvarč J et al (2020) Segmentation-based deep-learning approach for surface-defect detection. J Intell Manuf 31(3):759–776
31. Wang Xy, Wang C, Wang L et al (2021) Robust and effective multiple copy-move forgeries detection and localization. Pattern Anal Appl 24:1025–1046
33.
35. Yao X, Li R, Qian Z et al (2023) Focus the discrepancy: intra- and inter-correlation learning for image anomaly detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 6803–6813
36. You OT, Pae DS, Kim SH et al (2018) Pattern matching for industrial object recognition using geometry-based vector mapping descriptors. Pattern Anal Appl 21:1167–1183
37. Yu J, Zheng Y, Wang X et al (2021) FastFlow: unsupervised anomaly detection and localization via 2D normalizing flows. arXiv preprint arXiv:2111.07677. https://doi.org/10.48550/arXiv.2111.07677
38. Zavrtanik V, Kristan M, Skočaj D (2021) DRÆM – a discriminatively trained reconstruction embedding for surface anomaly detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 8330–8339
39. Zavrtanik V, Kristan M, Skočaj D (2021) Reconstruction by inpainting for visual anomaly detection. Pattern Recogn 112:107706
Metadata
Title
Robustness of unsupervised methods for image surface-anomaly detection
Authors
Jakob Božič
Matic Fučka
Vitjan Zavrtanik
Danijel Skočaj
Publication date
01.06.2025
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 2/2025
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-025-01477-y