Open Access 2023 | OriginalPaper | Book chapter

A Study on Data Augmentation Techniques for Visual Defect Detection in Manufacturing

Written by: Lars Leyendecker, Shobhit Agarwal, Thorben Werner, Maximilian Motz, Robert H. Schmitt

Published in: Bildverarbeitung in der Automation

Publisher: Springer Berlin Heidelberg

Abstract

Deep learning-based defect detection is rapidly gaining importance for automating visual quality control tasks in industrial applications. However, because rejection rates in manufacturing processes are usually low, industrial defect detection datasets inherently suffer from three severe data challenges: data sparsity, data imbalance, and data shift. Because the acquisition of defect data is highly cost-intensive and Deep Learning (DL) algorithms require a sufficiently large amount of data, we investigate how to address these challenges using data oversampling and data augmentation (DA) techniques. Given the problem of binary defect detection, we present a novel experimental procedure for analyzing the impact of different DA-techniques. Accordingly, pre-selected DA-techniques are used to generate experiments across multiple datasets and DL models. For each defect detection use-case, we configure a set of random DA-pipelines to generate datasets of different characteristics. To investigate the impact of DA-techniques on defect detection performance, we then train convolutional neural networks with two different but fixed architectures and hyperparameter sets. To quantify and evaluate the generalizability, we compute the distances between dataset derivatives to determine the degree of domain shift. The results show that we can precisely analyze the influences of individual DA-methods, thus laying the foundation for establishing a mapping between dataset properties and DA-induced performance enhancement, with the aim of improving DL development. We show that there is no one-size-fits-all solution, but that within the categories of geometrical and color augmentations, certain DA-methods outperform others.

1 Introduction

Manufacturing processes have been optimized in recent decades to achieve minimum reject rates and high product quality. However, as product and process complexities increase, the importance of reliable quality assurance continues to grow. Defects such as internal holes, pits, abrasions, and scratches on workpieces or knots, broken picks, and broken yarn in fabrics [1] negatively impact both visual and functional product properties [2]. Defects also contribute to wasted resources and safety hazards and can have severe economic consequences for a company. Therefore, reliably assuring the quality of manufactured products is of paramount importance in manufacturing. A prominent contemporary approach to fully automated quality control is deep learning (DL)-based computer vision. DL algorithms improve over existing rule-based systems in terms of generalization and performance, while requiring less domain expertise [3, 4, 5]. However, a major disadvantage of data-driven approaches compared to rule-based techniques lies in the strong dependency of model precision on data quantity, data quality, and the evolution of the data over time (data drift) [6]. While the focus in recent years has been on the development of advanced network architectures (e.g., ResNet-50 [7] or Inception-v3 [8]), progress in model space is yielding diminishing returns. As a result, development is shifting towards data-centric approaches, especially in real-world domains such as manufacturing or medical diagnostics. Table 6.1 provides an overview of the main data challenges that are characteristic of image data acquired from production processes.
These properties stand in strong contrast to those of research datasets (e.g., ImageNet [9], COCO [10], MNIST [11]) used for developing and benchmarking deep neural network architectures and DL-algorithms, which is why approaches from research are difficult to transfer one-to-one to such complex defect detection use-cases.
Tab. 6.1
Causes of data quality issues in DL-based visual defect detection in terms of data sparsity, data imbalance and data shift
| Data Quality Issue | Description |
|---|---|
| Amount of data | Difficulty in collecting sufficiently large amounts of data |
| Label inconsistencies | Labeling is a labor-intensive task that is oftentimes ambiguous and usually requires multiple domain experts |
| Data imbalance | Defective parts tend to be significantly underrepresented compared to non-defective ones |
| Changing lighting conditions | Contrast and brightness change across different work shifts |
| Exposure issues | Reflections and shadows cast by complex components |
| Sensor failure | Image failures or high noise levels due to sensor degradation, amplified by harsh environments |
| Changing object poses | Components often appear in different orientations, especially in mass production |
| Changing appearances | Changes in the appearance of a product over time can make previously collected data unusable |
Data augmentation (DA) represents a data-space solution that addresses the above-mentioned data quality challenges. Various DA techniques aim to change both the geometrical and visual appearance of images to improve both the performance and the robustness of deep neural networks. The most common DA techniques are geometric transformations, color augmentations, kernel filters, mixing images, and random erasing [12]. Even though DA is already an integral part of DL pipelines, DA-methods are often applied blindly based on empirical knowledge and require elaborate tuning for specific datasets. To analyze the impact of different DA-methods on both precision and generalization for the task of visual defect detection, this paper introduces our experimental procedure in Sect. 6.3.3, presents the results in Sect. 6.4.2, and finally derives insights about the studied DA-methods in Sect. 6.5. Sect. 6.3.2 introduces the three real-world datasets with which we work. Our DA-methods are chosen according to a preliminary study of related papers that is summarized in Sects. 6.2 and 6.3.3.

2 Related Work

This section provides a brief overview of work that addresses the generalization problem, DA approaches, and their impact on real-world DL tasks. One central drawback of real-world datasets is that the models trained on them do not generalize well, as these datasets are prone to domain shift [13]. In recent years, model-centric techniques such as dropout [14], transfer learning [15], and pretraining [16] have tried to address the issues of generalization, particularly in deep neural networks. DA tries to avoid poor generalization by addressing the root problem in the training data [17] rather than changing the model or training process. Applications of DA can be found in various works across multiple domains such as natural language processing [18], computer vision [17], and time series classification [19]. Particularly in computer vision tasks, DA has been applied to address the domain generalization problem [20, 21, 22]. Many papers apply and analyze basic DA-techniques (e.g., oversampling and data warping on histopathological images [23]) and advanced methods (e.g., stacked DA on medical images [24], style-transfer augmentations [25], cGAN, and geometric transformations [26]) for specific use cases and datasets.
Fewer papers provide an overview of DA-methods and examine their influence on model accuracy. The survey of Shorten et al. [17] presents a comprehensive overview of DA and examines the impact of individual methods on well-known datasets (e.g., CIFAR-10, MNIST, Caltech101) in isolation through pairwise comparisons. Shijie et al. [27] explore the impact of various DA-methods on image classification tasks with CNNs. On subsets of CIFAR-10 and ImageNet, they conduct pairwise and triple comparisons to identify best-performing DA-techniques and to draw general conclusions. Yang et al. [28] systematically review different DA-methods and propose a taxonomy of the reviewed methods. For semantic segmentation, image classification, and object detection, they compare the performance of different model architectures on datasets (e.g., CIFAR-100, SVHN) with and without a pre-defined set of DA-techniques. The survey paper of Khosla et al. [29] presents an overview of selected DA-methods without conducting further effect analyses. In addition to generic studies on scientific datasets, a few domain-specific approaches exist. The only related work on DA in defect detection is provided by Jain et al. [30]. They propose a DA-framework utilizing GANs, which they use to investigate data synthesis for the classification of manufacturing datasets.

Scientific Impact

Existing studies are almost exclusively conducted on scientific datasets, and no reference is made to specific application domains (with the exception of [30]). To the best of our knowledge, there is currently no prior work that examines the impact of DA-methods specifically for DL-based visual quality control on manufacturing datasets in an unconstrained setting (i.e., not limited to pairwise evaluations).

3 Approach

In this section, we present our approaches and procedures. Sect. 6.3.1 defines the mathematical problem of binary defect detection. Sect. 6.3.2 introduces the datasets considered in this study and their properties. The experimental procedure, the domain shift measure, and the evaluation metrics are presented in Sect. 6.3.3.

3.1 Binary Defect Detection Problem Definition

For binary visual defect detection, the input feature space is denoted by \(\mathcal{X}\) and the target space by \(\mathcal{Y}\). We define the domain as a joint distribution \(P_{XY}\) on \(\mathcal{X}\times\mathcal{Y}\) and the dataset as \(\mathcal{D}=\{(\mathcal{X}_i,\mathcal{Y}_i)\}_{i=1}^{N}\), where \(N\) is the number of training examples. In this work, \(\mathcal{X}_1,\mathcal{X}_2,\mathcal{X}_3\) comprise images from three datasets, namely: (1) AITEX fabric defects [1], (2) Magnetic tile defects [31], and (3) TIG Aluminium 5083 welding defects [32]. We define the binary classification problem where \(\mathcal{Y}\in\{\text{Defected},\text{Non-defected}\}\). Furthermore, the DL model is defined as \(f\colon\mathcal{X}\rightarrow\mathcal{Y}\), where the primary objective is to learn a mapping from the input space \(\mathcal{X}\) to the target space \(\mathcal{Y}\). In this work, \(f\) is either ResNet-50 [7] or Inception-v3 [8]. The predictions generated using model \(f\) are denoted as \(\hat{\mathcal{Y}}\). The categorical cross entropy loss function is defined as \(\ell\colon\mathcal{Y}\times\hat{\mathcal{Y}}\rightarrow[0,\infty)\). Each dataset \(\mathcal{D}=\{(\mathcal{X}_i,\mathcal{Y}_i)\}_{i=1}^{N}\) is augmented using various DAs, where \(\theta\) denotes the list of all DAs, and a new augmented dataset is generated as \(\mathcal{D}_1=\theta(\mathcal{D})\). For each dataset, ten DA-pipelines with varying DAs are constructed to create ten different datasets \(\mathcal{D}_1,\dots,\mathcal{D}_{10}\).
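For reference, the loss signature above is consistent with the textbook form of categorical cross entropy. The following is a sketch in notation introduced here for illustration only (a one-hot target \(y\), predicted class probabilities \(\hat{y}\), and \(C=2\) classes); it is not taken from the authors' implementation:

\[
\ell(y,\hat{y}) = -\sum_{c=1}^{C} y_c \log \hat{y}_c , \qquad \hat{y}_c \in (0,1],\quad \sum_{c=1}^{C}\hat{y}_c = 1 .
\]

Because every \(\log \hat{y}_c \le 0\), the loss is non-negative, which matches the stated codomain \([0,\infty)\).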

3.2 Presentation of the Datasets

Three real-world industrial-grade datasets are used in this work. An overview of exemplary images is provided in Fig. 6.1. The Magnetic tile defects dataset (MagTile) contains a total of 1,344 images of magnetic tiles with five defect types (blowhole, crack, fray, break, and uneven, i.e., grinding uneven) and one defect-free class (free). AITEX is a fabric production dataset containing 245 images of 4,096 × 256 pixels that capture seven different fabric structures. In total, there are 140 defect-free images, 20 for each type of fabric, and 105 images with defects. The TIG Aluminium 5083 welding seam dataset (TIG5083) contains 33,254 images of aluminium weld seams and the surrounding area of the weld seam, with six classes: good weld, burn through, contamination, lack of fusion, misalignment, and lack of penetration. We convert the multi-class classification task of all datasets into a binary classification problem by merging all individual defect types into a single defect class.

3.3 Experiment Procedure

To evaluate the impact of DA-techniques, we propose a three-stage process: First, for each dataset, we apply a DA-pipeline and evaluate model performance on different test sets. Second, we measure the domain shift between the train set and the test sets. Third, we correlate the achieved performance with the domain shift. This framework provides insight into the effects of different DAs on model performance, domain shift, and, through the correlation of both, the generalization capabilities of the trained model. An overview of our algorithm can be found in Fig. 6.2. We assume a standard train-test split of 80/20 and further divide the 80% train split into training and validation data at a ratio of 60/20 (both relative to the full dataset). Additionally, we create a hold-out test set by splitting off one of the defect classes per dataset before they are merged (see Sect. 6.3.2). This hold-out set serves as an additional out-of-distribution test set to measure the generalization capabilities of the model. We apply DA in two different settings. For AITEX and MagTile, augmented datapoints were added as new instances, retaining the original ones. This was done to increase the overall number of instances in the dataset and stabilize training. For TIG5083, augmented datapoints replace the originals since the dataset already contains enough images for training. The hold-out class for the AITEX dataset was ‘Broken end’, the hold-out class for MagTile was ‘Crack’, and the hold-out class for TIG5083 was ‘contamination’.
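The data preparation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_dataset`, the toy class counts, and the purely random (non-stratified) shuffling are assumptions made for the example.

```python
import random

def split_dataset(samples, holdout_class, non_defect_class, seed=0):
    """Split (image_id, class_name) pairs into train/val/test plus a hold-out set.

    Hypothetical helper: the paper does not state whether its split is stratified;
    this sketch simply shuffles.
    """
    # 1) Split off one defect class BEFORE merging; it becomes the
    #    out-of-distribution hold-out test set.
    holdout = [(x, "Defected") for x, c in samples if c == holdout_class]
    rest = [(x, c) for x, c in samples if c != holdout_class]

    # 2) Merge all remaining defect types into a single "Defected" class.
    binary = [
        (x, "Non-defected" if c == non_defect_class else "Defected")
        for x, c in rest
    ]

    # 3) 80/20 train-test split; the 80% part is divided into 60% training
    #    and 20% validation data (all relative to the remaining dataset).
    rng = random.Random(seed)
    rng.shuffle(binary)
    n_test = len(binary) // 5
    n_val = len(binary) // 5
    test = binary[:n_test]
    val = binary[n_test:n_test + n_val]
    train = binary[n_test + n_val:]
    return train, val, test, holdout

# Toy data with MagTile-style class names (counts are made up).
classes = ["free"] * 50 + ["crack"] * 10 + ["blowhole"] * 20 + ["fray"] * 20
samples = [(f"img_{i:03d}", c) for i, c in enumerate(classes)]
train, val, test, holdout = split_dataset(
    samples, holdout_class="crack", non_defect_class="free"
)
print(len(train), len(val), len(test), len(holdout))  # 54 18 18 10
```

Splitting off the hold-out class before the binary merge is what keeps that defect type entirely unseen during training.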

Data Augmentation Pipelines

In order to pre-select the DA-steps for this paper, a survey was conducted across 24 papers dealing with 6 major industrial image data sets. Table 6.2 describes all available augmentations for each dataset. From these augmentation pools, different pipelines for each dataset were constructed. For each pipeline, two of the augmentations are reserved for the test set and are later referred to as test augmentations. The remaining DAs have a 0.5 chance of being applied to the training set. This process is repeated ten times (see Table 6.3). App. 6.6.1 provides an overview of selected unaugmented and augmented images for all three datasets.
Tab. 6.2
Preselected set of DA-methods for TIG5083, AITEX, and MagTile
| TIG5083 | AITEX | MagTile |
|---|---|---|
| 1. Gaussian Noise | 1. Gaussian Noise | 1. Salt & Pepper Noise |
| 2. Transpose Image | 2. Transpose Image | 2. Transpose Image |
| 3. Flip Image | 3. Flip Image | 3. Flip Image |
| 4. Perspective Transformation | 4. Random Perspective | 4. Random Perspective |
| 5. Add Brightness | 5. Color Jitter | 5. Color Jitter |
| 6. Affine Transformation | 6. Moving Least Squares (MLS) | 6. MLS [33] |
|  | 7. Random Erase [12] | 7. Retinex [34, 35] |
|  | 8. Random Rotate |  |
Tab. 6.3
Train, validation, and test set DA-Pipelines (AITEX)
| Nr. | Train & Validation augmentations | Test augmentations |
|---|---|---|
| 1 | Random Perspective, Flip Image, Color Jitter | Random Rotate, Transpose Image |
| 2 | Flip Image, Random Perspective | Transpose Image, Random Erase |
| 3 | Gaussian Noise, Color Jitter, Random Perspective, Random Rotate, Flip Image | Random Erase, MLS |
| 4 | Random Erase, MLS, Gaussian Noise | Color Jitter, Random Perspective |
| 5 | Random Rotate, MLS, Gaussian Noise | Transpose Image, Flip Image |
| 6 | Random Perspective, Random Rotate | Gaussian Noise, MLS |
| 7 | Random Erase, AddNoise, Color Jitter | Random Rotate, Random Perspective |
| 8 | Random Rotate, Random Erase, Flip Image, Color Jitter | Random Perspective, Gaussian Noise |
| 9 | Color Jitter, Random Rotate, Gaussian Noise | MLS, Random Perspective |
| 10 | Random Perspective | Random Rotate, MLS |
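The pipeline construction can be sketched as below. It assumes one particular reading of the procedure: two augmentations are drawn at random from the pool as test augmentations, and each remaining augmentation is included in the training pipeline with probability 0.5. The function name `sample_pipeline` and the seed are hypothetical; the pool is the AITEX column of Table 6.2.

```python
import random

# AITEX augmentation pool as listed in Table 6.2.
AITEX_POOL = [
    "Gaussian Noise", "Transpose Image", "Flip Image", "Random Perspective",
    "Color Jitter", "MLS", "Random Erase", "Random Rotate",
]

def sample_pipeline(pool, rng, n_test=2, p_train=0.5):
    """Draw one DA-pipeline: (train augmentations, test augmentations)."""
    # Reserve two augmentations exclusively for the augmented test set.
    test_augs = rng.sample(pool, n_test)
    remaining = [a for a in pool if a not in test_augs]
    # Each remaining augmentation enters the training pipeline with
    # probability p_train (assumed interpretation of "0.5 chance").
    train_augs = [a for a in remaining if rng.random() < p_train]
    return train_augs, test_augs

rng = random.Random(42)  # arbitrary seed, for reproducibility only
pipelines = [sample_pipeline(AITEX_POOL, rng) for _ in range(10)]
for nr, (train_augs, test_augs) in enumerate(pipelines, start=1):
    print(nr, train_augs, "|", test_augs)
```

Keeping the test augmentations disjoint from the training augmentations means the augmented test set probes transformations the model has not been trained on.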

Domain Shift Measures

We use an algorithm proposed by [36] for measuring the domain shift between datasets. In computer vision tasks, calculating domain shift can be seen as calculating the difference in representation by a model given the source and target domain. Given that a source domain is distant from the target domain, the representation of the domains in the learned space for a specific model tends to diverge. The authors used the activation values from the model’s last layers to quantify the domain shift. Specifically, by creating a statistical distribution using each kernel’s activation value in those layers, we can measure the distance between the datasets using the Wasserstein distance.
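A minimal sketch of such a distance computation is given below. It is not the algorithm of [36] itself: it assumes that each image yields one activation value per kernel (for example after global pooling), compares the per-kernel value distributions of two datasets with the one-dimensional Wasserstein-1 distance, and averages over kernels. The aggregation by averaging and both function names are assumptions.

```python
from bisect import bisect_right

def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two 1-D empirical distributions.

    Computed as the integral of |F_u(x) - F_v(x)| over x, where F_u and F_v
    are the empirical CDFs of the two samples.
    """
    u_sorted, v_sorted = sorted(u), sorted(v)
    grid = sorted(u_sorted + v_sorted)
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total

def domain_shift(source_acts, target_acts):
    """Average per-kernel Wasserstein distance between two activation sets.

    source_acts / target_acts: lists of per-image activation vectors,
    one value per kernel (hypothetical input format).
    """
    n_kernels = len(source_acts[0])
    per_kernel = [
        wasserstein_1d([a[k] for a in source_acts], [a[k] for a in target_acts])
        for k in range(n_kernels)
    ]
    return sum(per_kernel) / n_kernels

# Toy activations: kernel 0 shifts between the sets, kernel 1 does not.
train_acts = [[0.0, 1.0], [1.0, 1.0], [3.0, 1.0]]
test_acts = [[5.0, 1.0], [6.0, 1.0], [8.0, 1.0]]
print(domain_shift(train_acts, test_acts))  # approximately 2.5
```

In the toy data, kernel 0 contributes a distance of 5 and kernel 1 a distance of 0, so the average is about 2.5.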

Evaluation Metrics

To evaluate the results of the binary classification problem, various metrics such as F1-Score, precision, recall, Jaccard similarity [37], Cohen’s kappa score [38], and Matthews correlation coefficient (MCC) [39] are used. Since the datasets are imbalanced even after applying DA, the per-class metrics (Jaccard, precision, recall, and F1-Score) are weighted by the class distribution. We use multiple evaluation metrics, as they all deviate slightly from each other. In this way, we circumvent the difficulties due to the sensitivity of individual metrics and obtain a more conclusive evaluation. Since all these scores are bounded in [0,1], we average all of them for our reporting of final performance values.
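The metrics named above can all be derived from a binary confusion matrix, as the following sketch shows. The function `binary_metrics` and the example counts are illustrative assumptions, not the authors' evaluation code. Note that Cohen's kappa and MCC can in general take negative values; the sketch neither clips nor rescales them before averaging.

```python
from math import sqrt

def safe_div(a, b):
    return a / b if b else 0.0

def binary_metrics(tp, fp, fn, tn):
    """Support-weighted precision/recall/F1/Jaccard plus kappa and MCC."""
    n = tp + fp + fn + tn
    # (true positives, false positives, false negatives, support) per class:
    # first with "defect" as the positive class, then with "non-defect".
    per_class = [(tp, fp, fn, tp + fn), (tn, fn, fp, tn + fp)]

    def weighted(metric):
        return sum(metric(t, p, f) * s for t, p, f, s in per_class) / n

    scores = {
        "precision": weighted(lambda t, p, f: safe_div(t, t + p)),
        "recall": weighted(lambda t, p, f: safe_div(t, t + f)),
        "f1": weighted(lambda t, p, f: safe_div(2 * t, 2 * t + p + f)),
        "jaccard": weighted(lambda t, p, f: safe_div(t, t + p + f)),
    }
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_o = (tp + tn) / n
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    scores["kappa"] = safe_div(p_o - p_e, 1 - p_e)
    # Matthews correlation coefficient.
    scores["mcc"] = safe_div(
        tp * tn - fp * fn,
        sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    scores["mean"] = sum(scores.values()) / len(scores)
    return scores

# Made-up confusion matrix: 10 defective and 90 non-defective test images.
m = binary_metrics(tp=8, fp=2, fn=2, tn=88)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With these counts, the weighted F1-Score is 0.96 while kappa and MCC are both about 0.78, which illustrates how the metrics diverge on imbalanced data.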

4 Results

In this section, we present the results. Sect. 6.4.1 defines the training and implementation procedure. Sect. 6.4.2 provides an overview of the protocol followed to evaluate the results at the example of the AITEX dataset. Sect. 6.4.3 presents the results of our ablation study.

4.1 Training and Implementation

To control model training, a validation set is split off from the augmented training set. The model is evaluated on the original test set, the augmented test sets (using the two reserved test augmentations), and the hold-out set as described in Sect. 6.3.3. The hold-out class for the AITEX dataset was ‘Broken end’, the hold-out class for MagTile was ‘Crack’, and the hold-out class for TIG5083 was ‘contamination’. As models for our experiment, ResNet-50 and Inception-v3 were chosen, as both are widely used in the literature on industrial applications. The learning rate for both models is set to \(10^{-3}\), the Adam optimizer [40] is used, and the first-layer input shape of the networks is set to 224 and 299, respectively. We initialize both architectures with weights pre-trained on ImageNet. Training uses transfer learning with 50 epochs of frozen encoder weights (shallow training) followed by 30 additional epochs of fine-tuning the entire model (deep training). As with the evaluation metrics, a class-balanced version of the loss function was employed to stabilize the learning process. The data for each experiment was normalized according to the statistics of the train set after applying DA.
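The text does not specify how the loss is class-balanced. One common option, shown here purely as an assumption, is to weight each sample's cross entropy by the inverse frequency of its class. Both function names are hypothetical.

```python
from collections import Counter
from math import log

def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_cross_entropy(y_true, y_prob, weights):
    """Weighted mean of per-sample cross entropy.

    y_prob: list of dicts mapping class name to predicted probability.
    The sum is normalized by the total weight of the samples.
    """
    total = sum(-weights[y] * log(p[y]) for y, p in zip(y_true, y_prob))
    return total / sum(weights[y] for y in y_true)

# Made-up label distribution: 90 non-defective and 10 defective images.
labels = ["Non-defected"] * 90 + ["Defected"] * 10
weights = class_weights(labels)
print(weights["Defected"], round(weights["Non-defected"], 4))  # 5.0 0.5556

# An uninformative classifier (probability 0.5 everywhere) yields log(2).
batch_true = ["Defected", "Non-defected"]
batch_prob = [{"Defected": 0.5, "Non-defected": 0.5}] * 2
loss = weighted_cross_entropy(batch_true, batch_prob, weights)
```

Under this weighting, a misclassified defective sample contributes nine times as much to the loss as a misclassified non-defective one, offsetting the 9:1 imbalance.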

4.2 Results for the AITEX Dataset

Fig. 6.3 depicts the average F1-Score across both models and across the DA steps for each test set. The values are obtained by averaging the performance of each pipeline that contains the respective augmentation. We observe that the performance on the original test set and, to a lesser extent, on the augmented test set remains stable, whereas on the hold-out set (highest amount of domain shift) model performance improved significantly. The top three DA-steps for the AITEX dataset are MLS, Gaussian noise, and random rotation. As stated in Sect. 6.3.3, we also averaged the performance across multiple other metrics, since they all differ slightly from each other. Similar trends can be observed in Fig. 6.4.
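The per-augmentation averaging can be sketched as follows. The example uses the train augmentations of AITEX pipelines 1, 2, and 10 from Table 6.3 together with their ResNet-50 hold-out F1-Scores from Table 6.6. It therefore covers one model and three pipelines only and does not reproduce the values in Fig. 6.3; treating "contains" as "contains among the train augmentations" is also an assumption.

```python
from collections import defaultdict

def mean_score_per_augmentation(pipelines, scores):
    """Average the score of every pipeline that contains an augmentation."""
    buckets = defaultdict(list)
    for nr, augmentations in pipelines.items():
        for aug in augmentations:
            buckets[aug].append(scores[nr])
    return {aug: sum(vals) / len(vals) for aug, vals in buckets.items()}

# Train augmentations of AITEX pipelines 1, 2 and 10 (Table 6.3).
pipelines = {
    1: ["Random Perspective", "Flip Image", "Color Jitter"],
    2: ["Flip Image", "Random Perspective"],
    10: ["Random Perspective"],
}
# Corresponding ResNet-50 hold-out F1-Scores (Table 6.6).
holdout_f1 = {1: 0.36298, 2: 0.56211, 10: 0.36298}

means = mean_score_per_augmentation(pipelines, holdout_f1)
for aug, value in sorted(means.items()):
    print(f"{aug}: {value:.5f}")
```

Because an augmentation's score is an average over whole pipelines, it also reflects the other augmentations it happened to be combined with.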
Next, the distance between the train set (source domain) and the test sets (target domain) was calculated for all the models and datasets. Table 6.4 contains the mean and standard deviation across all the pipelines for the AITEX dataset and ResNet-50 model. The domain shift increases from the original test set to the augmented test set to the hold-out set. Finally, the domain shift is correlated to the respective F1-Scores, as Wasserstein distance alone lacks interpretability.
Tab. 6.4
Domain shift measure averaged across DA-pipelines for the last layer of ResNet-50
| Train/Test | Train/Aug_test | Train/Hold_out |
|---|---|---|
| 0.0764 ± 0.0831 | 0.0808 ± 0.0804 | 0.1841 ± 0.1981 |
A negative correlation means that the performance of the model on the test data decreases with increasing domain shift. Therefore, a greater (i.e., less negative) correlation is desirable, since it indicates that performance degrades less as the domain shift grows. Each cell in Table 6.5 contains the Pearson correlation between the distance measure and the F1-Scores across all the test sets. Since the domain shift is measured based on a single layer of the model, we evaluated the last three layers of each model and report the values in separate columns. For most pipelines, the correlation values change little with the layer used, but we observe two outlier pipelines that display a weaker correlation between domain shift and model performance. Further information can be found in App. 6.6.2. The same evaluation protocol was followed for the other two datasets, and similar trends were observed. The results for TIG5083 and MagTile can be found in App. 6.6.3.
Tab. 6.5
Pearson correlations between the domain shift and model F1-Scores (AITEX). The bold values represent the largest negative mean correlations value.
| Pipeline | Inception v3: last layer | Inception v3: 2nd last layer | Inception v3: 3rd last layer | ResNet-50: last layer | ResNet-50: 2nd last layer | ResNet-50: 3rd last layer | Mean |
|---|---|---|---|---|---|---|---|
| 1 | −0.996 | −0.996 | −0.998 | −0.999 | −0.999 | −0.999 | −0.998 ± 0.002 |
| 2 | −0.727 | −0.734 | −0.505 | −0.941 | −0.940 | −0.979 | −0.804 ± 0.167 |
| 3 | −0.989 | −0.990 | −0.971 | −0.960 | −0.967 | −0.986 | −0.977 ± 0.012 |
| 4 | −0.970 | −0.972 | −0.995 | −0.352 | −0.317 | −0.530 | −0.689 ± 0.297 |
| 5 | −0.916 | −0.916 | −0.986 | −0.900 | −0.905 | −0.979 | −0.934 ± 0.035 |
| 6 | −1.000 | −1.000 | −1.000 | −0.917 | −0.931 | −0.996 | −0.974 ± 0.036 |
| 7 | −0.999 | −0.996 | −0.988 | −0.935 | −0.946 | −0.826 | −0.948 ± 0.060 |
| 8 | −0.999 | −0.998 | −1.000 | −0.955 | −0.916 | −0.974 | −0.974 ± 0.0307 |
| 9 | −0.952 | −0.955 | −1.000 | −0.341 | 0.121 | −0.785 | −0.652 ± 0.411 |
| 10 | −0.994 | −0.994 | −0.991 | −0.989 | −0.987 | −0.994 | −0.991 ± 0.003 |
| Mean | −0.954 ± 0.084 | −0.955 ± 0.082 | −0.943 ± 0.154 | −0.829 ± 0.256 | −0.779 ± 0.375 | −0.905 ± 0.152 |  |

4.3 Results of the Ablation Study

In addition to the average score presented in Sect. 6.4.2, we draw additional insights from comparing model performance across all available models and datasets. Fig. 6.5 depicts the stacked bar plot of weighted F1-Scores averaged across all datasets and models for each augmentation that was available for the dataset. Across all the experiments, affine transformations, moving least squares (MLS), and random rotation performed the best. Similarly, Fig. 6.6 depicts the average of the scores across all other evaluation metrics. We observe similar trends: on average across experiments, affine transformations, perspective transformation, and MLS perform the best.

5 Conclusion

DL offers enormous potential to automate complex visual quality control tasks that cannot be solved using rule-based methods. However, manufacturing applications entail three severe data challenges: data sparsity, data imbalance, and data shift. DA-methods have become an integral part of DL-pipelines to improve both performance and generalization. To provide precise assistance in selecting DA-methods for developing DL-based quality control in the future, we present an experiment protocol in this paper. With it, we aim to evaluate the impact of individual DA-methods on defect detection performance depending on dataset characteristics. We apply this protocol to three defect detection use-cases and present and interpret the results.
Using our approach, we can evaluate the influences of each DA method on the model metrics in detail. We show how to determine the domain shift between genuine and augmented dataset derivatives, thereby providing an interpretable measure for choosing the degree of DA. By correlating this domain shift with F1-Scores, the strength of the positive influence of a DA-pipeline on bridging the domain shift can be determined. Applying our protocol to the datasets, we obtain the three best DA-methods per dataset: MLS, Gaussian noise, and random rotation (AITEX); image transpose, random perspective, and salt & pepper noise (MagTile); and affine transformation, perspective transformation, and image transpose (TIG5083). We thereby confirm that the performance improvement of DA-methods depends on dataset characteristics, the DL-task to be solved, and the degree of DA. This shows that there is no one-size-fits-all solution, but at the same time makes it all the more clear that establishing a mapping between dataset properties (e.g., degree of imbalance, defect sizes, positional variance of defects) and DA-induced performance enhancement will enable tailor-made and precise DL-pipeline development, especially in real-world applications.
Correlating the observed performance with the respective domain shift revealed additional insights. The two pipelines for the AITEX dataset that induced the weakest negative correlation between domain shift and performance were mainly composed of our three best-performing augmentations for that dataset (see Table 6.5, pipelines 4 and 9). Additionally, we found that the worst-performing pipelines either had very few augmentations or contained badly performing augmentations (mainly "random rotate" for AITEX), further highlighting the need for tailor-made DA-pipelines for each dataset. In contrast, our ablation study showed that, when averaging the results over all datasets and models, at least some augmentations do perform better than others. The better-performing augmentations are the more complex ones, showcasing their versatility and robustness, while simple off-the-shelf augmentations yield the smallest lift in model performance. Fig. 6.6 can serve as a benchmark of augmentation techniques for new industrial-grade datasets or those with unknown properties.
With the proposed protocol, we lay the foundation for determining the appropriateness of DA-methods for specific data properties in an analytical approach. We will also include more advanced DA-methods and extend the study to additional domain-specific datasets to lend more validity to the results. By establishing a catalog of dataset properties to which we can map the results of the study, we aim to develop a domain-specific decision support system for choosing optimal DA-pipelines for DL-applications.

6 Appendix

6.1 Dataset Illustrations

6.2 Domain Shift Calculations

The distance measure alone is not easily interpretable. Hence, we correlate the distance measure with the F1-Scores; a negative correlation is expected, i.e., smaller distances should coincide with higher F1-Scores. Table 6.6 provides the distance measures for the average-pooling layer of the ResNet-50 model across train and test sets, where the first three columns represent the distance and the following three columns represent the F1-Score for the same pipelines. We compute the Pearson correlation for each pipeline, correlating the distance measure with the corresponding performance metric. Repeating this process for the last layers of both models gives us Table 6.5. The same procedure was followed to construct similar tables for the MagTile and TIG5083 datasets. Furthermore, we take the mean across the last layers of the models.
Tab. 6.6
Wasserstein distance between the augmented train set and all test sets for ResNet-50 (average-pooling layer) and corresponding model F1-Scores
| Pipeline | Distance: Train/Test | Distance: Train/Aug_test | Distance: Train/Hold_out | F1-Score: Test | F1-Score: Aug_test | F1-Score: Hold_out |
|---|---|---|---|---|---|---|
| 1 | 0.0314 | 0.04008 | 0.13468 | 0.93218 | 0.91231 | 0.36298 |
| 2 | 0.30067 | 0.29725 | 0.72971 | 0.94262 | 0.81365 | 0.56211 |
| 3 | 0.05871 | 0.02578 | 0.14406 | 0.93388 | 0.92921 | 0.56211 |
| 4 | 0.01909 | 0.09311 | 0.0724 | 0.95699 | 0.90933 | 0.56211 |
| 5 | 0.07192 | 0.03547 | 0.16015 | 0.95217 | 0.92163 | 0.77694 |
| 6 | 0.09642 | 0.05205 | 0.16251 | 0.92431 | 0.92431 | 0.36298 |
| 7 | 0.04293 | 0.09987 | 0.16797 | 0.94673 | 0.87684 | 0.36298 |
| 8 | 0.02218 | 0.02751 | 0.03917 | 0.92921 | 0.92262 | 0.36298 |
| 9 | 0.03592 | 0.06176 | 0.0556 | 0.94262 | 0.92431 | 0.6417 |
| 10 | 0.0843 | 0.07488 | 0.17482 | 0.94046 | 0.89822 | 0.36298 |
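As a worked example of the correlation step, the following sketch computes the Pearson correlation for pipeline 1 from the three distances and three F1-Scores listed in Table 6.6. The helper `pearson` is written out here for illustration; the authors' tooling is not specified.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Pipeline 1 of Table 6.6: Train/Test, Train/Aug_test, Train/Hold_out.
distances = [0.0314, 0.04008, 0.13468]
f1_scores = [0.93218, 0.91231, 0.36298]

r = pearson(distances, f1_scores)
print(round(r, 3))  # -0.999
```

The result agrees with the value of −0.999 reported for pipeline 1 and the last ResNet-50 layer in Table 6.5. Since each correlation is based on only three points, individual values should be interpreted with care.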

6.3 Results

6.3.1 MagTile Dataset

Tab. 6.7
Train, validation and test set DA-Pipelines (MagTile)
| Nr. | Train & validation Augmentation | Test Augmentation |
|---|---|---|
| 1 | Color Jitter, Salt & Pepper Noise | Flip Image, Transpose Image |
| 2 | Random Perspective, Flip Image | Salt & Pepper Noise, Retinex |
| 3 | Retinex, MLS | Salt & Pepper Noise, Random Perspective |
| 4 | Transpose Image, Random Perspective | MLS, Retinex |
| 5 | Color Jitter, Retinex, Salt & Pepper Noise | MLS, Flip Image |
| 6 | MLS | Retinex, Salt & Pepper Noise |
| 7 | Retinex, Color Jitter, MLS | Flip Image, Transpose Image |
| 8 | Random Perspective | MLS, Flip Image |
| 9 | Transpose Image | Random Perspective, Flip Image |
| 10 | Salt & Pepper Noise, Flip Image, Random Perspective, Retinex | MLS, Transpose Image |
Tab. 6.8
Pearson correlations between the domain shift and model F1-Scores (MagTile). The bold values represent the largest negative mean correlations value.
| Pipeline | Inception v3: last layer | Inception v3: 2nd last layer | Inception v3: 3rd last layer | ResNet-50: last layer | ResNet-50: 2nd last layer | ResNet-50: 3rd last layer | Mean |
|---|---|---|---|---|---|---|---|
| 1 | −0.42134 | −0.29361 | −0.17805 | −0.908 | −0.92141 | −0.99612 | −0.61976 ± 0.3308 |
| 2 | −0.87905 | −0.77159 | −0.75817 | −0.98297 | −0.91635 | −0.98849 | −0.88277 ± 0.09151 |
| 3 | −0.99165 | −0.86321 | −0.89364 | −0.99371 | −0.95432 | −0.97724 | −0.94563 ± 0.05 |
| 4 | −0.07302 | −0.64191 | −0.95817 | 0.86934 | −0.99643 | −0.63085 | −0.40517 ± 0.64512 |
| 5 | −0.99206 | −0.99545 | −0.99109 | −0.99631 | −0.99184 | −0.99959 | −0.99439 ± 0.00302 |
| 6 | −0.27443 | −0.28848 | −0.41971 | −0.85509 | −0.91592 | −0.90101 | −0.60911 ± 0.28593 |
| 7 | −0.99722 | −0.99293 | −0.9993 | −0.40914 | −0.5799 | −0.98809 | −0.82776 ± 0.24077 |
| 8 | −0.98136 | −0.87428 | −0.90671 | −0.71458 | −0.83115 | −0.99136 | −0.88324 ± 0.09408 |
| 9 | −0.98295 | −0.93676 | −0.82706 | −0.94623 | −0.96887 | −0.9424 | −0.93404 ± 0.05046 |
| 10 | −0.98475 | −0.97233 | 0.05444 | −0.98154 | −0.99635 | −0.9711 | −0.8086 ± 0.38606 |
| Mean | −0.7578 ± 0.3574 | −0.7631 ± 0.2715 | −0.6877 ± 0.3741 | −0.6918 ± 0.5782 | −0.9073 ± 0.1258 | −0.9386 ± 0.1123 |  |

6.3.2 TIG5083 Dataset

Tab. 6.9
Train, validation and test set DA-Pipelines (TIG5083)
| Nr. | Train & validation Augmentation | Test Augmentation |
|---|---|---|
| 1 | Add Brightness | Affine Transformation, Perspective Transformation |
| 2 | Add Brightness, Gaussian Noise | Transpose Image, Affine Transformation |
| 3 | Transpose Image, Perspective Transformation, Affine Transformation | Flip Image, Gaussian Noise |
| 4 | Gaussian Noise, Perspective Transformation | Transpose Image, Flip Image |
| 5 | Transpose Image, Affine Transformation, Add Brightness | Gaussian Noise, Flip Image |
| 6 |  | Transpose Image, Add Brightness |
| 7 | Gaussian Noise, Transpose Image | Perspective Transformation, Flip Image |
| 8 | Gaussian Noise | Add Brightness, Affine Transformation |
| 9 | Transpose Image, Add Brightness, Flip Image | Perspective Transformation, Gaussian Noise |
| 10 | Perspective Transformation, Gaussian Noise | Flip Image, Transpose Image |
Tab. 6.10
Pearson correlations between the domain shift and model F1-Scores (TIG5083). The bold values represent the largest negative mean correlations value.
| Pipeline | Inception v3: last layer | Inception v3: 2nd last layer | Inception v3: 3rd last layer | ResNet-50: last layer | ResNet-50: 2nd last layer | ResNet-50: 3rd last layer | Mean |
|---|---|---|---|---|---|---|---|
| 1 | −0.81792 | −0.81945 | −0.82353 | −0.82422 | −0.85945 | −0.83561 | −0.83003 ± 0.01433 |
| 2 | −0.95836 | −0.95362 | −0.96833 | −0.99884 | −0.9493 | −0.93576 | −0.9607 ± 0.01966 |
| 3 | −0.92238 | −0.90818 | −0.24479 | −0.91736 | −0.9626 | −0.90916 | −0.81074 ± 0.25376 |
| 4 | −0.61701 | −0.84784 | −0.78521 | −0.7663 | −0.8051 | −0.85062 | −0.77868 ± 0.07852 |
| 5 | −0.99923 | −0.96855 | −0.93786 | −0.99872 | −0.98831 | −0.98185 | −0.97909 ± 0.02119 |
| 6 | −0.76129 | −0.76871 | −0.8904 | −0.87209 | −0.82126 | −0.8541 | −0.82798 ± 0.04921 |
| 7 | −0.98311 | −0.98905 | −0.97248 | −0.47366 | −0.33301 | −0.36318 | −0.68575 ± 0.29891 |
| 8 | −0.9528 | −0.96702 | −0.93089 | −0.95038 | −0.99583 | −0.96655 | −0.96058 ± 0.01985 |
| 9 | −0.80877 | −0.87399 | −0.75693 | −0.7757 | −0.78879 | −0.83407 | −0.80638 ± 0.03882 |
| 10 | −0.77332 | −0.98553 | −0.73991 | −0.95172 | −0.90796 | −0.94447 | −0.88382 ± 0.09322 |
| Mean | −0.8594 ± 0.1236 | −0.9082 ± 0.0773 | −0.805 ± 0.2152 | −0.8529 ± 0.1579 | −0.8412 ± 0.1942 | −0.8475 ± 0.1789 |  |
Open Access This chapter is published under the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/deed.de), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images and other third-party material in this chapter are also subject to the aforementioned Creative Commons license, unless the figure caption indicates otherwise. If the material in question is not covered by the aforementioned Creative Commons license and the intended use is not permitted by statutory regulation, permission for the uses listed above must be obtained from the respective rights holder.
References
3. Minhas MS, Zelek JS (2020) Defect detection using deep learning from minimal annotations. In: Farinella GM, Radeva P, Braz J (eds) Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2020, Volume 4: VISAPP, Valletta, Malta. SCITEPRESS, Setúbal, pp 506–513. https://doi.org/10.5220/0009168005060513
12. Zhong Z, Zheng L, Kang G, Li S, Yang Y (2020) Random erasing data augmentation. AAAI 34(07):13001–13008
13. Zhou K, Liu Z, Qiao Y, Xiang T, Loy CC (2021) Domain generalization in vision: a survey (arXiv e-prints arXiv:2103.02503)
14. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: A simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
16. Erhan D, Bengio Y, Courville A, Manzagol PA, Vincent P, Bengio S (2010) Why does unsupervised pre-training help deep learning? J Mach Learn Res 11(19):625–660
19. Iwana BK, Uchida S (2021) An empirical survey of data augmentation for time series classification with neural networks. PLoS ONE 16(7):e254841
20. Wan C, Shen X, Zhang Y, Yin Z, Tian X, Gao F, Huang J, Hua XS (2022) Meta convolutional neural networks for single domain generalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp 4682–4691
26. Meister S, Wermes MAM, Stüve J, Groves RM (2021) Review of image segmentation techniques for layup defect detection in the automated fiber placement process. J Intell Manuf 32(8):2099–2119
32. Bacioiu D, Melton G, Papaelias M, Shaw R (2019) Automated defect classification of aluminium 5083 tig welding using hdr camera and neural networks. J Manuf Process 45:603–613
40. Loshchilov I, Hutter F (2019) Decoupled weight decay regularization. In: ICLR
Metadata
Title
A Study on Data Augmentation Techniques for Visual Defect Detection in Manufacturing
Authors
Lars Leyendecker
Shobhit Agarwal
Thorben Werner
Maximilian Motz
Robert H. Schmitt
Copyright year
2023
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-662-66769-9_6