
Intelligent Flaw Detection in Eddy Current Inspection Data Through Machine Learning Model

  • Open Access
  • 01.09.2025
Published in:


Abstract

The article addresses the critical role of heat exchangers in various industries and the challenges faced by traditional eddy current testing for flaw detection. It underscores the limitations of manual analysis and the need for automated, precise, and reliable methods. The proposed solution comprises a machine learning model that exploits four ingenious features derived from eddy current signals: variance, template correlation, dynamic time warping, and area under the signal. The model employs a random forest classifier, known for its robustness, efficiency, and ability to handle complex datasets. The study demonstrates the model's exceptional performance through cross-validation and independent evaluation, achieving high accuracy, precision, and recall. The article also discusses the model's computational efficiency and its potential for real-time deployment. Furthermore, the importance of each individual feature and their synergistic relationship are examined, emphasizing the need to use all four features for optimal performance. The paper concludes with the model's potential to revolutionize flaw detection in heat exchanger tubes, reduce downtime, and improve safety in critical industrial applications.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

Heat exchangers are critical components in a wide array of industries including petrochemicals, pharmaceuticals, refineries, and nuclear power plants. They are designed to exchange heat energy between two fluids through a thin interface (tubes) while maintaining physical separation. Their reliable operation is essential for system efficiency and safety. Over time, the tubes of heat exchangers are subjected to wear and tear, corrosion, and other forms of degradation that can compromise their performance and lead to costly downtimes or even catastrophic failures.
Eddy current technique is the most widely used non-destructive testing (NDT) technique employed periodically to monitor the integrity of heat exchanger tubes [1, 2]. This method utilizes electromagnetic induction principle to detect flaws in conductive materials by measuring changes in the impedance of a coil as it interacts with eddy currents generated within the test material. Eddy current inspection is particularly useful for detecting surface and near-surface flaws such as cracks, corrosion, and thinning of the heat exchanger tubes.
Despite its advantages, traditional eddy current testing of heat exchanger tubes faces several significant challenges [3, 4]. Typically, heat exchangers consist of a large number of tubes. Currently, eddy current data is acquired on-site, tube by tube, and analysed manually by qualified personnel after completion of the inspection. However, this manual analysis of the data is time-consuming and prone to human errors. Moreover, the high volume of inspection data, coupled with the presence of noise and other artefacts, can make it difficult to accurately and consistently detect flaws and further characterize them [5]. Hence, an intelligent flaw detection method capable of automatically detecting flaw features is essential. In recent years, the integration of machine learning with eddy current testing (ECT) has gained significant momentum, aiming to improve automation, accuracy, and reliability. Numerous studies [6] have investigated a wide range of machine learning models—ranging from conventional models such as Support Vector Machines (SVM), Random Forest (RF), Decision Trees, and Naïve Bayes classifiers to advanced deep learning models including Artificial Neural Networks (ANN), Probabilistic Neural Networks (PNN), Extreme Learning Machines (ELM), and various Convolutional Neural Network (CNN) architectures like ResNet [7], DenseNet, VGGNet, GoogLeNet, and AlexNet. These models have been successfully applied to address diverse ECT challenges, including defect detection [8], classification, localization, and characterization, as well as lift off variation [9], corrosion estimation, thickness measurement, physical parameter estimation, fatigue degradation assessment, and hardness estimation.
Several studies on the automated characterisation of flaws have been reported in the open literature, wherein the location of a defect is already known and the shape, size and depth of the flaw are predicted using inversion approaches. She et al. [10] have proposed a pulsed eddy current (PEC) testing approach combined with signal processing and an optimized Res2Net18 deep learning model to simultaneously detect defect size, depth, and material thickness with high classification accuracy and improved model stability. Tian et al. [11] have proposed an automated approach for evaluating the depth of metal surface defects using a portable ECT device and deep learning, where a custom dataset (MDDECT) was created and a 1D ResNeXt model achieved 93.58% accuracy in classifying defect depths while remaining robust to lift-off variations. Mohseni et al. [12] have used COMSOL to model the interaction between a split-D ECT probe and EDM notches, considering lift-off and tilt effects. They validated the model with impedance measurements. An adaptive neuro-fuzzy inference system model trained on simulated signals accurately estimated notch lengths with low error. Smid et al. [13] have developed a method for automatically evaluating flaws during manual eddy current (EC) inspections, addressing challenges like varying scanning speeds and probe positions. The method employs robust signal normalization, feature extraction using Fourier and complex discrete wavelet descriptors, and six different ML classifiers. Bernieri et al. [14] have compared the performance of an artificial neural network (ANN) and a support vector machine for regression (SVR) in both simulated and real environments, finding SVR to be more effective for crack size reconstruction. Yin et al. [15] have developed an analytical method to generate Lissajous curves for creating artificial datasets, and a clustering-based method is proposed for extracting geometric features from these curves, enhancing feature extraction for machine learning models. Zhu et al. [16] have proposed a convolutional neural network (CNN) based model for defect classification with uncertainty. Similarly, Zheng et al. [17] have developed a completely autonomous robot for heat exchanger tube inspection using a CNN-based classifier, which may appear to be computationally expensive, requiring significant processing power or time to train and deploy. While algorithms such as CNNs are powerful, they often have complex architectures that are not always feasible for real-time or large-scale applications.
While the challenge of accurately detecting and locating flaws remains unresolved, a few researchers have reported on automated defect identification/classification [18, 19]. Udpa et al. [5] have proposed an automated flaw detection method for heat exchanger tube inspection data using a neural network and a rule-based model for classification. The approach leverages three types of features for classification: physical features computed from the time-domain eddy current signal, including peak-to-peak values, phase angle, and energy; transform-based features involving Fourier-transform-derived magnitude and phase spectra; and statistical features, such as mean and variance, utilized to enhance the detection process [5]. However, they set a threshold before calculating features, which is akin to having prior knowledge of the defect, and achieved up to 93% accuracy in flaw detection. Falque et al. [20] have used a feature-based ML model for defect detection in water pipes using the amplitude, phase shift and Hjorth parameters as features combined with Random Forest, SVM, logistic regression and Naïve Bayes classifiers. These models exhibited a maximum accuracy of 92%, with performance degrading in scenarios involving diverse or previously unseen defect types. This limitation stemmed from factors such as inadequate training data or ineffective feature extraction techniques, leading to high rates of false positives or negatives and reduced overall reliability.
This paper proposes a machine learning model capable of detecting flaws in tubes without prior knowledge of flaw locations, addressing several critical issues in flaw detection in heat exchanger tube eddy current inspection data. This represents a significant advancement over earlier methods that required known flaw locations. The model leverages ingenious feature-based classification of eddy current inspection data through four carefully selected features combined with a random forest based machine learning model to improve the precision and efficiency of automated flaw identification. The proposed model demonstrates enhanced accuracy compared to traditional methods, overcoming some of the limitations associated with earlier proposed ML approaches. It incorporates efficient algorithms that balance computational cost and performance, making it suitable for real-time applications. A key feature of this study is its ability to generalize to a large number of unknown tubes. This demonstrates the model's applicability across various flaw scenarios, ultimately enhancing the reliability and safety of heat exchangers in critical industrial applications.

2 Machine Learning Model for Automated Flaw Identification

The primary goal of the study is to enhance flaw detection accuracy through a feature based classification model, trained and evaluated on processed eddy current inspection data. The flow chart of the proposed methodology is given in Fig. 1. Among the many different supervised machine learning models such as SVM with Gaussian kernel, XGBoost and Neural Network, the Random Forest Model (RFM) has been chosen due to its ability to efficiently handle large and complex datasets, capturing intricate non-linear relationships inherent in eddy current inspection data [21]. Such data often exhibit subtle patterns influenced by material properties and defect characteristics, and RFM excels in modeling these subtleties through its ensemble learning approach. It is robust to noise and variability [22], such as probe positioning errors or surface roughness, making it particularly well-suited for practical applications. The model’s robustness to outliers, minimal parameter tuning requirements, and strong generalization capabilities enable efficient implementation without extensive optimization [23]. Compared to more complex neural network models, RFM offers greater computational efficiency and performs well on smaller or imbalanced datasets [24, 25]. Additionally, its built-in methods for estimating feature importance enhance interpretability by identifying critical attributes for flaw detection, providing valuable insights into the underlying factors and mechanisms at play. With its proven effectiveness in similar non-destructive testing scenarios, RFM offers a reliable, efficient, and interpretable solution for automating flaw detection in eddy current inspection data, improving accuracy and consistency in the detection process [21, 26]. As can be seen from Fig. 1, four ingenious and significant features are extracted from eddy current signals and are given as input to the RFM. 
The output of the model is configured to provide a binary classification of the signal data points into flaw (TRUE or 1) and no-flaw (FALSE or 0).
Fig. 1
Proposed methodology for classification of flaw from eddy current signals

2.1 Eddy Current (EC) Data and Ingenious Features for Classification of Flaws

EC inspection data of 1060 condenser-type heat exchanger tubes from a power plant were collected for training and testing of the proposed machine learning model. The tubes are made of ASTM B 111 alloy 687. The outer diameter of the tubes is 25.4 mm, the wall thickness is 1.26 mm and the length is 8 m. Carbon steel support structures are used at roughly every 1 m along the heat exchanger; thus, each tube has around 7–8 support structures. EC inspection of heat exchanger tubes, in general, is performed in accordance with ASME Section V, Article 8 [27]. As per the standard, a conventional bobbin-type differential eddy current probe with a fill factor of 85% was used for inspection. The probe consists of two coils, each having a width of 2 mm, separated by 2 mm along the axial direction. Dual-frequency EC testing was performed at excitation frequencies of 9 kHz and 18 kHz to capture depth-dependent responses and to enable multifrequency mixing. The dual-frequency mixing technique [27] is used to eliminate disturbing noise from support structures. The horizontal (in-phase component of the impedance) and vertical (quadrature component of the impedance) components of this mixed signal are used for deriving the input features for this study. Manual inspection of the tubes was performed. During the inspection, the EC probe was inserted in each tube and data were recorded while pulling the probe at a reasonably uniform speed of ~ 100 mm/s.
As per the standard, a calibration tube consisting of reference flaws such as a through hole and flat bottom holes (FBH) of 80%, 60%, 40% and 20% of wall thickness (WT) depth on the OD side of the tube was used to calibrate the instrument and probe and for approximate sizing of the depth of defects based on the phase angle response of the indications. Figure 2 shows typical in-phase and quadrature components along with the impedance plane signals of the reference flaws. As can be seen, the typical EC signal of a flaw exhibits a Lissajous (figure-eight) pattern in the impedance plane. Figure 3 shows the typical in-phase and quadrature components of the EC impedance data of a tube. The signal of a flaw exhibits a typical differentiated Gaussian type signal. In this study, manual analysis of the data was carried out to identify potential indications based on the shape (Lissajous pattern) and amplitude of the signals, and this was used for training the model.
Fig. 2
Typical in-phase and quadrature components and impedance plane signals of the reference flaws
Prior to the computation of features, the raw eddy current impedance data is subjected to a band-pass filter and an edge removal algorithm to eliminate noise. This step ensures that the data is cleaned and suitable for further analysis. The raw impedance signals were denoised using a customized Fast Fourier Transform (FFT) based band-pass filter to eliminate high-frequency noise due to probe wobble and electronic and material-dependent sources, as well as low-frequency noise due to baseline fluctuations. The impedance data contains high-amplitude signals at the start and the end of the scan due to the probe imbalance signal at the edges of the tube (edge effect). These signals were removed using a semi-automated customized edge removal algorithm. Figure 4 shows the processed in-phase and quadrature components of the EC impedance data shown in Fig. 3, highlighting a flaw indication.
Fig. 3
A typical raw EC signal from a heat exchanger tube, with a zoomed-in view of the flaw signal shown in the inset
Fig. 4
Processed in-phase and quadrature components of the eddy current (EC) impedance data shown in Fig. 3, highlighting a flaw indication
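The paper does not give the filter implementation or cutoff frequencies; a minimal sketch of an FFT-based band-pass filter of this kind, with purely illustrative cutoffs and a synthetic signal standing in for the raw impedance data, might look like:

```python
import numpy as np

def bandpass_fft(signal, fs, f_low, f_high):
    """Zero out FFT bins outside [f_low, f_high] and invert.

    This removes low-frequency baseline drift and high-frequency
    noise, as described for the raw impedance signals.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= f_low) & (freqs <= f_high)
    return np.fft.irfft(spectrum * mask, n=len(signal))

# Example: a slow baseline drift plus a 50 Hz component; the
# hypothetical pass band of 10-100 Hz keeps only the 50 Hz part.
fs = 1000.0
t = np.arange(0, 1, 1 / fs)
x = 0.5 * np.sin(2 * np.pi * 0.5 * t) + np.sin(2 * np.pi * 50 * t)
y = bandpass_fft(x, fs, 10.0, 100.0)
```

In practice the cutoffs would be tuned to the probe pull speed and the noise spectrum of the instrument; the values above are assumptions for illustration only.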
This study proposes four novel sliding window operation based features, viz., (i) variance, (ii) template-based correlation, (iii) template-based dynamic time warping and (iv) area under the signal. All four features are derived from the amplitude signal, which is computed from the horizontal and vertical components of the mixed signal data as
\(A=\sqrt{R^{2}+X^{2}}\)
where A is the amplitude, R is the in-phase or resistance component of the probe impedance and X is the quadrature or inductive reactance component.

2.1.1 Variance

A sliding window-based variance of the amplitude signal is considered as one of the features for the ML model. A window of 30 data points is considered optimal based on the observation from Fig. 5(b), as this window size effectively captures all the characteristic features of the flaw. The sliding window method involves calculating the variance of the amplitude values of the first 30 data points of the signal, then moving the window forward by one data point and recalculating the variance. This process continues until the end of the signal. Zero padding is applied so that the window never exceeds the number of available data points. The feature computed this way has the same number of data points as the original signal. Variance is a good feature for flaw detection as it clearly distinguishes outlier data points, which may correspond to a flaw, from the baseline. The variance of an n-data-point window is computed using Eq. (1):
$$\sigma^{2}=\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}$$
(1)
where \(x_{i}\) is the amplitude value at each individual data point and \(\bar{x}\) is the mean of the amplitude values of the n data points.
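The sliding-window variance described above can be sketched as follows; the 30-point window and tail zero padding follow the text, while the test signal is synthetic:

```python
import numpy as np

def rolling_variance(signal, window=30):
    """Sliding-window sample variance (ddof=1), zero-padded at the
    tail so the output has the same length as the input signal."""
    padded = np.concatenate([signal, np.zeros(window - 1)])
    return np.array([np.var(padded[i:i + window], ddof=1)
                     for i in range(len(signal))])

# A flat baseline with a short high-amplitude excursion: the variance
# peaks around the excursion and stays near zero elsewhere.
sig = np.zeros(200)
sig[100:110] = 5.0
feat = rolling_variance(sig)
```

As the text notes, the feature keeps the original signal length, so it can be stacked point-by-point with the other three features.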

2.1.2 Template Correlation

A flaw signal extracted from heat exchanger EC inspection data, shown in Fig. 5(c), was used as a template to calculate the correlation with the amplitude signal of the test data. The template was moved along the amplitude signal, similar to a sliding window, to calculate the second feature, called template correlation, using Eq. (2):
$$B=\sum_{i=1}^{n}\sum_{j=1}^{51}a_{j}\cdot b_{j}$$
(2)
where B is the template correlation value, n is the number of data points in the processed amplitude data, \(a_{j}\) is the template amplitude value and \(b_{j}\) is the amplitude value of a segment of the processed data. The rationale behind this approach is to check the shape similarity of the amplitude signal with the flaw signal. A high correlation indicates high similarity between the sequences.
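A per-point sketch of this feature follows: the 51-point template slides along the amplitude signal, and each output value is the dot product of the template with the aligned segment. The Gaussian-like template here is an illustrative stand-in, not the actual flaw template of Fig. 5(c):

```python
import numpy as np

def template_correlation(signal, template):
    """Slide the template along the signal; at each position the
    feature is the dot product of the template with the aligned
    segment (tail zero padding keeps the output the same length)."""
    m = len(template)
    padded = np.concatenate([signal, np.zeros(m - 1)])
    return np.array([np.dot(template, padded[i:i + m])
                     for i in range(len(signal))])

# A Gaussian-like 51-point template correlates strongly where the
# signal contains a similar bump.
template = np.exp(-0.5 * ((np.arange(51) - 25) / 5.0) ** 2)
sig = np.zeros(300)
sig[150:201] = 2.0 * template   # an embedded, scaled copy of the template
feat = template_correlation(sig, template)
```

The peak of the feature marks the position where the signal shape best matches the flaw template, which is exactly the shape-similarity rationale given above.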

2.1.3 DTW Distances

The DTW (dynamic time warping) distance is calculated between the amplitude signal and the template signal in the same sliding manner as the correlation feature, using Eq. (3):
$$C=\sum_{i=1}^{n}\sum_{j=1}^{51}\mathrm{DTW}(a_{j},b_{j})$$
(3)
where C is the DTW distance value, n is the number of data points in the processed amplitude data, \(a_{j}\) is the template amplitude value and \(b_{j}\) is the amplitude value of a segment of the processed data. DTW provides a measure of similarity between two temporal sequences that may vary in speed or timing [28].
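A minimal sketch of the DTW distance itself, using the standard dynamic-programming recursion (the sliding application over the signal then follows the same pattern as the correlation feature):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two sequences,
    computed with the standard O(len(a) * len(b)) dynamic program."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]

# Identical sequences have zero DTW distance; a time-stretched copy
# of the same shape still scores low, which is why DTW tolerates
# variations in probe pull speed or timing.
a = np.sin(np.linspace(0, np.pi, 20))
b = np.sin(np.linspace(0, np.pi, 30))   # same shape, different length
c = np.zeros(20)                        # dissimilar sequence
```

This illustrates the property cited from [28]: shape-similar sequences of different lengths align cheaply, while dissimilar ones do not.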

2.1.4 Area Under Signal

A sliding window-based area under the amplitude signal is considered as another important feature. In this case as well, a 30-data-point sliding window operation is performed, as for the variance. The area under the amplitude signal is computed using the trapezoidal rule as per Eq. (4):
$$\mathrm{Area}(b_{j})=\frac{1}{2}\cdot\left(x_{j+1}-x_{j}\right)\cdot\left(y_{j+1}+y_{j}\right)$$
(4)
where \(y_{j}\) is the amplitude value corresponding to point \(x_{j}\).
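The rolling area feature can be sketched as follows, with a 30-point window and zero padding as for the variance; unit sample spacing is assumed, which the text does not state explicitly:

```python
import numpy as np

def rolling_area(signal, window=30):
    """Sliding-window area under the amplitude signal using the
    trapezoidal rule with unit sample spacing, zero-padded at the
    tail so the output has the same length as the input."""
    padded = np.concatenate([signal, np.zeros(window - 1)])
    out = np.empty(len(signal))
    for i in range(len(signal)):
        seg = padded[i:i + window]
        # sum of trapezoids: 0.5 * (y_j + y_{j+1}) per unit step
        out[i] = 0.5 * (seg[:-1] + seg[1:]).sum()
    return out

# For a constant unit amplitude, each full 30-point window gives
# 29 unit trapezoids of height 1.
sig = np.ones(100)
feat = rolling_area(sig)
```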
Fig. 5
(a) In-phase component, (b) quadrature component, and (c) normalized amplitudes of the defect template used for calculating features

2.1.5 Analysis of the Extracted Features

The features extracted from the EC amplitude signal of Fig. 6(a) are illustrated in Figs. 6(b–e). Figure 6(b) presents the rolling variance feature, which effectively separates the flaw signal from background noise by capturing the abrupt change in amplitude at the flaw location. Figure 6(c) shows the template correlation feature, which yields higher values at the flaw region because the signal shape matches the template and the amplitude is elevated. Figure 6(d) depicts the DTW distances, which are also higher in the flaw region, where both the amplitude and the shape similarity are greater. Finally, Fig. 6(e) displays the rolling area under the signal feature, which highlights the flaw region owing to the larger area of the flaw signal compared to the noise.
Fig. 6
(a) Amplitude, (b) rolling variance, (c) template correlation, (d) template DTW and (e) rolling area under the signal features obtained from a typical EC flaw signal

2.2 Machine Learning Model and Dataset

A RFM was utilized for the classification task, leveraging the concept of ensemble learning. This algorithm constructs multiple decision trees, each built by recursively splitting the dataset into subsets based on feature values that provide the greatest improvement in purity, using Gini impurity. Gini impurity measures the likelihood of incorrect classification of a randomly chosen element, with lower values indicating purer splits [29]. Each split creates a node, while branches represent the outcomes based on feature thresholds, continuing until stopping criteria such as maximum depth or a minimum number of samples in a leaf node are met.
Random Forest enhances this approach by training each tree on random subsets of both the data and the features, employing a technique known as bootstrap aggregating or “bagging.” By creating multiple samples from the training dataset with replacement, each tree is built independently, ensuring diversity. Once the individual trees have made their predictions, the RFM combines these predictions by taking the mode (the most frequently predicted class) among the trees to determine the final classification for the input. This voting mechanism enhances the robustness of the model, as it reduces the influence of any single tree that might have made a poor prediction.
This ensemble method effectively reduces the risk of overfitting, which is common in single decision trees, and improves the model’s ability to generalize to unseen data. The model was implemented using “Scikit-learn”, a widely recognized state-of-the-art Python library [30].
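A minimal Scikit-learn sketch of this classification step is given below, with synthetic stand-ins for the four per-point features; the data here are illustrative only, not the inspection dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy stand-in for the four per-point features (variance, template
# correlation, DTW distance, area): flaw points have larger values.
n = 400
X_noflaw = rng.normal(0.0, 0.3, size=(n, 4))
X_flaw = rng.normal(2.0, 0.3, size=(n, 4))
X = np.vstack([X_noflaw, X_flaw])
y = np.array([0] * n + [1] * n)          # 0 = no-flaw, 1 = flaw

# Bagged ensemble of decision trees; majority vote gives the label.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
pred = clf.predict([[0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0]])
```

The fitted model also exposes `feature_importances_`, the built-in importance estimate mentioned above, which is what makes the four-feature design inspectable.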
As discussed earlier, the goal is a binary classification of each impedance data point to presence of flaw (1) and absence of flaw (0). The ML model was trained using labelled data created from impedance data of 60 tubes (316639 data points), which contained 117 instances of flaw signals (5805 data points) and a large amount of very low amplitude or closer to zero impedance values indicating no-flaw. Figure 7 shows a typical amplitude eddy current impedance data of one tube and the corresponding labelled data used for training. As can be seen, the label data is set to 1 where a flaw is present and zero at other instances (no-flaw).
The labelled data provided by the expert were predominantly imbalanced, with most impedance points falling into the no-flaw class. This skewed distribution can impair the classification performance of the machine learning model and potentially lead to unreliable results from an imbalanced testing dataset. To address this, the minority class (flaw) was over-sampled using the Synthetic Minority Oversampling Technique (SMOTE). It generates new synthetic instances for the minority class by creating random data points along the line segments between a data instance and its k nearest neighbours. The number of nearest neighbours is determined based on the desired level of oversampling, ensuring the minority class is more adequately represented, resulting in a balanced dataset that was then used for training the model [31].
Fig. 7
Label data for the corresponding amplitude eddy current signal
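Since the text describes SMOTE's interpolation mechanism, a simplified SMOTE-style oversampler can be sketched as follows; in practice one would use an existing implementation such as imbalanced-learn's `SMOTE`, so this minimal version is illustrative only:

```python
import numpy as np

def smote_like(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE-style oversampling: each synthetic point lies on
    the line segment between a random minority sample and one of its
    k nearest minority-class neighbours."""
    if rng is None:
        rng = np.random.default_rng(0)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neigh = np.argsort(d)[1:k + 1]      # skip the point itself
        j = rng.choice(neigh)
        t = rng.random()                    # interpolation factor in [0, 1)
        synth.append(X_min[i] + t * (X_min[j] - X_min[i]))
    return np.array(synth)

# Grow a hypothetical minority (flaw) class from 20 to 100 samples.
flaws = np.random.default_rng(1).normal(2.0, 0.2, size=(20, 4))
new = smote_like(flaws, n_new=80)
balanced = np.vstack([flaws, new])
```

Because each synthetic point is a convex combination of two real minority samples, the new points stay inside the region occupied by the original flaw features rather than being arbitrary noise.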

2.2.1 Optimisation of the RFM

The hyperparameters of the RFM, i.e. the number of trees (n_estimators), tree depth (max_depth), the minimum number of data instances required to split a node (min_samples_split), the minimum samples per leaf (min_samples_leaf), and the maximum features considered for splitting a node (max_features), were optimized using the training dataset. The optimization was performed with the grid search method using 5-fold cross-validation, resulting in optimized values of 100, ‘None’ (allowing trees to grow fully), 2, 1, and 2, respectively [32].
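This grid search can be sketched with Scikit-learn's `GridSearchCV`; the grid and data below are reduced, hypothetical stand-ins rather than the actual search space of the study:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (100, 4)), rng.normal(2, 0.3, (100, 4))])
y = np.array([0] * 100 + [1] * 100)

# Hypothetical, reduced grid over the five hyperparameters named in
# the text (the paper reports optimal values of 100, None, 2, 1, 2).
grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_split": [2, 4],
    "min_samples_leaf": [1, 2],
    "max_features": [2, "sqrt"],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=5)
search.fit(X, y)
```

`search.best_params_` then holds the winning combination and `search.best_score_` the mean 5-fold cross-validation score, mirroring the procedure described above.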

2.3 Evaluation Metrics

The optimized and trained ML model was tested on training and evaluation dataset. The predicted results from the model were compared with the ground truth labels provided by an expert. Instances from this comparison were categorized into four classes: when the model correctly predicted a flaw as a flaw, it was classified as a true positive (TP); when the model correctly predicted no-flaw as no-flaw, it was classified as a true negative (TN); when the model incorrectly predicted no-flaw as a flaw, it was classified as a false positive (FP); and when the model incorrectly predicted a flaw as no-flaw, it was classified as a false negative (FN). These categories were used to construct a confusion matrix based on the number of data points, as shown in Table 1.
Table 1
A typical confusion matrix

                                True Label
                          Flaw (1)          No-flaw (0)
Predicted    Flaw (1)     True positive     False positive
Label        No-flaw (0)  False negative    True negative
In the confusion matrix, a high number of TP and TN indicates stronger performance by the trained ML model on the test set, as it represents correctly classified instances. In contrast, FP and FN values correspond to misclassified cases. However, it is important to note that the confusion matrix alone cannot serve as a definitive measure of the model performance. Instead, metrics derived from its elements, such as accuracy, precision, recall, F1-score and Matthews correlation coefficient (MCC), were calculated using Eqs. (5)–(9) to provide a more comprehensive quantitative evaluation:
$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN}$$
(5)
$$Precision=\frac{TP}{TP+FP}$$
(6)
$$Recall=\frac{TP}{TP+FN}$$
(7)
$$F1\text{-}score=\frac{2\times Precision\times Recall}{Precision+Recall}$$
(8)
$$MCC=\frac{TP\times TN-FP\times FN}{\sqrt{\left(TP+FP\right)\left(TN+FN\right)\left(FP+TN\right)\left(TP+FN\right)}}$$
(9)
where Accuracy measures the overall correctness of the model by calculating the proportion of true results (both true positives and true negatives) out of the total predictions. Precision evaluates the model’s ability to correctly identify positive instances, focusing on the ratio of true positives to the sum of true and false positives, highlighting how many of the predicted positives are actually correct. Recall (or sensitivity) assesses how well the model captures all actual positive instances by comparing true positives to the sum of true positives and false negatives, indicating how well the model identifies all relevant cases. The F1-score combines precision and recall into a single metric by calculating their harmonic mean, offering a balanced view of the model’s performance, while the Matthews Correlation Coefficient (MCC) provides a more comprehensive assessment by considering all four confusion matrix categories (true positives, true negatives, false positives, and false negatives), making it particularly valuable for imbalanced datasets where one metric alone may be misleading.
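These definitions can be checked against the reported results: plugging the independent-evaluation confusion-matrix counts from Table 2 into Eqs. (5)–(9) reproduces the corresponding column of Table 3:

```python
import math

# Counts from the independent evaluation confusion matrix (Table 2).
TP, FP, FN, TN = 4331, 1739, 1075, 4334779

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)
mcc = (TP * TN - FP * FN) / math.sqrt(
    (TP + FP) * (TN + FN) * (FP + TN) * (TP + FN))
```

Rounding these values gives 0.9994, 0.7135, 0.8011, 0.7548 and 0.7557, matching the reported independent-evaluation metrics.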

3 Results and Discussion

The trained and optimized ML model was initially evaluated on the training data, with the results detailed in Sect. 3.1. Subsequently, the outcomes of cross-validation studies are presented in Sect. 3.2. Finally, the model was assessed on an independent dataset comprising 1,000 tubes, as discussed in Sect. 3.3.

3.1 Evaluation of the Model on Training Data

The trained and optimized RFM was first evaluated using the training data to assess its performance. The confusion matrix obtained for this evaluation is shown in Table 2. This confusion matrix shows an exceptionally high performance of the model with almost perfect classification. It correctly identifies all true positives (310,834) and true negatives (310,833), with only 1 false positive and no false negatives. The model demonstrates both high precision and recall, with minimal error.

3.2 Cross-validation

Cross-validation studies were performed to assess the overfitting or underfitting behaviour of the model. A 5-fold cross-validation of the trained model was performed on the training dataset with an 80% (training) and 20% (evaluation) split of the data. The evaluation metrics at each fold of the cross-validation studies are shown in Table 3. It can be observed that the model performs optimally at each fold of the cross-validation, indicating strong generalization performance.

3.3 Evaluation

The optimized and trained ML model was tested on an independent dataset of 1000 tubes and the confusion matrix obtained for the same is shown in Table 2.
Table 2
Confusion matrices for evaluation on the training and independent evaluation datasets

                          Training data (True Label)       Independent evaluation data (True Label)
Predicted    Flaw (1)     310,834 (TP)    1 (FP)           4331 (TP)     1739 (FP)
Label        No-flaw (0)  0 (FN)          310,833 (TN)     1075 (FN)     4,334,779 (TN)
Table 3
Evaluation metrics for the training (5-fold cross-validation) and independent evaluation datasets

Evaluation Metrics   Fold-1   Fold-2   Fold-3   Fold-4   Fold-5   Independent Evaluation data
Accuracy             0.9977   0.9978   0.9981   0.9979   0.9979   0.9994
Precision            0.9955   0.9958   0.9964   0.9958   0.9959   0.7135
Recall               0.9999   0.9999   0.9997   1.0000   1.0000   0.8011
F1-score             0.9977   0.9979   0.9981   0.9979   0.9979   0.7548
MCC                  0.9954   0.9957   0.9962   0.9957   0.9959   0.7557

3.3.1 Performance Evaluation

It can be seen from the evaluation metrics in Table 3 that the model achieved an accuracy of 99.94% on unseen data. As mentioned earlier, eddy current testing data is generally imbalanced, with a higher number of negative cases and rare positive cases. In such instances, accuracy does not provide insights into how well the model performs on the minority class (i.e., positive cases). Therefore, relying solely on accuracy can be misleading.
In this context, metrics like precision, recall, F1-score and MCC offer more insight. The model achieved a precision of 0.71, indicating that when it predicts a data point as a flaw, it is correct 71% of the time, which ensures a reasonable level of precision in flaw predictions. The model’s recall value of 0.80 indicates that it successfully identified 80% of the flaw data points. However, when recall is assessed at the level of flaw instances, i.e. a flaw is considered detected if any portion of the predicted region overlaps with the actual flaw (as shown in Fig. 8(d)), the model achieved a 100% recall rate, i.e. it successfully detected all actual flaws.

3.3.2 Robustness Evaluation

Figure 8 shows actual EC amplitude data and the predicted classification label for a few candidate cases from which the robustness can be readily confirmed. Figure 8(a) illustrates how the machine learning model accurately captures the flaw, demonstrating its ability to correctly identify the flaw region. Figure 8(b) shows the case when model detects some regions that are no-flaws (FP) along with the actual flaw. However, these false positives are not a concern for flaw detection, as they can be filtered out using a thresholding approach based on the phase angle of the flaw signal. Figure 8(c) highlights the model’s inherent advantage in point wise flaw prediction, showing its capability to clearly distinguish flaws even when they are located close to each other. Figure 8(d) presents a case where the model struggles to fully capture the extent of the flaw. This difficulty arises from confusion in defining the transition region between flaw and no-flaw, particularly in flaws with low amplitudes.
The high accuracy, precision, recall, F1-score and MCC shown by the model are due to the combination of carefully selected features and the RFM, which makes this framework a strong candidate for an eddy current flaw detection system. The framework also provides a strong base for further characterisation of flaws in eddy current inspection data.
Fig. 8
Evaluation of the robustness of the machine learning model for an independent dataset

3.3.3 Approach to Further Improve the Efficacy of the Model

As can be observed in Fig. 8(b) and (c), predictions can contain more false positives when short-duration non-flaw events are classified as flaws. Similarly, false negatives increase when a long-duration defect event is split into multiple smaller flaw events, i.e. data points within a flaw region are misclassified as flaw-free (refer Fig. 8(d)). These false positives and negatives can be significantly reduced simply by ignoring very short events and by combining closely spaced events into one flaw event. Such a rule-based approach was employed, in which the predicted labels are subjected to two sequential conditions. First, if the gap between two consecutive predicted flaws is less than 30 data points, they are combined into a single flaw, since they are likely parts of the same flaw; this reduces false negatives at the cost of a small increase in false positives. Second, any remaining standalone flaw run, i.e. one with no other predicted flaw within 30 data points on either side, that is shorter than 30 data points is reclassified as no flaw.
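The two sequential conditions can be sketched as a small post-processing routine over the binary label sequence. The constant names and helper functions below are illustrative, not taken from the paper:

```python
import numpy as np

MIN_GAP = 30   # gaps shorter than this are bridged (merges split flaws)
MIN_LEN = 30   # isolated runs shorter than this are discarded

def runs_of_ones(labels):
    """Return (start, end) index pairs (end exclusive) of consecutive 1s."""
    padded = np.diff(np.concatenate(([0], labels, [0])))
    starts = np.flatnonzero(padded == 1)
    ends = np.flatnonzero(padded == -1)
    return list(zip(starts, ends))

def postprocess(labels):
    labels = np.asarray(labels, dtype=int).copy()
    # Condition 1: bridge gaps shorter than MIN_GAP between consecutive runs.
    runs = runs_of_ones(labels)
    for (s1, e1), (s2, e2) in zip(runs, runs[1:]):
        if s2 - e1 < MIN_GAP:
            labels[e1:s2] = 1
    # Condition 2: drop remaining (now isolated) runs shorter than MIN_LEN.
    for s, e in runs_of_ones(labels):
        if e - s < MIN_LEN:
            labels[s:e] = 0
    return labels
```

For example, two 20-point flaw runs separated by a 10-point gap are merged into one 50-point flaw, while an isolated 5-point run far from any other prediction is suppressed.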
The confusion matrix and the evaluation metrics after implementing the proposed approach are shown in Tables 4 and 5, respectively. It is clear from the tables that the model's performance improved significantly with the rule-based approach. False positives decreased from 1,739 to 937, yielding a more than 10% increase in precision. False negatives decreased from 1,075 to 987, leading to an overall improvement in recall, F1-score, MCC and accuracy. With an F1-score of 0.82 and an MCC of 0.82, the model demonstrates balanced performance across all classes, making it well suited to an eddy current flaw detection system, where good precision is required and false negatives cannot be tolerated.
Table 4
Confusion matrix for evaluation on the test dataset after applying the rule-based approach (independent evaluation data)

                        True: Flaw    True: No flaw
Predicted: Flaw         4,419         937
Predicted: No flaw      987           4,335,581
Table 5
Evaluation metrics for the test dataset after applying the rule-based approach

Evaluation Metric    Independent Evaluation Data
Accuracy             0.9996
Precision            0.8251
Recall               0.8174
F1-score             0.8212
MCC                  0.8210

3.3.4 Feature Importance

The overall success of the model is attributed to the carefully engineered features. In the Random Forest Classifier, the feature importance scores shown in Fig. 9 highlight the varying contributions of each feature to the model's performance. Template correlation, with the highest importance score of 48%, plays a crucial role in guiding the model's predictions by capturing significant patterns in the data. Variance has the second-highest importance score of 26%, reinforcing the model's accuracy by complementing the information from Template correlation. Together, these two features account for a substantial portion of the model's classification ability, indicating their pivotal role in driving the classifier's performance.
However, Area under the signal and DTW distance, with importance scores of 21% and 5% respectively, also contribute meaningfully to the overall prediction quality. Although their importance is lower, they provide essential complementary information that enhances the model's ability to generalize across different cases. Area under the signal plays a supportive role, refining the decision-making process, while DTW distance, despite its lower score, captures subtler patterns, such as changes in flaw width caused by scanning-speed variations, that might otherwise be missed. The model's success is driven by the combined power of all four features, each playing a unique and essential role in ensuring accurate and reliable predictions.
Fig. 9
Feature importance of Random forest model
To further demonstrate the efficacy of the features, a leave-one-out cross-validation study was performed on the features. In this study, three features were used for training and evaluating the model, leaving out one of the four features at a time. The performance metrics were computed for each combination and studied. Figure 10 shows the results. The performance metrics obtained from the RFM for the four different feature combinations (T-Template Correlation, V-Variance, A-Area under signal and D-DTW) clearly demonstrate the significance of including all four features together as input. When all features are used, the model achieves the highest values across all metrics. This indicates that all four features contribute to the model's ability to correctly classify instances, balance true positives against false positives, and ensure a high rate of capturing relevant instances.
Fig. 10
Performance metrics for different features combination (T-Template Correlation, V-Variance, A-Area under signal and D-DTW distance)
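The leave-one-out study can be reproduced in outline with scikit-learn, which provides the Random Forest implementation cited by the authors. The data below is a synthetic stand-in, since the eddy current dataset is not public; the class prevalence, per-feature shifts, and hyperparameters are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
FEATURES = ["T", "V", "A", "D"]   # stand-ins for the paper's four features

# Synthetic imbalanced dataset: ~10% flaw points, each feature informative
# to a different degree (stronger shift = more informative feature).
n = 5000
y = (rng.random(n) < 0.1).astype(int)
X = rng.normal(size=(n, 4))
X[y == 1] += [2.0, 1.5, 1.0, 0.5]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def scores(cols):
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_tr[:, cols], y_tr)
    pred = model.predict(X_te[:, cols])
    return f1_score(y_te, pred), matthews_corrcoef(y_te, pred)

full_f1, full_mcc = scores([0, 1, 2, 3])
print(f"All features: F1={full_f1:.3f} MCC={full_mcc:.3f}")
for leave_out in range(4):
    cols = [c for c in range(4) if c != leave_out]
    f1, mcc = scores(cols)
    print(f"Without {FEATURES[leave_out]}: F1={f1:.3f} MCC={mcc:.3f}")
```

On data like this, dropping the most informative column degrades F1 and MCC the most, mirroring the pattern the authors report for their real features.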

3.3.5 Observations from Feature Combinations

T, A, D (Accuracy = 99.51%, Precision = 17.34%, Recall = 77.77%, F1-score = 28.36%, MCC = 0.3658): Excluding V causes a catastrophic drop in Precision, F1-score and MCC, highlighting that V is a vital feature for the model's performance. The low Precision suggests that, without V to separate low-amplitude noise signals from flaws, the model struggles to limit false positives.
T, V, D (Accuracy = 99.88%, Precision = 51.09%, Recall = 76.27%, F1-score = 61.19%, MCC = 0.6237): Excluding A leads to a significant drop in Precision, F1-score and MCC, suggesting that A plays a critical role in identifying true positives and reducing false positives. The combination still maintains high Recall, but the overall effectiveness (as reflected in F1-score and MCC) declines.
V, A, D (Accuracy = 99.90%, Precision = 56.09%, Recall = 85.33%, F1-score = 67.68%, MCC = 0.6913): Removing T results in a large increase in Recall, but the drop in Precision indicates a higher false-positive rate. This shows that T contributes to the model's ability to differentiate between true and false positives.
T, V, A (Accuracy = 99.90%, Precision = 58.09%, Recall = 81.65%, F1-score = 67.89%, MCC = 0.6883): Omitting D significantly reduces Precision (71.35% → 58.09%), indicating that the model is more prone to false positives without this feature. Recall is marginally higher, suggesting that this combination captures slightly more true positives, but the imbalance between Precision and Recall results in a lower F1-score.

3.3.6 Role of Each Feature

Table 6 summarizes the role and physical significance of each feature, as described below:
T: Likely provides global or overarching patterns that help the model achieve high Precision. Its absence significantly increases false positives, indicating its role in refining the model’s predictive power.
V: Essential for ensuring model reliability, as its removal leads to a dramatic drop in Precision. It might capture variability that the other features cannot.
A: Plays a vital role in balancing Precision and Recall. Without it, the F1-score declines, suggesting its importance in both correctly identifying true positives and minimizing false positives.
D: Contributes to the overall stability of the model. Its absence causes a noticeable reduction in Precision, indicating its utility in ensuring correct classification.
Table 6
Role and significance of individual features

S. No.  Feature  Role                                                                    Physical Significance
1.      T        Captures global or overarching patterns that enhance the model's        Captures shape similarity
                 ability to achieve high Precision
2.      V        Captures variability and separates low-amplitude noise signals          Outlier detection
                 from flaws
3.      A        Identifies true positives and minimizes false positives                 Amplitude
4.      D        Provides overall stability                                              Essentially a similarity metric that
                                                                                         also captures scanning-speed variation
The analysis highlights the synergistic relationship between the features T, V, A, and D. Using all four features ensures the model achieves the best possible performance across all metrics. Each feature provides unique information that complements the others, making their combined use indispensable for robust and accurate predictions.
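To make the roles in Table 6 concrete, the four features can be sketched for a single sliding window. The window length, the template itself, and the use of a rectified sum as the "area under the signal" are assumptions for illustration, not the authors' exact definitions:

```python
import numpy as np

def dtw_distance(a, b):
    """Plain O(len(a)*len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]

def window_features(window, template):
    """Return (T, V, A, D) for one sliding window of the EC signal.

    Assumes the window and template have equal length; A is approximated
    as the rectified sum of the window samples.
    """
    T = np.corrcoef(window, template)[0, 1]   # template correlation (shape)
    V = np.var(window)                        # variance (separates noise)
    A = np.sum(np.abs(window))                # area under the signal (amplitude)
    D = dtw_distance(window, template)        # DTW distance (speed variation)
    return T, V, A, D
```

A window identical to the template yields T = 1 and D = 0, while a stretched or compressed flaw signature keeps D small relative to a plain Euclidean mismatch, which is exactly the scanning-speed robustness the authors attribute to D.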

3.4 Computational Time

The time complexity of the RFM is O(k·n·m·log(n)), where k is the number of trees, n the number of data points, and m the number of features; computational time therefore grows as n·log(n) with the number of data points. In contrast to SVM, whose time complexity is O(n²) [33], this n·log(n) scaling ensures that the model scales effectively even with large datasets while maintaining reasonable computational time [34]. Training the RFM took 124 s on a standard Intel i5 processor with 8 GB RAM, demonstrating its efficiency in handling the dataset. The efficient training time and scalability make Random Forest a strong candidate for real-time deployment, as it can quickly process data and provide timely predictions without compromising performance.
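The scaling argument can be checked with a little arithmetic: doubling n roughly doubles the O(k·n·m·log(n)) random forest cost (times a slowly growing log factor), while the O(n²) SVM cost quadruples. A quick sketch, with illustrative values of k and m:

```python
import math

def rf_cost(n, k=100, m=4):
    """Abstract operation count for random forest training, O(k*n*m*log(n))."""
    return k * n * m * math.log(n)

def svm_cost(n):
    """Abstract operation count for SVM training, O(n^2)."""
    return n ** 2

# Relative cost of growing the dataset, normalized to n = 10,000.
for n in (10_000, 20_000, 40_000):
    print(f"n={n}: RF x{rf_cost(n) / rf_cost(10_000):.2f}, "
          f"SVM x{svm_cost(n) / svm_cost(10_000):.2f}")
```

Doubling n multiplies the random forest cost by about 2.15 but the SVM cost by exactly 4, which is why the gap widens rapidly on the multi-million-point datasets typical of full tube-bundle inspections.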

3.5 Future Works

The current model utilizes features derived from a flaw template based on eddy current testing data, which has proven effective for flaw detection. However, these features were selected based on the expertise of an analyst, leaving room for further exploration. Future work will aim to develop a model that dynamically selects flaw templates by analysing the frequency characteristics of the data, thereby optimizing detection accuracy. Additionally, there is scope for including new, data-driven features, making feature selection an open-ended process that could further enhance the model's performance and adaptability for more accurate flaw detection.

4 Conclusions

This research presents a flaw classification methodology adept at identifying flaws in tubes without prior knowledge of their locations. The model leverages four sliding-window-based features associated with flaw characteristics, namely variance, template correlation, template dynamic time warping distance and area under the signal, along with an optimised RFM (random forest model). Cross-validation studies on the training dataset show optimal performance of the model. The machine learning model achieved an accuracy of 99.94% and successfully classified 100% of the flaw instances in an independent evaluation dataset consisting of 1000 tubes. With further application of the rule-based approach, the F1-score and MCC improved from 0.75 to 0.82, showing a balance between precision (82%) and recall (81%). The model required only 124 s of training time, making it computationally highly efficient. These results demonstrate the model's accuracy, robustness, and efficiency in flaw detection. The success of the model in accurately classifying flaw instances is mainly attributed to the selection of features that reflect the characteristics of the flaws. This automation will significantly reduce the labour-intensive and time-consuming manual analysis of EC inspection data of heat exchanger tubes, enabling real-time flaw detection and minimizing equipment downtime.

Acknowledgements

The authors thank Dr. R. Divakar, Director, Metallurgy and Materials Group, Indira Gandhi Centre for Atomic Research (IGCAR), Kalpakkam, India and Mr. Chandrashekhar Gaurinath Karhadkar, Director, IGCAR for their encouragement and support.

Declarations

Competing Interests

The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Title
Intelligent Flaw Detection in Eddy Current Inspection Data Through Machine Learning Model
Authors
Tikesh Kumar Sahu
S. Thirunavukkarasu
Anish Kumar
Publication date
01.09.2025
Publisher
Springer US
Published in
Journal of Nondestructive Evaluation / Issue 3/2025
Print ISSN: 0195-9298
Electronic ISSN: 1573-4862
DOI
https://doi.org/10.1007/s10921-025-01229-2
1. Baldev Raj, Jayakumar, T., Rao, B.P.C.: Non-destructive testing and evaluation for structural integrity. Sadhana 20, 5–38 (1995). https://doi.org/10.1007/BF02747282
2. García-Martín, J., Gómez-Gil, J., Vázquez-Sánchez, E.: Non-destructive techniques based on eddy current testing. Sensors 11(3), 2525–2565 (2011). https://doi.org/10.3390/s110302525
3. Abdalla, A.N., Faraj, M.A., Samsuri, F., Rifai, D., Ali, K., Al-Douri, Y.: Challenges in improving the performance of eddy current testing. Meas. Control 52, 1–2 (2019). https://doi.org/10.1177/0020294018801382
4. Lunin, V., Zhdanov, A.: Automated data analysis in eddy current inspection of steam generator tubes. In: Proceedings of the 9th European Conference on NDT, pp. 1–7 (2006)
5. Udpa, L., Ramuhalli, P., Benson, J., Udpa, S.: Automated analysis of eddy current signals in steam generator tube inspection. In: Proceedings of the 16th World Conference on NDT (WCNDT) (2004). https://www.ndt.net/?id=2182
6. Munir, N., Huang, J., Wong, C.N., Song, S.J.: Machine learning based eddy current testing: A review. Results Eng. 103724 (2024). https://doi.org/10.1016/j.rineng.2024.103724
7. Zheng, X., She, S., Xia, Z., Xiong, L., Zou, X., Yu, K., Guo, R., Zhu, R., Zhang, Z., Yin, W.: Analyzing the permeability distribution of multilayered specimens using pulsed eddy-current testing with multi-scale 1D-ResNet. NDT E Int. 149, 103247 (2025). https://doi.org/10.1016/j.ndteint.2024.103247
8. Machado, M.A., Rosado, L.F.S.G., Mendes, N.A.M., Miranda, R.M.M., dos Santos, T.J.G.: New directions for inline inspection of automobile laser welds using non-destructive testing. Int. J. Adv. Manuf. Technol. 1–13 (2022). https://doi.org/10.1007/s00170-021-08007-0
9. Ricci, M., Silipigni, G., Ferrigno, L., Laracca, M., Adewale, I.D., Tian, G.Y.: Evaluation of the lift-off robustness of eddy current imaging techniques. NDT E Int. 85, 43–52 (2017). https://doi.org/10.1016/j.ndteint.2016.10.001
10. She, S., Zheng, X., Xiong, L., Meng, T., Zhang, Z., Shao, Y., Shen, J., He, Y., Yin, W.: Thickness measurement and surface-defect detection for metal plate using pulsed eddy current testing and optimized Res2Net network. IEEE Trans. Instrum. Meas. 73, 1–13 (2024). https://doi.org/10.1109/TIM.2024.3418101
11. Meng, T., Tao, Y., Chen, Z., Salas Avila, J.R., et al.: Depth evaluation for metal surface defects by eddy current testing using deep residual convolutional neural networks. IEEE Trans. Instrum. Meas. 70, 1–13 (2021). https://doi.org/10.1109/TIM.2021.3117367
12. Mohseni, E., Viens, M., Xie, W.-F.: Adaptive neuro-fuzzy inference system trained for sizing semi-elliptical notches scanned by eddy currents. J. Nondestr. Eval. 39, 1–12 (2020). https://doi.org/10.1007/s10921-019-0648-8
13. Smid, R., Docekal, A., Kreidl, M.: Automated classification of eddy current signatures during manual inspection. NDT&E Int. 38(6), 462–470 (2005). https://doi.org/10.1016/j.ndteint.2004.12.004
14. Bernieri, A., Ferrigno, L., Laracca, M., Molinara, M.: Crack shape reconstruction in eddy current testing using machine learning systems for regression. IEEE Trans. Instrum. Meas. 57(9), 1958–1968 (2008). https://doi.org/10.1109/TIM.2008.919011
15. Yin, L., Ye, B., Zhang, Z., Tao, Y., Xu, H., Avila, J.R.S., Yin, W.: A novel feature extraction method of eddy current testing for defect detection based on machine learning. NDT E Int. 107, 102–108 (2019). https://doi.org/10.1016/j.ndteint.2019.04.005
16. Zhu, P., Cheng, Y., Banerjee, P., Tamburrino, A., Deng, Y.: A novel machine learning model for eddy current testing with uncertainty. NDT&E Int. 101, 104–112 (2019). https://doi.org/10.1016/j.ndteint.2018.09.010
17. Zheng, B., Su, J.W., Xie, Y., Miles, J., Wang, H., Gao, W., Xin, M., Lin, J.: An autonomous robot for shell and tube heat exchanger inspection. J. Field Robot. 39(8), 1165–1177 (2022)
18. Dutta, C., Sagar, S.P., Kumar, A., Bhushan, R., Kadu, S., Das, T.K.: An adaptive sampling protocol for real-time defect assessment using eddy current sensor and machine learning algorithm. IEEE Trans. Ind. Appl. 59(5), 4556–4564 (2023). https://doi.org/10.1109/TIA.2023.3284782
19. Ye, B., Huang, P., Fan, M., Gong, X., Hou, D., Zhang, G., Zhou, Z.: Automatic classification of eddy current signals based on kernel methods. Nondestruct. Test. Eval. 24, 1–2 (2009). https://doi.org/10.1080/10589750802002590
20. Falque, R., Vidal-Calleja, T., Miro, J.V.: Defect detection and segmentation framework for remote field eddy current sensor data. Sensors 17(10), 2276 (2017). https://doi.org/10.3390/s17102276
21. Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001). https://doi.org/10.1023/A:1010933404324
22. Pal, M.: Random forest classifier for remote sensing classification. Int. J. Remote Sens. 26(1), 217–222 (2005). https://doi.org/10.1080/01431160412331269698
23. Probst, P., Wright, M.N., Boulesteix, A.L.: Hyperparameters and tuning strategies for random forest. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 9(3) (2019). https://doi.org/10.1002/widm.1301
24. Liaw, A., Wiener, M.: Classification and regression by randomForest. R News 2(3) (2002)
25. Vargas-Lopez, O., Perez-Ramirez, C.A., Valtierra-Rodriguez, M., Yanez-Borjas, J.J., Amezquita-Sanchez, J.P.: An explainable machine learning approach based on statistical indexes and SVM for stress detection in automobile drivers using electromyographic signals. Sensors 21(9) (2021). https://doi.org/10.3390/s21093155
26. Bosch, A., Zisserman, A., Munoz, X.: Image classification using random forests and ferns. In: Proceedings of the IEEE 11th International Conference on Computer Vision, pp. 1–8 (2007)
27. ASME: Eddy current examination of nonferromagnetic heat exchanger tubing. ASME Boiler and Pressure Vessel Code, Section V, Article 8, ASME International (2017)
28. Müller, M.: Dynamic time warping. In: Information Retrieval for Music and Motion, pp. 69–84. Springer (2007). https://doi.org/10.1007/978-3-540-74048-3_4
29. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees, 1st edn. Chapman & Hall/CRC, London (1984). https://doi.org/10.1201/9781315139470
30. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., et al.: Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
31. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002). https://doi.org/10.1613/jair.953
32. Syarif, I., Prugel-Bennett, A., Wills, G.: SVM parameter optimization using grid search and genetic algorithm to improve classification performance. TELKOMNIKA 14(4), 1502–1509 (2016)
33. Tsang, I.W., Kwok, J.T., Cheung, P.M.: Core vector machines: Fast SVM training on very large data sets. J. Mach. Learn. Res. 6, 363–392 (2005)
34. Takhirov, Z., Wang, J., Louis, M.S., Saligrama, V., Joshi, A.: Field of groves: An energy-efficient random forest. arXiv:1704.02978 (2017). https://doi.org/10.48550/arXiv.1704.02978
