The main goal of supervised machine learning (ML) is to learn patterns from labeled examples and predict discrete classes for new, unseen data. Starting from the notion of statistical learning, different losses can be tailored to specific learning tasks in challenging pattern recognition domains, from industrial [1, 2] to diagnostic applications [3, 4]. In the context of Industry 4.0, the increasing availability of data, advancements in computing power and breakthroughs in algorithm development have allowed ML and deep learning (DL) methodologies to provide appealing solutions in different industrial areas such as predictive maintenance [1, 5], decision support systems (DSS) [6] and quality control (QC) [2, 7, 8]. Indeed, QC has quickly established itself as one of the most crucial and challenging Industry 4.0 scenarios [9]: its main objective is to detect production issues and classify the quality of the final product. Monitoring the quality of instrumentation, products and materials can enable manufacturers to support technicians during the process while reducing resource costs and improving productivity [10].
The application of ML and DL techniques offers great opportunities to automate the overall QC process [11]. These methodologies have been employed for QC in several industrial areas, but the state of the art is mainly oriented toward ad hoc rather than vanilla ML solutions capable of dealing with the challenges of this domain, namely the intrinsic variability of the annotation procedure and the difficulty of generalizing across different sets. The aesthetic quality control (AQC) task is a non-metric QC task in which the aesthetic aspect of the material is not measurable and is assessed by expert observation. In this domain, the classes of the target variable often exhibit a natural ordering. However, this ordinal structure is usually neither exploited nor modeled in the learning procedure. For these reasons, state-of-the-art solutions rely on standard classification and regression models that do not fully capture the ordinal structure of the AQC task. This gap in the scientific literature lays the foundations for introducing a DL-based DSS driven by ordinal constraints for solving an AQC task. The proposed approach penalizes misclassification errors more heavily the further they fall from the correct AQC class. This outcome is also in line with industrial demands, providing a DL-based DSS for AQC that is as aligned as possible with the human operator (human agency and oversight [12]).
1.1 Aesthetic quality control task
Quality control (QC) is a growing area in Industry 4.0 and a fundamental step for detecting production issues and classifying the compliance of the finished product. The increasing amount of data in this scenario offers a great opportunity for ML and DL techniques to be the core of a DSS able to automate the overall QC process, saving time and resources and maximizing performance, while generalizing easily to different contexts. As evidence of this, these methodologies have been employed for QC in different domains. In the fabric and textile industry, DL approaches have been applied to leather and stitching classification, replacing the operator's visual inspection phase for identifying stitching defects on material surfaces [13]. In the printing industry, a deep neural network soft sensor has been proposed that compares the scanned surface to the engraving file used and performs automatic quality control by learning features from training data [14]. In the automotive industry, DL-based approaches have been adopted for automatic fault detection and isolation [15] and for the quality control of complex multistage manufacturing processes, where dimensional variability of the product is a crucial factor and undetected defects can easily propagate downstream [16]. All of these solutions focus on quantitative and deterministic analyses: dimensional control, inspection of material roughness, patterned fabric defect detection and testing of production parameters are all measurable evaluation procedures. In our previous work [17], we dealt with an unexplored and challenging QC application, the aesthetic evaluation of material, introducing the aesthetic quality control (AQC) task. In this case, the DL algorithm must model qualitative analyses that are strictly human dependent, subjective and not directly measurable: this clearly increases the complexity of the classification task, and becomes more and more apparent as the number of classes grows. As demonstrated, approaching this problem with a nominal DL classification method (which does not exploit class order) causes a substantial drop in accuracy and an increase in misclassification errors even between widely separated classes, which is the main fault from the industrial production perspective. Given the ordinal nature of the problem, these issues can be addressed by overcoming the limitation of the nominal approach, in which the classes are not arranged on an ordered scale, and exploiting the gradual ranking of the dataset classes with methodologies specific to ordinal classification.
1.2 Ordinal classification
Recently, ordinal classification (also called ordinal regression) methods have proven useful in different research areas, including medical research [18–20], computer vision [21–23], finance [24], and environmental management [25]. An extensive review of ordinal classification approaches is provided in [26]. However, the use of these methodologies for solving an AQC task has not yet been explored in the ML literature. It is worth noting that ordinal classification approaches differ from multipartite ranking problems, where a learning-to-rank strategy is applied to automatically construct a ranking model from training data [27, 28]. The multipartite ranking problem represents the state of the art in many information retrieval applications [29]. Although ordinal classification can potentially be scaled to solve a multipartite ranking problem, ordinal classifiers are pointwise approaches for classifying data in which a natural order is encoded in the label.
Ordinal classification problems can easily be reduced to other standard problems, either by rounding the prediction of a regression model or by applying a cost-sensitive penalty. These are considered standard approaches for solving the ordinal classification task; their main limitation is that they assume a distance between class labels, which can influence the performance of the classifier. A method based on a cost-sensitive ordinal hyperplanes ranking algorithm has been used for human age estimation from face images [22]. The authors designed the cost of each individual binary classifier so that the misranking cost is bounded by the total misclassification cost.
Another class of ordinal approaches is the ordinal decomposition strategy. Within this category, multiple-model approaches use several binary classification branches to compute a series of cumulative probabilities. Although this strategy introduces a large number of hyperparameters to tune, some works [20] try to reduce this problem by redesigning the output layer of a conventional deep neural network. Moreover, ordinal decomposition approaches often neglect the relationships among the different binary classifiers. To alleviate this issue, it was proposed to learn an ordinal distribution of the problem and optimize the binary classifiers simultaneously [30]. Similarly, a multiple ordinal regression algorithm was proposed to estimate human preferences [31]: the authors maximized the sum of the margins between every pair of consecutive classes with respect to one or more rankings (e.g., perceived length and weight). An ordinal decomposition approach combined with a fully 3D convolutional neural network (CNN) was used to assess the level of neurological damage in Parkinson's disease (PD) patients and to explore the potential classification improvement from using ordinal label information [18]. A standard sigmoid function is used in each output node, rather than a softmax over the output nodes, and a single convolutional model is trained to solve the individual binary classification tasks simultaneously, treated as multiple fully connected blocks.
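A minimal sketch of the cumulative binary decomposition (a hypothetical encoding/decoding, not the exact scheme of any cited work): class k out of K is encoded as the K-1 binary targets [k>0, k>1, ..., k>K-2], each handled by a sigmoid output, and the predicted class is recovered by counting the positive binary decisions.

```python
import numpy as np

K = 4  # assumed number of ordered classes

def cumulative_targets(y, num_classes=K):
    # class k -> K-1 binary targets [k > 0, k > 1, ..., k > K-2]
    return (y > np.arange(num_classes - 1)).astype(int)

def decode(sigmoid_outputs, threshold=0.5):
    # predicted class = number of binary subtasks answering "yes"
    return int(np.sum(np.asarray(sigmoid_outputs) > threshold))
```

Note that the decoding implicitly assumes consistency among the binary outputs (e.g., [0.9, 0.1, 0.8] is contradictory), which is precisely the inter-classifier relationship that [30] proposes to model explicitly.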
The most natural strategy for handling the ordinal structure extends the standard regression task by assuming that a latent variable underlies the ordinal classes. In this general approach, called the threshold model, both the latent variable and the thresholds, which act respectively as a mapping function and as ordinal constraints, must be learned from the data. A threshold-based loss function was designed to model the ordinal values among multiple output variables [32]; the authors applied the kernel trick to provide a nonlinear extension of the model. Another work presented a structural distance metric for video-based face recognition [23]. Here the ordinal problem is formulated as a non-convex integer program that first learns stable ordinal filters by projecting video data into a large-margin ordinal space and then self-corrects the projected data with a structured low-rank strategy. A large-margin ordinal regression formulation was also provided as a feature selection strategy for detecting minimum and maximum feature relevance bounds by inducing sparsity in the model [33]. The authors in [34] introduced the lp-norm for deriving the ordinal thresholds from class centers, with the aim of alleviating the influence of outliers (i.e., non-i.i.d. noise); they provided an optimization algorithm and a corresponding convergence analysis for computing the lp-centroid. In [35], two neural network threshold ensemble models were proposed for ordinal regression, generating different formulations of the learned thresholds through different projections for the parameter updates. Another approach imposes the ordinal constraints on the weights that connect the hidden layer to the output layer [36]; this formulation determines the optimal weights analytically from the closed-form solution of the inequality-constrained least-squares problem derived from the Karush–Kuhn–Tucker conditions. In [37], a deep convolutional neural network model for ordinal regression is proposed that considers a family of probabilistic ordinal link functions in the output layer. These link functions fall within cumulative link models (CLMs): they split the ordinal space into the different classes of the problem using a set of ordered thresholds, which are learned during training by minimizing a loss function based on the weighted Kappa index that takes into account the distance between categories.
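The generic threshold model can be sketched as follows (a hypothetical minimal illustration, not any cited paper's exact formulation): a learned one-dimensional projection f(x) is compared against ordered thresholds, and the predicted class is the number of thresholds exceeded. The CLM variant of [37] additionally maps the thresholded projection to cumulative probabilities through a link function.

```python
import numpy as np

def threshold_predict(f_x, thresholds):
    # thresholds act as the ordinal constraints and must be ordered:
    # b_0 < b_1 < ... < b_{K-2}
    b = np.asarray(thresholds, dtype=float)
    assert np.all(np.diff(b) > 0), "thresholds must be strictly increasing"
    # predicted class = number of thresholds the latent projection exceeds
    return int(np.sum(f_x > b))
```

With thresholds [-1.0, 0.5, 2.0], for example, the real line is split into four ordered regions, one per class.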
Other ordinal approaches include ensemble decision tree and random forest models [19, 38] based on a weighted entropy function that selects tree predictors reflecting the magnitude of potential classification errors. A different approach, based on a conditional ordinal random field model, was proposed for context-sensitive modeling of facial action unit intensity, answering the context question in terms of temporal correlation between the ordinal outputs [21].
1.3 Limitations of the state of the art
As with the regression model itself, the main problem of standard ordinal classification approaches based on regression is the lack of a direct relationship between the prediction error of the regression model and the misclassification error. A different problem arises for the cost-sensitive penalty approach, which requires a priori knowledge of the task in order to properly define the cost matrix. Ordinal binary decomposition approaches, in turn, are highly influenced by how the overall problem is decomposed and how the results of all decompositions are aggregated into a single final classification. Some recent works tried to overcome these problems by learning a single model that solves the individual binary classification tasks simultaneously. However, these methodologies only model a static relationship among the ordinal classes, determined by how the problem is decomposed into binary subtasks. The threshold-based models proposed in the literature often require multiple hyperparameters for setting the ordinal probability thresholds. Indeed, most state-of-the-art threshold-based approaches require highly demanding optimization procedures that do not always guarantee optimal convergence or robustness against outliers.
The work most closely related to our proposal is [37], which introduced CLMs and the quadratic weighted kappa for solving an ordinal problem. The main differences from our work lie in (i) the loss function we adopt, (ii) the additional hyperparameters (i.e., the slope) we learn during training, (iii) the different, unexplored task we aim to solve (the AQC task) and (iv) the multiple objectives we pursue, namely both an increase in generalization performance and the mitigation of unwanted bias related to geometry. Indeed, we solve the ordinal problem by modeling the cumulative distribution of the AQC classes through the hyperparameters learned in the CLM. Moreover, we exploit the standard cross-entropy loss for solving the ordinal AQC problem. As we shall see in the results section, our deep ordinal model performs favorably against the CLMs for deep ordinal classification in [37].
1.4 Main contributions
To summarize, the main contributions of this paper are:
- the introduction of a deep learning methodology for ordinal classification specifically tailored to a topical and unexplored Industry 4.0 challenge, i.e., aesthetic quality control classification. We introduce a novel dataset for the evaluation of wooden stocks; the task at the basis of the overall project originated from a specific company's demand;
- the introduction of a deep learning methodology for ordinal classification based on a cumulative link model and categorical cross-entropy. We demonstrate a certain redundancy between maximizing an ordinal loss and modeling the cumulative distribution. The proposed approach overcomes this limitation by combining categorical cross-entropy with the cumulative link model and imposing the ordinal constraint via the threshold and slope parameters. The slope is effective for modeling the transition between adjacent cumulative link functions;
- the demonstration of how the proposed methodology is able, on the one hand, to reduce misclassification errors among distant classes (a relevant aspect for the real use case) and, on the other, to reduce the bias factor related to geometry. This is demonstrated through an insightful explanation of the proposed DL model's behavior on the most discriminative shotgun parts. The ordinal constraints allow the network to learn the characteristics that properly describe the quality of the shotgun (i.e., wood grains), rather than confounding/bias characteristics (e.g., geometry).
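The second contribution can be sketched in a few lines of numpy (an illustrative toy, with assumed threshold, slope and input values; the actual layer and training procedure are detailed in Sect. 3): the CLM output is parameterized by ordered thresholds and a learned slope, and training minimizes the categorical cross-entropy of the resulting class probabilities.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def clm_probs(f_x, thresholds, slope):
    # slope sharpens or smooths the transition between adjacent
    # cumulative link functions; ordered thresholds impose the class order
    cum = sigmoid(slope * (np.asarray(thresholds, dtype=float) - f_x))
    cum = np.append(cum, 1.0)  # P(y <= K-1) = 1 by construction
    # class probabilities P(y = k) = P(y <= k) - P(y <= k-1)
    return np.diff(np.concatenate([[0.0], cum]))

def categorical_cross_entropy(probs, true_class):
    return -np.log(probs[true_class] + 1e-12)

# toy example with 4 classes: f(x)=0.3 falls between the first two thresholds
p = clm_probs(0.3, thresholds=[-1.0, 0.5, 2.0], slope=2.0)
loss = categorical_cross_entropy(p, true_class=1)
```

Because probabilities are obtained by differencing a monotone cumulative curve, probability mass placed on a wrong class necessarily decays with its distance from the correct one, which is how the ordinal constraint penalizes distant misclassifications.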
The rest of this paper is organized as follows: Section 2 introduces the novel dataset for solving the quality control task, i.e., the evaluation of wooden stocks; Sect. 3 describes the proposed deep ordinal method; Sect. 4 reports the evaluation procedure with respect to the state-of-the-art models; Sect. 5 presents the results; Sect. 6 reports the integration of the proposed approach in a decision support system; and finally, Sect. 7 discusses the conclusions, limitations and future work of the proposed approach.