2021 | Book

Pattern Recognition. ICPR International Workshops and Challenges

Virtual Event, January 10–15, 2021, Proceedings, Part IV

Editors: Prof. Alberto Del Bimbo, Prof. Rita Cucchiara, Prof. Stan Sclaroff, Dr. Giovanni Maria Farinella, Tao Mei, Prof. Dr. Marco Bertini, Hugo Jair Escalante, Dr. Roberto Vezzani

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science

About this book

This 8-volume set constitutes the refereed proceedings of the 25th International Conference on Pattern Recognition Workshops, ICPR 2020, held virtually in Milan, Italy and rescheduled to January 10 - 11, 2021 due to the Covid-19 pandemic. The 416 full papers presented in these 8 volumes were carefully reviewed and selected from about 700 submissions. The 46 workshops cover a wide range of areas including machine learning, pattern analysis, healthcare, human behavior, environment, surveillance, forensics and biometrics, robotics and egovision, cultural heritage and document analysis, retrieval, and women at ICPR2020.

Table of Contents

Frontmatter

FGVRID - Fine-Grained Visual Recognition and re-Identification

Frontmatter
Densely Annotated Photorealistic Virtual Dataset Generation for Abnormal Event Detection

Many timely computer vision problems, such as crowd event detection, individual or crowd activity recognition, person detection and re-identification, tracking, pose estimation, and segmentation, require pixel-level annotations. This involves significant manual effort, and is likely to face challenges related to the privacy of individuals, due to the intrinsic nature of these problems, requiring in-depth identifying information. To cover the gap in the field and address these issues, we introduce and make publicly available a photorealistic, synthetically generated dataset, with detailed dense annotations. We also publish the tool we developed to generate it, which will allow users to not only use our dataset, but expand upon it by building their own densely annotated videos for many other computer vision problems. We demonstrate the usefulness of the dataset with experiments on unsupervised crowd anomaly detection in various scenarios, environments, lighting, and weather conditions. Our dataset and the annotations provided with it allow its use in numerous other computer vision problems, such as pose estimation, person detection, segmentation, re-identification and tracking, individual and crowd activity recognition, and abnormal event detection. We present the dataset as is, along with the source code and tool to generate it, so any modification can be made and new data can be created. To our knowledge, there is currently no other photorealistic, densely annotated, synthetically generated dataset for abnormal crowd event detection, nor one that allows for flexibility of use by allowing the creation of new data with annotations for many other computer vision problems. Dataset and source code available: https://github.com/RicoMontulet/GTA5Event .

Rico Montulet, Alexia Briassouli
Unsupervised Domain Adaptive Re-Identification with Feature Adversarial Learning and Self-similarity Clustering

In this paper, we propose a novel unsupervised domain adaptation re-ID framework by fusing feature adversarial learning and self-similarity clustering. Different from most of the existing works which only regard the source domain data as network pretraining data, we use the source domain data in both the network pretraining and fine-tuning stages. Concretely, we construct a feature adversarial learning module to learn domain invariant feature representations. The feature extractor network is optimized in an adversarial training manner through minimizing the discrepancy of feature representations between source and target domains. To further enhance the discriminability of the feature extractor network, we design the self-similarity clustering module to mine the implicit similarity relationships among the unlabeled samples of the target domain. By unsupervised clustering, we can generate pseudo-identity labels for the target domain data, which are then combined with the labeled source data to train the feature extractor network. Additionally, we present a relabeling algorithm to construct correspondence between two groups of pseudo-identity labels generated by two iterative clusterings. Experimental results validate the effectiveness of our method.

Tianyi Yan, Haiyun Guo, Songyan Liu, Chaoyang Zhao, Ming Tang, Jinqiao Wang
A Framework for Jointly Training GAN with Person Re-Identification Model

To cope with the problem caused by inadequate training data, many person re-identification (re-id) methods exploited generative adversarial networks (GAN) for data augmentation, where the training of GAN is typically independent of that of the re-id model. The coupling relation between them which probably brings in a performance gain of re-id is thus ignored. In this work, we propose a general framework to jointly train GAN and the re-id model. It can simultaneously achieve the optima of both the generator and the re-id model, where the training is guided by each other through a discriminator. The re-id model is boosted for two reasons: 1) The adversarial training that encourages it to fool the discriminator; 2) The generated samples that augment the training data. Extensive results on benchmark datasets show that for the re-id model trained with the identification loss as well as the triplet loss, the proposed joint training framework outperforms existing methods with separated training and achieves state-of-the-art re-id performance.

Zhongwei Zhao, Ran Song, Qian Zhang, Peng Duan, Youmei Zhang
Interpretable Attention Guided Network for Fine-Grained Visual Classification

Fine-grained visual classification (FGVC) is challenging but more critical than traditional classification tasks. It requires distinguishing different subcategories with the inherently subtle intra-class object variations. Previous works focus on enhancing the feature representation ability using multiple granularities and discriminative regions based on the attention strategy or bounding boxes. However, these methods highly rely on deep neural networks which lack interpretability. We propose an Interpretable Attention Guided Network (IAGN) for fine-grained visual classification. The contributions of our method include: i) an attention guided framework which can guide the network to extract discriminative regions in an interpretable way; ii) a progressive training mechanism obtained to distill knowledge stage by stage to fuse features of various granularities; iii) the first interpretable FGVC method with a competitive performance on several standard FGVC benchmark datasets.

Zhenhuan Huang, Xiaoyue Duan, Bo Zhao, Jinhu Lü, Baochang Zhang
Use of Frequency Domain for Complexity Reduction of Convolutional Neural Networks

The implementation of convolutional neural networks (CNNs) is not easy because of the high number of parameters that these networks have. Researchers have applied numerous approaches to reduce the complexity of convolutional networks. Quantization of the weights and pruning are two complexity reduction methods. A new paradigm for accelerating CNN operations and simplifying the network is to perform all the computations in the Fourier domain. Using a fast Fourier transform (FFT) can simplify the operations by converting the convolution operation into multiplication. Different approaches can be taken for the simplification of computations in FFT. Our approach in this paper is to let the CNN operate in the FFT domain by splitting the input. There are problems in the computation of FFT using small kernels. Splitting is an effective solution for small kernels. The splitting reduces the redundancy that is caused by the overlap-and-add, and hence, the network’s efficiency is increased. Hardware implementation of the proposed FFT method and complexity analysis of the hardware demonstrate the proper performance of the proposed approach.

Kamran Chitsaz, Mohsen Hajabdollahi, Pejman Khadivi, Shadrokh Samavi, Nader Karimi, Shahram Shirani
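
The convolution theorem that the abstract above relies on can be illustrated with a small sketch. This is not the authors' hardware method or their specific splitting scheme, which the abstract does not detail; it only shows, for a 1-D signal, how an input can be split into blocks, convolved with a small kernel through pointwise multiplication of spectra, and recombined with the standard overlap-add rule. The function names (`dft`, `fft_convolve`, `overlap_add`, `direct_convolve`) are hypothetical, and a naive DFT stands in for a real FFT to keep the example dependency-free.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for a real FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def fft_convolve(block, kernel):
    """Linear convolution of one block: multiply zero-padded spectra."""
    n = len(block) + len(kernel) - 1
    a = dft(list(block) + [0.0] * (n - len(block)))
    b = dft(list(kernel) + [0.0] * (n - len(kernel)))
    product = [p * q for p, q in zip(a, b)]
    return [v.real for v in dft(product, inverse=True)]


def overlap_add(signal, kernel, block_size):
    """Split the input into blocks, convolve each, and add the overlapping tails."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for start in range(0, len(signal), block_size):
        part = fft_convolve(signal[start:start + block_size], kernel)
        for i, v in enumerate(part):
            out[start + i] += v
    return out


def direct_convolve(signal, kernel):
    """Reference time-domain convolution used only for comparison."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out
```

Calling `overlap_add([1, 2, 3, 4, 5, 6, 7], [1, 0, -1], block_size=3)` should agree with `direct_convolve` on the same inputs up to floating-point error. Each block's output is longer than the block itself by `len(kernel) - 1` samples, and those tails are what the overlap-add step has to merge.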
From Coarse to Fine: Hierarchical Structure-Aware Video Summarization

Hierarchical structure is a common characteristic of some kinds of videos (e.g., sports videos, game videos): the videos are composed of several actions hierarchically and there exist temporal dependencies among segments of different scales, where action labels can be enumerated. Our ideas are based on two intuitions: First, the actions are the fundamental units for people to understand these videos. Second, the process of summarization is naturally one of observation and refinement, i.e., observing segments in video and hierarchically refining the boundaries of an important action according to video hierarchical structure. Based on the above insights, we generate action proposals to exploit the structure and formulate the summarization process as a hierarchical refining process. We also train a hierarchical summarization network with deep Q-learning (HQSN) to achieve the refining process and explore temporal dependency. Besides, we collect a new dataset that consists of structured game videos with fine-grained actions and importance annotations. The experimental results demonstrate the effectiveness of our framework.

Wenxu Li, Gang Pan, Chen Wang, Zhen Xing, Xiaozhou Zhou, Xiaoxuan Dong, Jiawan Zhang
ADNet: Temporal Anomaly Detection in Surveillance Videos

Anomaly detection in surveillance videos is an important research problem in computer vision. In this paper, we propose ADNet, an anomaly detection network, which utilizes temporal convolutions to localize anomalies in videos. The model works online by accepting consecutive windows of video clips. Features extracted from video clips in a window are fed to ADNet, which allows it to localize anomalies in videos effectively. We propose the AD Loss function to improve abnormal segment detection performance of ADNet. Additionally, we propose to use the F1@k metric for temporal anomaly detection. F1@k is a better evaluation metric than AUC in terms of not penalizing minor shifts in temporal segments and punishing short false positive temporal segment predictions. Furthermore, we extend the UCF Crime [29] dataset by adding two more anomaly classes and providing temporal anomaly annotations for all classes. Finally, we thoroughly evaluate our model on the extended UCF Crime dataset. ADNet produces promising results with respect to the F1@k metric. Code and dataset extensions are publicly available at https://github.com/hibrahimozturk/temporal_anomaly_detection .

Halil İbrahim Öztürk, Ahmet Burak Can
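
The abstract above does not spell out how F1@k is computed. The sketch below assumes one common segment-level formulation from the temporal segmentation literature: a predicted segment counts as a true positive when its temporal IoU with a not-yet-matched ground-truth segment is at least k. The function names and the `(start, end)` tuple representation are hypothetical, and the authors' exact definition may differ.

```python
def interval_iou(a, b):
    """Temporal intersection-over-union of two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def f1_at_k(predicted, ground_truth, k):
    """Segment-level F1 where a match requires IoU >= k (assumed definition)."""
    matched = set()
    tp = 0
    for pred in predicted:
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue  # each ground-truth segment can be matched once
            iou = interval_iou(pred, gt)
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx is not None and best_iou >= k:
            matched.add(best_idx)
            tp += 1
    fp = len(predicted) - tp      # unmatched predictions
    fn = len(ground_truth) - tp   # missed ground-truth segments
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With ground truth `[(10, 20), (40, 50)]` and predictions `[(11, 21), (30, 31), (41, 49)]`, the two slightly shifted predictions still match at k = 0.5, while the short spurious segment `(30, 31)` counts as a full false positive, giving precision 2/3, recall 1 and F1 = 0.8. This mirrors the two properties the abstract attributes to the metric.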
Soft Pseudo-labeling Semi-Supervised Learning Applied to Fine-Grained Visual Classification

Pseudo-labeling is a simple and well known strategy in Semi-Supervised Learning with neural networks. The method is equivalent to entropy minimization, as the overlap of class probability distributions can be reduced by minimizing the entropy for unlabeled data. In this paper we review the relationship between the two methods and evaluate their performance on Fine-Grained Visual Classification datasets. We also include the recently released iNaturalist-Aves that is specifically designed for Semi-Supervised Learning. Experimental results show that although in some cases supervised learning may still have better performance than the semi-supervised methods, Semi-Supervised Learning shows effective results. Specifically, we observed that entropy minimization slightly outperforms a recently proposed method based on pseudo-labeling.

Daniele Mugnai, Federico Pernici, Francesco Turchini, Alberto Del Bimbo
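
The link between pseudo-labeling and entropy minimization described in the abstract above can be made concrete with a small numeric sketch. It assumes the simplest setting: a hard pseudo-label is the arg-max class of the model's own prediction on an unlabeled sample, and its loss is the cross-entropy against that label. The function names are hypothetical, and this is not the soft pseudo-labeling procedure evaluated in the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy of the predicted class distribution (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def hard_pseudo_label_loss(probs):
    """Cross-entropy against the arg-max class used as a pseudo-label."""
    return -math.log(max(probs))
```

For any prediction the entropy is at least as large as the hard pseudo-label loss, because every class probability is at most the maximum one, and both quantities approach zero as the prediction becomes confident. For example, `softmax([2.0, 0.5, -1.0])` gives a clearly positive value for both, whereas `softmax([10.0, 0.0, 0.0])` drives both close to zero.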

HCAU 2020 - The First International Workshop on Deep Learning for Human-Centric Activity Understanding

Frontmatter
Spot What Matters: Learning Context Using Graph Convolutional Networks for Weakly-Supervised Action Detection

The dominant paradigm in spatiotemporal action detection is to classify actions using spatiotemporal features learned by 2D or 3D Convolutional Networks. We argue that several actions are characterized by their context, such as relevant objects and actors present in the video. To this end, we introduce an architecture based on self-attention and Graph Convolutional Networks in order to model contextual cues, such as actor-actor and actor-object interactions, to improve human action detection in video. We are interested in achieving this in a weakly-supervised setting, i.e. using as few annotations as possible in terms of action bounding boxes. Our model aids explainability by visualizing the learned context as an attention map, even for actions and objects unseen during training. We evaluate how well our model highlights the relevant context by introducing a quantitative metric based on recall of objects retrieved by attention maps. Our model relies on a 3D convolutional RGB stream, and does not require expensive optical flow computation. We evaluate our models on the DALY dataset, which consists of human-object interaction actions. Experimental results show that our contextualized approach outperforms a baseline action detection approach by more than 2 points in Video-mAP. Code is available at https://github.com/micts/acgcn .

Michail Tsiaousis, Gertjan Burghouts, Fieke Hillerström, Peter van der Putten
Social Modeling Meets Virtual Reality: An Immersive Implication

The development of novel techniques for social modeling in the context of surveillance applications has significantly reduced manual processing of large and continuous video data. These techniques for social modeling widely cover crowd motion analysis since the impact of social modeling on crowds is significant. However, existing crowd motion analysis methods face a number of problems including limited availability of crowd data representing a specific behavior and weaknesses of proposed models to explore the underlying patterns of crowd behavior. To cope with these problems, we propose a novel method based on energy modeling and social interaction of individual particles in a crowd to detect unusual behavior. Our method describes collective dissipative interactions among particles in a crowd scene. We reveal the changing patterns of the crowd behavior states, to support the conversion between different social behaviors during evolution. To further improve the performance of our method, virtual reality can be considered to consolidate the acquisition of data associated with a particular behavior. Therefore, we provide theoretical background of immersive implication considering virtual reality that can expose individuals to virtual crowds and acquire useful data on human motion and behaviors in crowds. The experimental evaluation of our energy and social interaction driven method shows convincing results.

Habib Ullah, Sultan Daud Khan, Mohib Ullah, Faouzi Alaya Cheikh
Pickpocketing Recognition in Still Images

Human activity recognition (HAR) is a challenging topic in the computer vision field. Pickpocketing is a type of criminal human action whose detection needs extensive research and development. This paper investigates whether pickpocketing can be recognized in still images, considering both classification and detection. We develop our models from state-of-the-art pre-trained models: VGG16, ResNet50, ResNet101, and ResNet152. Moreover, we also include a convolutional block attention module (CBAM [27]) in the model. The attention mechanism enhances model performance by focusing on informative features. For classification, the highest accuracy (89%) is achieved by ResNet152 with CBAM [27] (ResNet152+CBAM). We also examine pickpocketing detection with RetinaNet [14] and YOLOv.3 [34]. The mean average precision (mAP) of pickpocketing detection is consistent with Redmon et al. [34]. RetinaNet’s precision (80 mAP) is higher than that of YOLOv.3 (78 mAP), but YOLOv.3 is much faster at detection. The ResNet152+CBAM model with the RetinaNet approach provides the highest mAP. However, its detection is much slower than that of YOLOv.3 (only 10 ms). This paper demonstrates that it is possible to recognize pickpocketing in still images in a reliable time and with outstanding accuracy. The proposed model could possibly be applied to other HAR tasks.

Prisa Damrongsiri, Hossein Malekmohamadi
t-EVA: Time-Efficient t-SNE Video Annotation

Video understanding has received more attention in the past few years due to the availability of several large-scale video datasets. However, annotating large-scale video datasets is cost-intensive. In this work, we propose a time-efficient video annotation method using spatio-temporal feature similarity and t-SNE dimensionality reduction to speed up the annotation process massively. Placing the same actions from different videos near each other in the two-dimensional space based on feature similarity helps the annotator to group-label video clips. We evaluate our method on two subsets of the ActivityNet (v1.3) and a subset of the Sports-1M dataset. We show that t-EVA ( https://github.com/spoorgholi74/t-EVA ) can outperform other video annotation tools while maintaining test accuracy on video classification.

Soroosh Poorgholi, Osman Semih Kayhan, Jan C. van Gemert
Vision-Based Fall Detection Using Body Geometry

Falling is a major health problem that causes thousands of deaths every year, according to the World Health Organization. Fall detection and fall prediction are both important tasks that should be performed efficiently to enable accurate medical assistance to vulnerable populations whenever required. This allows local authorities to predict daily health care resources and reduce fall damages accordingly. We present in this paper a fall detection approach that explores human body geometry available at different frames of the video sequence. Specifically, the angular information and the distance between the vector formed by the head (the centroid of the identified facial image) and the center hip of the body, and the vector aligned with the horizontal axis of the center hip, are used to construct distinctive image features. A two-class SVM classifier is trained on the newly constructed feature images, while a Long Short-Term Memory (LSTM) network is trained on the calculated angle and distance sequences to classify fall and non-fall activities. We perform experiments on the Le2i fall detection dataset and the UR FD dataset. The results demonstrate the effectiveness and efficiency of the developed approach.

Beddiar Djamila Romaissa, Oussalah Mourad, Nini Brahim, Bounab Yazid
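
The geometric features mentioned in the abstract above can be sketched for a single frame. This is one plausible reading rather than the authors' exact feature definition: it assumes 2-D pixel coordinates for the head centroid and the center hip, takes the angle between the hip-to-head vector and the horizontal axis through the hip, and uses the head-to-hip length as the distance. The function name and the coordinate convention are assumptions.

```python
import math


def body_angle_and_distance(head, hip):
    """Angle (degrees, 0..90) of the hip-to-head vector against the horizontal
    axis through the hip, plus the head-to-hip distance in pixels."""
    dx = head[0] - hip[0]
    dy = hip[1] - head[1]  # image y grows downward, so flip the sign
    distance = math.hypot(dx, dy)
    angle = math.degrees(math.atan2(abs(dy), abs(dx)))
    return angle, distance
```

An upright pose such as `head=(100, 50)`, `hip=(100, 150)` yields an angle of 90 degrees, while a pose with the head almost level with the hip, such as `head=(200, 148)`, `hip=(100, 150)`, yields an angle close to 0. Computing these two values per frame produces the kind of angle and distance sequences that the abstract says are passed to the LSTM.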
Comparative Analysis of CNN-Based Spatiotemporal Reasoning in Videos

Understanding actions and gestures in video streams requires temporal reasoning of the spatial content from different time instants, i.e., spatiotemporal (ST) modeling. In this survey paper, we have made a comparative analysis of different ST modeling techniques for action and gesture recognition tasks. Since Convolutional Neural Networks (CNNs) have proved to be an effective tool as a feature extractor for static images, we apply ST modeling techniques on the features of static images from different time instants extracted by CNNs. All techniques are trained end-to-end together with a CNN feature extraction part and evaluated on two publicly available benchmarks: the Jester and the Something-Something datasets. The Jester dataset contains various dynamic and static hand gestures, whereas the Something-Something dataset contains actions of human-object interactions. The common characteristic of these two benchmarks is that the designed architectures need to capture the full temporal content of videos in order to correctly classify actions/gestures. Contrary to expectations, experimental results show that Recurrent Neural Network (RNN) based ST modeling techniques yield inferior results compared to other techniques such as fully convolutional architectures. Codes and pretrained models of this work are publicly available ( https://github.com/fubel/stmodeling ).

Okan Köpüklü, Fabian Herzog, Gerhard Rigoll
Generalization of Fitness Exercise Recognition from Doppler Measurements by Domain-Adaption and Few-Shot Learning

In previous works, a mobile application was developed using an unmodified commercial smartphone to recognize whole-body exercises. The working principle was based on ultrasound Doppler sensing with the device's built-in hardware. Applying such a model, trained in a lab environment, to realistic application variations causes a significant drop in performance and thus decimates its applicability. The reasons for the reduced performance can be manifold: it could be induced by user, environment, and device variations in realistic scenarios. Such scenarios are often more complex and diverse, which can be challenging to anticipate in the initial training data. To study and overcome this issue, this paper presents a database with controlled and uncontrolled subsets of fitness exercises. We propose two concepts that utilize a small amount of adaptation data to successfully improve model generalization in an uncontrolled environment, increasing the recognition accuracy two- to six-fold compared to the baseline for different users.

Biying Fu, Naser Damer, Florian Kirchbuchner, Arjan Kuijper
Local Anomaly Detection in Videos Using Object-Centric Adversarial Learning

We propose a novel unsupervised approach based on a two-stage object-centric adversarial framework that only needs object regions for detecting frame-level local anomalies in videos. The first stage consists in learning the correspondence between the current appearance and past gradient images of objects in scenes deemed normal, allowing us to either generate the past gradient from current appearance or the reverse. The second stage extracts the partial reconstruction errors between real and generated images (appearance and past gradient) with normal object behaviour, and trains a discriminator in an adversarial fashion. In inference mode, we employ the trained image generators with the adversarially learned binary classifier for outputting region-level anomaly detection scores. We tested our method on four public benchmarks, UMN, UCSD, Avenue and ShanghaiTech and our proposed object-centric adversarial approach yields competitive or even superior results compared to state-of-the-art methods.

Pankaj Raj Roy, Guillaume-Alexandre Bilodeau, Lama Seoud
A Hierarchical Framework for Motion Trajectory Forecasting Based on Modality Sampling

In this paper, we present a hierarchical framework for multi-modal trajectory forecasting, which can provide for each pedestrian in the scene the distributions for the next moves at every time step. The overall architecture adopts a standard encoder-decoder paradigm, where the encoder is based on a self-attention mechanism to extract the temporal features of motion histories, while the decoder is built upon a stack of LSTMs to generate the future path sequentially. The model is learned in a discriminative manner, with the purpose of differentiating among varied motion modalities. To this end, we propose a clustering strategy to construct the so-called transformation set. The transformation set collaborates with the hierarchical LSTMs in the decoder, in order to approximate the real distributions in the training data. Experimental results demonstrate that the proposed framework can not only predict the future trajectory accurately, but also provide multi-modal trajectory distributions explicitly.

Yifan Ma, Bo Zhang, Nicola Conci, Hongbo Liu
Skeleton-Based Methods for Speaker Action Classification on Lecture Videos

The volume of online lecture videos is growing at a frenetic pace. This has led to an increased focus on methods for automated lecture video analysis to make these resources more accessible. These methods consider multiple information channels including the actions of the lecture speaker. In this work, we analyze two methods that use spatio-temporal features of the speaker skeleton for action classification in lecture videos. The first method is the AM Pose model which is based on Random Forests with motion-based features. The second is a state-of-the-art action classifier based on a two-stream adaptive graph convolutional network (2S-AGCN) that uses features of both joints and bones of the speaker skeleton. Each video is divided into fixed-length temporal segments. Then, the speaker skeleton is estimated on every frame in order to build a representation for each segment for further classification. Our experiments used the AccessMath dataset and a novel extension which will be publicly released. We compared four state-of-the-art pose estimators: OpenPose, Deep High Resolution, AlphaPose and Detectron2. We found that AlphaPose is the most robust to the encoding noise found in online videos. We also observed that 2S-AGCN outperforms the AM Pose model by using the right domain adaptations.

Fei Xu, Kenny Davila, Srirangaraj Setlur, Venu Govindaraju

IADS - Integrated Artificial Intelligence In Data Science

Frontmatter
Fake Review Classification Using Supervised Machine Learning

The revolution of social media has propelled the online community to take advantage of online reviews, not only for posting feedback about products, services, and other issues, but also for helping individuals analyze users' feedback when making purchase decisions and helping companies improve the quality of manufactured goods. However, the propagation of fake reviews has become an alarming issue, as it deceives online users while purchasing and promotes or demotes the reputation of competing brands. In this work, we propose a supervised learning-based technique for the detection of fake reviews from online textual content. The study employs machine learning classifiers for separating fake and genuine reviews. Experimental results are evaluated against different evaluation measures and the performance of the proposed system is compared with baseline works.

Hanif Khan, Muhammad Usama Asghar, Muhammad Zubair Asghar, Gautam Srivastava, Praveen Kumar Reddy Maddikunta, Thippa Reddy Gadekallu
Defect Detection of Stainless Steel Plates Using Deep Learning Technology

In the era of industry 4.0, factories around the world are developing towards automation and artificial intelligence, in which industrial detection plays an important role. After the cutting process, the surface of a stainless steel plate may produce various defects, such as scratches, chisels, and stains. Due to the characteristic of bright reflections on the surface of a stainless steel plate, the traditional manual comparison detection method is time-consuming, laborious, and prone to different detection results due to the interference of high reflection, resulting in the outflow of defective products. This paper used existing mature deep learning models for object detection, YOLOv3 (You Only Look Once) and SSD (Single Shot MultiBox Detector), which are the base network architectures for the defect detection of stainless steel plates, in order to effectively improve the accuracy of stainless steel plate detection. Through image preprocessing, the relative positions of sample defects are marked to improve data processing before training, in order that a large number of image samples can be quickly and effectively processed for training.

Yu-Jen Huang, Ko-Wei Huang, Shih-Hsiung Lee
Deep Neural Networks for Detecting Real Emotions Using Biofeedback and Voice

During an interview, people’s emotions change in different ways in response to the interview questions. Therefore, it is very helpful to detect people’s emotions in real-time. To do so, comprehensive data collection was performed through the voice recording platform and the Empatica E4 wristband (biofeedback). Also, through using both existing feed-forward deep neural network technology and machine learning, we implemented an artificial deep neural network that aims to detect real emotions using multiple sensors: voice and biometrics. The artificial deep neural network we implemented consistently achieved an accuracy of 85% on our testing set and 79% on validation sets to determine the emotional scale. The research also assists with understanding how to detect emotional ranges and the important role that it plays in interviews and conversations.

Mohammed Aledhari, Rehma Razzak, Reza M. Parizi, Gautam Srivastava
Data Augmentation for a Deep Learning Framework for Ventricular Septal Defect Ultrasound Image Classification

Congenital heart diseases (CHD) can be detected through ultrasound imaging. Although ultrasound can be used for immediate diagnosis, doctors require considerable time to read dynamic clips; typically, physicians must continuously examine disease data from beating heart images. Most importantly, this type of diagnosis relies heavily on the expertise and experience of the diagnosing physician. This study established an ultrasound image classification with deep learning algorithms to overcome the challenges involved in CHD diagnosis. We detected the most common CHD, namely the first, second, and fourth types of ventricular septal defect (VSD). We improved the performance levels of well-known deep learning algorithms (InceptionV3, ResNet, and DenseNet). Because algorithm optimization and overfitting problems can influence the performance of deep learning algorithms, we studied some optimizer algorithms and early-stopping strategies. To enhance the solution quality, we used data augmentation methods for solving this classification problem. The selected approach was further compared with Google AutoML, which applies structure search for quality prediction. Our results revealed that the proposed deep learning algorithm was able to recognize most types of VSD. However, one type of VSD remains unconquered and warrants more advanced techniques.

Shih-Hsin Chen, I-Hsin Tai, Yi-Hui Chen, Ken-Pen Weng, Kai-Sheng Hsieh
A Neural Network Model for Lead Optimization of MMP12 Inhibitors

Lead Optimization is a complex process, whereby a large number of interacting entities give rise to molecular structures whose properties should be optimized in order to be considered for drug development. We will study molecular systems that are characterized by high dimensionality and dynamically interacting networks with the goal of discovering the optimal molecules with respect to the set of essential properties. Currently, the research involves the screening and the identification of molecules with desirable properties from large molecule libraries. Lead Optimization is a multi-objective optimization problem. The classical approaches involving in-vitro laboratory analysis are time consuming and very expensive. To address this problem, we propose in this paper an in-silico approach: Lead Optimization based on a Neural Network (NN) model in order to help the chemist in the lab experimentation by requiring a small set of real laboratory tests. We propose and estimate a predictive network model to derive a simultaneous optimal multi-response property following a single and multi-objective optimization procedure. We adopt different architectures in this study and we compare our procedure with other state-of-the-art methods, showing the better performance of our approach.

Tewodros M. Dagnew, Claudio Silvestri, Debora Slanzi, Irene Poli
An Empirical Analysis of Integrating Feature Extraction to Automated Machine Learning Pipeline

Machine learning techniques and algorithms are employed in many application domains such as financial applications, recommendation systems, medical diagnosis systems, and self-driving cars. They play a crucial role in harnessing the power of Big Data being produced every day in our digital world. In general, building a well-performing machine learning pipeline is an iterative and complex process that requires a solid understanding of various techniques that can be used in each component of the machine learning pipeline. Feature engineering (FE) is one of the most time-consuming steps in building machine learning pipelines. It requires a deep understanding of the domain and data exploration to discover relevant hand-crafted features from raw data. In this work, we empirically evaluate the impact of integrating an automated feature extraction tool (AutoFeat) into two automated machine learning frameworks, namely, Auto-Sklearn and TPOT, on their predictive performance. Besides, we discuss the limitations of AutoFeat that need to be addressed in order to improve the predictive performance of the automated machine learning frameworks on real-world datasets.

Hassan Eldeeb, Shota Amashukeli, Radwa El Shawi
Input-Aware Neural Knowledge Tracing Machine

Knowledge Tracing (KT) is the task of tracing the evolving knowledge state of each student as (s)he engages with a sequence of learning activities, and it can be used to provide personalized instruction. However, existing methods such as Bayesian Knowledge Tracing (BKT) and Deep Knowledge Tracing (DKT) either cannot capture the relationship among different concepts or lack interpretability. Although Knowledge Tracing Machines (KTM) make up for these shortcomings, they only use a linear function to model students' knowledge states, which cannot capture more information contained in each feature. To solve the above problems, this work introduces a novel model called Input-aware Neural Knowledge Tracing Machine (INKTM), which can enhance interpretability to some extent and capture more complex structure information of real-world data to improve prediction performance. Unlike standard FM-based methods that focus on the feature interactions, our model focuses more on the information contained in each feature itself and retains all 2-order feature interactions. By converting the weight of each feature to a multidimensional vector, our model can use the vectors to learn a unique attention weight for each feature in different instances through an attention network, so as to highlight important features and thereby enhance interpretability. Finally, we input the re-weighted features to a deep neural network to capture the non-linear and complex inherent structure of the data. Experimental results show that our model consistently outperforms existing models on a range of KT datasets.

Moyu Zhang, Xinning Zhu, Yang Ji
Towards Corner Case Detection by Modeling the Uncertainty of Instance Segmentation Networks

State-of-the-art instance segmentation techniques currently provide a bounding box, class, mask, and scores for each instance. What they do not provide is an epistemic uncertainty estimate of these predictions. With our approach, we want to identify corner cases by considering the epistemic uncertainty. Corner cases are data/situations that are underrepresented or not covered in our data set. Our work is based on Mask R-CNN. We estimate the epistemic uncertainty by extending the architecture with Monte-Carlo dropout layers. By repeatedly executing the forward pass, we create a large number of predictions per instance. Afterward, we cluster the predictions of an instance based on the bounding box coordinates. It becomes possible to determine the epistemic position uncertainty for the bounding boxes and the classifier’s epistemic class uncertainty. For the epistemic uncertainty regarding the bounding box position and the class assignment, we provide a criterion for detecting corner cases utilizing the model’s epistemic uncertainty.

Florian Heidecker, Abdul Hannan, Maarten Bieshaar, Bernhard Sick
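
The abstract above does not state how the uncertainties are computed, so the following is only a minimal sketch under stated assumptions. It takes the class probabilities and bounding boxes from several stochastic forward passes for one clustered instance, and summarizes them with two simple proxies: the entropy of the averaged class distribution and the per-coordinate standard deviation of the boxes. The function names and the choice of these two measures are assumptions made for illustration, not the authors' criterion.

```python
import math

def mean_distribution(samples):
    """Average the class probabilities over all stochastic forward passes."""
    n = len(samples)
    return [sum(s[k] for s in samples) / n for k in range(len(samples[0]))]

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

def box_std(boxes):
    """Per-coordinate standard deviation of (x1, y1, x2, y2) boxes."""
    n = len(boxes)
    means = [sum(b[i] for b in boxes) / n for i in range(4)]
    return [math.sqrt(sum((b[i] - means[i]) ** 2 for b in boxes) / n)
            for i in range(4)]

# Hypothetical outputs of two dropout passes for the same instance.
agreeing = [[0.9, 0.1], [0.9, 0.1]]      # passes agree on the class
disagreeing = [[0.9, 0.1], [0.1, 0.9]]   # passes contradict each other
boxes = [[0.0, 0.0, 10.0, 10.0], [2.0, 0.0, 10.0, 10.0]]

class_unc_low = entropy(mean_distribution(agreeing))
class_unc_high = entropy(mean_distribution(disagreeing))
position_unc = box_std(boxes)
```

An instance whose passes disagree gets a higher class score than one whose passes agree, and a box whose coordinates move between passes gets a non-zero spread; thresholding either value would be one way to flag corner-case candidates. Note that the entropy of the averaged distribution measures total predictive uncertainty, so it is used here only as a simple proxy for the epistemic part.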
Intelligent and Interactive Video Annotation for Instance Segmentation Using Siamese Neural Networks

Training machine learning models in a supervised manner requires vast amounts of labeled data. These labels are typically provided by humans manually annotating samples using a variety of tools. In this work, we propose an intelligent annotation tool to combine the fast and efficient labeling capabilities of modern machine learning models with the reliable and accurate, but slow, correction capabilities of human annotators. We present our approach to interactively condition a model on previously predicted and manually annotated or corrected instances and explore an iterative workflow combining the advantages of the intelligent model and the human annotator for the task of instance segmentation in videos. Thereby, the intelligent model conducts the bulk of the work, performing instance detection, tracking, and segmentation, and enables the human annotator to correct individual frames and instances selectively. The proposed approach avoids the computational cost of online retraining by being based on the one-shot learning paradigm. For this purpose, we use Siamese neural networks to transfer annotations from one video frame to another. Multiple interaction options regarding the choice of the additional input data to the neural network, e.g., model predictions or manual corrections, are explored to refine the given model’s labeling performance and speed up the annotation process.

Jan Schneegans, Maarten Bieshaar, Florian Heidecker, Bernhard Sick
Imputation of Rainfall Data Using Improved Neural Network Algorithm

Missing rainfall data have reduced the quality of hydrological data analysis because they are the essential input for hydrological modeling. Much research has focused on rainfall data imputation. However, the compatibility of precipitation (rainfall) and non-precipitation (meteorology) as input data has received less attention. First, we propose a novel input structure for the missing data imputation method. Principal component analysis (PCA) is used to extract the most relevant features from the meteorological data. This paper introduces the combined input of the significant principal components (PCs) and rainfall data from nearest neighbor gauging stations as the input to the estimation of the missing values. Second, the effects of the combined input for infilling the missing rainfall data series were compared using the sine cosine algorithm neural network (SCANN) and feedforward neural network (FFNN). The results showed that SCANN outperformed FFNN imputation in terms of mean absolute error (MAE), root mean square error (RMSE) and correlation coefficient (R), with an average accuracy of more than 90%. This study revealed that as the percentage of missingness increased, the precision of both imputation methods decreased.

Po Chan Chiu, Ali Selamat, Ondrej Krejcar, King Kuok Kuok
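
The three evaluation measures named in the abstract above (MAE, RMSE and the correlation coefficient R) can be sketched as follows. This is a minimal illustration using the standard textbook definitions, with R taken to be the Pearson correlation; the sample values are made up and do not come from the paper.

```python
import math

def mae(observed, estimated):
    """Mean absolute error between observed and imputed values."""
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / len(observed)

def rmse(observed, estimated):
    """Root mean square error between observed and imputed values."""
    return math.sqrt(
        sum((o - e) ** 2 for o, e in zip(observed, estimated)) / len(observed)
    )

def pearson_r(observed, estimated):
    """Pearson correlation coefficient between the two series."""
    n = len(observed)
    mo = sum(observed) / n
    me = sum(estimated) / n
    cov = sum((o - mo) * (e - me) for o, e in zip(observed, estimated))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_e = sum((e - me) ** 2 for e in estimated)
    return cov / math.sqrt(var_o * var_e)

# Made-up rainfall values (mm): held-out observations vs. imputed estimates.
observed = [0.0, 2.0, 4.0, 6.0]
imputed = [1.0, 2.0, 3.0, 6.0]

scores = (mae(observed, imputed), rmse(observed, imputed),
          pearson_r(observed, imputed))
```

Lower MAE and RMSE and a higher R indicate a better imputation, which is the sense in which one method can be said to outperform another on these measures.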
Novelty Based Driver Identification on RR Intervals from ECG Data

We present an approach for driver identification, which is useful in many automotive applications such as safety or comfort functions. Driver identification would also be of great interest to other business models, such as car rental and car-sharing companies. The identification method is based on the driver’s physiological state or rather his/her electrocardiogram (ECG) data. For this purpose, we have recorded ECG data of 25 people driving in a simulated environment. To identify a driver, we extend our existing novelty detection by aggregating local features over time. To do so, we extracted features and trained a Gaussian Mixture Model (GMM) to exploit localities present in the recorded sensor data. With novelty detection by aggregating local features, we are smoothing the noisy signal and reducing the dimensionality for further processing in a one-class SVM classification. Based on the output, a decision function decides whether the driver is unknown or well-known and, if the driver is well-known, which of the known drivers it is.

Florian Heidecker, Christian Gruhl, Bernhard Sick
Link Prediction in Social Networks by Variational Graph Autoencoder and Similarity-Based Methods: A Brief Comparative Analysis

Link prediction is an emerging and fast-growing applied research area. In a network, it is possible to predict the next link which is going to be formed. The usefulness of link prediction modeling has been proved in several fields and applications, such as biomedicine, recommender systems, and social media. In this short paper, we discuss the potential of the Variational Graph Autoencoder by comparing its results against those of some similarity-based methods, such as Adamic-Adar, Jaccard coefficient, and Preferential Attachment.

Sanjiban Sekhar Roy, Aditya Ranjan, Stefania Tomasiello
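
The three similarity-based baselines named in the abstract above have standard definitions, which the following minimal sketch computes for a node pair on a small undirected graph. The toy edge list and the function names are assumptions made for illustration; the sketch does not reproduce the paper's experiments or the Variational Graph Autoencoder.

```python
import math

def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def jaccard(adj, u, v):
    """Shared neighbours divided by all neighbours of either node."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over the common neighbours of u and v.

    A common neighbour of two distinct nodes has degree >= 2,
    so the logarithm is never zero.
    """
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])

def preferential_attachment(adj, u, v):
    """Product of the two node degrees."""
    return len(adj[u]) * len(adj[v])

# Toy graph (hypothetical); candidate links are scored, higher = more likely.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = build_adjacency(edges)

score_ad = (jaccard(adj, "a", "d"), adamic_adar(adj, "a", "d"),
            preferential_attachment(adj, "a", "d"))
score_ae = (jaccard(adj, "a", "e"), adamic_adar(adj, "a", "e"),
            preferential_attachment(adj, "a", "e"))
```

On this toy graph, the unlinked pair (a, d) shares two neighbours and scores higher than (a, e) on all three measures, so each baseline would rank a-d as the more likely future link.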
A Hybrid Wine Classification Model for Quality Prediction

“Wine is bottled poetry,” a quote from Robert Louis Stevenson, shows that wine is an exciting and complex product with distinctive qualities that make it different from other products. Therefore, the testing approach to determine the quality of the wine is complex and diverse. The opinion of a wine expert is influential, but it is also costly and subjective. Hence, many algorithms based on machine learning techniques have been proposed for predicting wine quality. However, most of them focus on analyzing different classifiers to figure out what the best classifier for wine quality prediction is. Rather than focusing on a particular classifier, we are motivated to find a more effective one. In this paper, a hybrid model that consists of at least two classifiers, e.g. the random forest and the support vector machine, is proposed for wine quality prediction. To evaluate the performance of the proposed hybrid model, experiments were also conducted on wine datasets to show the merits of the hybrid model.

Terry Hui-Ye Chiu, Chien-Wen Wu, Chun-Hao Chen
A PSO-Based Sanitization Process with Multi-thresholds Model

Many privacy-preserving data mining (PPDM) algorithms have previously been proposed to conceal sensitive items in a database in order to prevent the disclosure of sensitive itemsets. All prior techniques, however, ignored a crucial problem in setting minimum support thresholds. Thus, a new concept of minimal support for solving this issue is proposed in this paper. In compliance with a given threshold function, the proposed approach sets a tighter threshold for an object containing several items. Experimental results are then evaluated to show the performance of the traditional Greedy PPDM approach, GA-based PPDM approaches, and the proposed PSO-based algorithm with the new flexible and minimal support function.

Jimmy Ming-Tai Wu, Gautam Srivastava, Shahab Tayeb, Jerry Chun-Wei Lin
Task-Specific Novel Object Characterization

In an open world, a robot encounters novel objects. It needs to be able to deal with such novelties: for instance, to characterize the object if it is similar to a known object, or to indicate that an object is unknown or irrelevant. We present an approach for robots to deal with an open world in which objects are encountered that are not known beforehand. Our method first decides whether an object is relevant for the task at hand, based on whether it is similar to an object that is known to be relevant. Relevancy is determined from a task-specific taxonomy of objects. If the object is relevant for the task, then it is characterized through the taxonomy. The task determines the level of detail that is needed, which relates to the levels in the taxonomy. The advantage of our method is that it only needs to model the relevant objects and not all possible irrelevant and often unknown objects that the robot may also encounter. We show the merit of our method in a real-life experiment of a search and rescue task in a messy and cluttered house, where victims (including novelties) were successfully found.

Gertjan J. Burghouts

IML - International Workshop on Industrial Machine Learning

Frontmatter
Deep Learning Based Dimple Segmentation for Quantitative Fractography

In this work, we try to address the challenging problem of dimple segmentation from Scanning Electron Microscope (SEM) images of titanium alloys using machine learning methods, particularly neural networks. This automated method would in turn help in correlating the topographical features of the fracture surface with the mechanical properties of the material. Our proposed UNet-inspired, attention-driven model not only achieves the best performance on the dice-score metric compared to previous segmentation methods when applied to our curated dataset of SEM images, but also consumes significantly less memory. To the best of our knowledge, this is one of the first works in fractography using fully convolutional neural networks with self-attention for supervised learning of deep dimple fractography, though it can be easily extended to account for brittle characteristics as well.

Ashish Sinha, K. S. Suresh
PaDiM: A Patch Distribution Modeling Framework for Anomaly Detection and Localization

We present a new framework for Patch Distribution Modeling, PaDiM, to concurrently detect and localize anomalies in images in a one-class learning setting. PaDiM makes use of a pretrained convolutional neural network (CNN) for patch embedding, and of multivariate Gaussian distributions to get a probabilistic representation of the normal class. It also exploits correlations between the different semantic levels of the CNN to better localize anomalies. PaDiM outperforms current state-of-the-art approaches for both anomaly detection and localization on the MVTec AD and STC datasets. To match real-world visual industrial inspection, we extend the evaluation protocol to assess the performance of anomaly localization algorithms on non-aligned datasets. The state-of-the-art performance and low complexity of PaDiM make it a good candidate for many industrial applications.

Thomas Defard, Aleksandr Setkov, Angelique Loesch, Romaric Audigier
Real-Time Cross-Dataset Quality Production Assessment in Industrial Laser Cutting Machines

In laser cutting processes, cutting failure is one of the most common causes of faulty production. Monitoring cutting failure events is extremely complex, as failures might be initiated by several factors, the most prominent probably being the high production speeds required by modern standards. The present work aims at creating and deploying a classifier able to assess the cutting quality of a production in real time. To this aim, multiple datasets were collected in different environmental conditions and with different sensors. Model inputs include photo-sensors and production parameters. First, different algorithms were tested and rated by prediction ability. Second, the selected algorithm was deployed on a GPU embedded system and added to the current machine configuration. The final system can receive the input data from the sensors, perform the inference, and send back the results to the computer numerical control. The data management is based on a client-server architecture. The selected algorithm and hardware showed good performance despite multiple changes in the environmental conditions (domain adaptation ability), both in terms of prediction ability (accuracy) and computational times.

Nicola Peghini, Andrea Zignoli, Davide Gandolfi, Paolo Rota, Paolo Bosetti
An Online Deep Learning Based System for Defects Detection in Glass Panels

Automated surface anomaly inspection for industrial applications is becoming more important every year. In particular, deep learning methods are remarkably suitable for the detection and segmentation of surface defects. The identification of flaws and structural weaknesses of glass surfaces is crucial to ensure the quality, and more importantly, guarantee the integrity of the panel itself. Glass inspection, in particular, has to overcome many challenges, given the nature of the material itself and the presence of defects that may occur with arbitrary size, shape, and orientation. Traditionally, glass manufacturers’ automated inspection systems are based on more conventional machine learning algorithms with handcrafted features. However, considering the unpredictable nature of the defects, manually engineered features may easily fail even in the presence of small changes in the environmental conditions. To overcome these problems, we propose an inductive transfer learning application for the detection and classification of glass defects. The experimental results show a comparison among different deep learning single-stage and two-stage detectors. Results are computed on a brand new dataset prepared in collaboration with Deltamax Automazione Srl.

Matteo Moro, Claudio Andreatta, Chiara Corridori, Paolo Rota, Niculae Sebe
Evaluation of Edge Platforms for Deep Learning in Computer Vision

In recent years, companies such as Intel and Google have brought onto the market small low-power platforms that can be used to deploy and run inference of Deep Neural Networks at a low cost. These platforms can process data at the edge, such as images from a camera, to avoid the transfer of large amounts of data across a network. To determine which platform to use for a specific task, practitioners usually compare parameters such as inference time and power consumption. However, to provide a better incentive on platform selection based on requirements, it is important to also consider the platform price. In this paper, we explore platform/model trade-offs by providing benchmarks of state-of-the-art platforms within three common computer vision tasks: classification, detection and segmentation. By also considering the price of each platform, we provide a comparison of price versus inference time, to aid quick decision making in regard to platform and model selection. Finally, by analysing the operation allocation of models for each platform, we identify operations that should be optimised, based on platform/model selection.

Christoffer Bøgelund Rasmussen, Aske Rasch Lejbølle, Kamal Nasrollahi, Thomas B. Moeslund
BlendTorch: A Real-Time, Adaptive Domain Randomization Library

Solving complex computer vision tasks by deep learning techniques relies on large amounts of (supervised) image data, typically unavailable in industrial environments. Consequently, the lack of training data is beginning to impede the successful transfer of state-of-the-art computer vision methods to industrial applications. We introduce BlendTorch, an adaptive Domain Randomization (DR) library, to help create infinite streams of synthetic training data. BlendTorch generates data by massively randomizing low-fidelity simulations and takes care of distributing artificial training data for model learning in real-time. We show that models trained with BlendTorch repeatedly perform better in an industrial object detection task than those trained on real or photo-realistic datasets.

Christoph Heindl, Lukas Brunner, Sebastian Zambal, Josef Scharinger
SAFFIRE: System for Autonomous Feature Filtering and Intelligent ROI Estimation

This work introduces a new framework, named SAFFIRE, to automatically extract a dominant recurrent image pattern from a set of image samples. Such a pattern can be used to eliminate pose variations between samples, which is a common requirement in many computer vision and machine learning tasks. The framework is specialized here in the context of a machine vision system for automated product inspection. Here, it is customary to ask the user to identify an anchor pattern, to be used by the automated system to normalize data before further processing. Yet, this is a very sensitive operation which is intrinsically subjective and requires high expertise. To this end, SAFFIRE provides a unique and disruptive framework for unsupervised identification of an optimal anchor pattern in a way which is fully transparent to the user. SAFFIRE is thoroughly validated on several realistic case studies for a machine vision inspection pipeline.

Marco Boschi, Luigi Di Stefano, Martino Alessandrini
Heterogeneous Feature Fusion Based Machine Learning on Shallow-Wide and Heterogeneous-Sparse Industrial Datasets

Although machine learning has gained great success in industry, there are still many challenges in mining industrial data, especially in manufacturing domains, because industrial data can be 1) shallow and wide, and 2) highly heterogeneous and sparse. In particular, mining sparse data (i.e. data with missing features) is extremely challenging, because it is not easy to fill in some features (e.g. images), and removing data points would reduce the data size further. Thus, in this work, we propose a machine learning framework including transfer learning, heterogeneous feature fusion, principal component analysis and gradient boosting to solve these challenges and effectively develop predictive models on industrial datasets. Compared to a non-fusion method and a traditional fusion method on two real world datasets from Toyota Motor Corporation, the results show that the proposed method can not only maximize the utility of available features and data to achieve more stable and better performance, but also give more flexibility when predicting new unseen data points with only a partial set of features available. (Code and data are available at: https://github.com/zyz293/FusionML .)

Zijiang Yang, Tetsushi Watari, Daisuke Ichigozaki, Akita Mitsutoshi, Hiroaki Takahashi, Yoshinori Suga, Wei-keng Liao, Alok Choudhary, Ankit Agrawal
3-D Deep Learning-Based Item Classification for Belt Conveyors Targeting Packaging and Logistics

In this study, we apply concepts taken from the fields of Artificial Intelligence (AI) and Industry 4.0 to a belt conveyor, a key tool in the packaging and logistics industries. Specifically, we present an item classification model built for belt conveyors, helping the conveyor control system to recognize items while minimizing its impact on the conveyor design and the movement of items. To that end, we followed a three-pronged approach. First, we converted a size measurement system into a 3-D shape reconstruction system by recycling a belt conveyor prototype developed in a previous study. Secondly, we transformed a scanned point cloud that varies in size, given the use of variable-length items, into a point cloud with a fixed size. Thirdly, we constructed three different end-to-end 3-D point cloud classification models, with the Dynamic Graph Convolutional Neural Network (DGCNN) model coming out on top when considering accuracy, response time, and training stability.

Ho-min Park, Byungkon Kang, Arnout Van Messem, Wesley De Neve
Development of Fast Refinement Detectors on AI Edge Platforms

Refinement detector (RefineDet) is a state-of-the-art model in object detection that has been developed and refined based on high-end GPU systems. In this study, we discovered that the speed of models developed in high-end GPU systems is inconsistent with that in embedded systems. In other words, the fastest model that operates on high-end GPU systems may not be the fastest model on embedded boards. To determine the reason for this phenomenon, we performed several experiments on RefineDet using various backbone architectures on three different platforms: NVIDIA Titan XP GPU system, Drive PX2 board, and Jetson Xavier board. Finally, we achieved real-time performances (approximately 20 fps) based on the experiments on AI edge platforms such as NVIDIA Drive PX2 and Jetson Xavier boards. We believe that our current study would serve as a good reference for developers who wish to apply object detection algorithms to AI edge computing hardware. The complete code and models are publicly available on the web (link) .

Min-Kook Choi, Heechul Jung
Selecting Algorithms Without Meta-features

Algorithm selection has been successfully used on a variety of decision problems. When the problem definition is structured and several algorithms for the same problem are available, meta-features, which in turn permit highly accurate algorithm selection on a case-by-case basis, can be extracted easily and at relatively low cost. Real world problems such as computer vision could benefit from algorithm selection as well; however, the input is not structured and datasets are very large both in sample size and number of samples. Therefore, meta-features are either impossible or too costly to extract. Considering such limitations, in this paper we experimentally evaluate the cost and the complexity of algorithm selection on two popular computer vision datasets, VOC2012 and MSCOCO, using a variety of task-oriented features. We evaluate both datasets on algorithm selection accuracy over five algorithms, using various levels of dataset manipulation such as data augmentation, algorithm selector fine tuning and ensemble selection. We determine that the main reason for the low accuracy obtained from existing features is insufficient evaluation of existing algorithms. Our experiments show that even without meta-features, it is possible to achieve meaningful algorithm selection accuracy and thus obtain an increase in processing accuracy. The main result shows that, using an ensemble method trained on the MSCOCO dataset, we can successfully increase processing accuracy by at least 3%.

Martin Lukac, Ayazkhan Bayanov, Albina Li, Kamila Abiyeva, Nadira Izbassarova, Magzhan Gabidolla, Michitaka Kameyama
A Hybrid Machine Learning Approach for Energy Consumption Prediction in Additive Manufacturing

Additive manufacturing (AM), as a fast-developing technology for rapid manufacturing, offers a paradigm shift in terms of process flexibility and product customisation, showing great potential for widespread adoption in the industry. In recent years, energy consumption has increasingly attracted attention in both academia and industry due to the increasing demands and applications of AM systems in production. However, AM systems are considered highly complex, consisting of several subsystems, where energy consumption is related to various correlated factors. These factors stem from different sources and typically contain features with various types and dimensions, posing challenges for integration in analysis and modelling. To tackle this issue, a hybrid machine learning (ML) approach that integrates the extreme gradient boosting (XGBoost) decision tree and the density-based spatial clustering of applications with noise (DBSCAN) technique is proposed to handle such multi-source data with different granularities and structures for energy consumption prediction. In this paper, four different sources, including design, process, working environment, and material, are taken into account. The unstructured data is clustered by DBSCAN so as to reduce data dimensionality and is combined with handcrafted features in the XGBoost model for energy consumption prediction. A case study was conducted, focusing on a real-world SLS system, to demonstrate the effectiveness of the proposed method.

Yixin Li, Fu Hu, Jian Qin, Michael Ryan, Ray Wang, Ying Liu
Bias from the Wild Industry 4.0: Are We Really Classifying the Quality or Shotgun Series?

The traditional data quality control (QC) process has usually been limited by high time and resource demands, in addition to limited performance, mainly due to the high intrinsic variability across different annotators. The application of Deep Learning (DL) strategies to the QC task opens up a realm of possibilities for overcoming these challenges. However, not everything is a bed of roses: the inability to detect bias in the collected data and the risk of reproducing bias in the outcome of the DL model remain a notable and unresolved issue in the Industry 4.0 scenario. In this work, we propose a Deep Learning approach, specifically tailored for providing the aesthetic quality classification of shotguns based on the analysis of wood grains without running into an unwanted bias. The task as well as the collected dataset are the result of a collaboration with an industrial company. Although the proposed DL model based on VGG-16 and ordinal categorical cross-entropy loss has been proven to be reliable in solving the QC task, it is not immune to unwanted biases such as the typical characteristics of each shotgun series. This may lead to an overestimation of the DL performance, reflecting a greater focus on the geometry than on an evaluation of the wood grain. The proposed two-stage solution named Hierarchical Unbiased VGG-16 (HUVGG-16) is able to separate the shotgun series prediction (shotgun series task) from the quality class prediction (quality task). The higher performance (F1 score of up to 0.95) achieved by the proposed HUVGG-16 suggests that the proposed approach represents a solution for automating the overall QC procedure in a challenging industrial case scenario. Moreover, the saliency map results confirm that the proposed solution represents a proof of concept for detecting and mitigating unwanted bias by constraining the network to learn the characteristics that properly describe the quality of the shotgun, rather than other confounding characteristics (e.g. geometry).

Riccardo Rosati, Luca Romeo, Gianalberto Cecchini, Flavio Tonetto, Luca Perugini, Luca Ruggeri, Paolo Viti, Emanuele Frontoni
Machine Learning for Storage Location Prediction in Industrial High Bay Warehouses

Global trade and logistics require efficient management of the scarce resource of storage locations. In order to adequately manage that resource in a high bay warehouse, information regarding the overall logistics processes needs to be considered, while still enabling human stakeholders to keep track of the decision process and utilize their non-digitized, domain-specific, expert knowledge. Although a plethora of machine learning models have gained high popularity in many industrial sectors, only those models that provide a transparent perspective on their own inner decision procedures are applicable for a sensitive domain like logistics. In this paper, we propose the application of machine learning for efficient data-driven storage type classification in logistics. In order to reflect this research problem in practice, we used production data from a warehouse at a large Danish retailer. We evaluate and discuss the proposed solution and its different manifestations in the given logistics context.

Fabian Berns, Timo Ramsdorf, Christian Beecks
A Deep Learning-Based Approach for Automatic Leather Classification in Industry 4.0

Smart production is trying to bring companies into the world of industry 4.0. In this field, leather is a natural product commonly used as a raw material to manufacture luxury objects. To ensure good quality of these products, one of the fundamental processes is the visual inspection phase to identify defects on leather surfaces. A typical exercise in quality control during production is to perform a rigorous manual inspection of the same piece of leather several times, using different viewing angles and distances. However, the process of human inspection is expensive, time-consuming, and subjective. In addition, it is always prone to human error and inter-subject variability, as it requires a high level of concentration and might lead to labor fatigue. Therefore, there is a need to develop an automatic vision-based solution in order to reduce manual intervention in this specific process. In this regard, this work presents an automatic approach to perform leather and stitching classification. The main goal is to automatically classify the images inside a new dataset called the LASCC (Leather And Stitching Color Classification) dataset. The dataset is newly collected and is composed of 67 images with two different colors of leather and seven different colors of stitching. For this purpose, Deep Convolutional Neural Networks (DCNNs) such as VGG16, Resnet50 and InceptionV3 have been applied to the LASCC dataset, on a sample of 67 images. Experimental results confirmed the effectiveness and the suitability of the approach, showing high values of accuracy.

Giulia Pazzaglia, Massimo Martini, Riccardo Rosati, Luca Romeo, Emanuele Frontoni
Automatic Viewpoint Estimation for Inspection Planning Purposes

Viewpoint estimation is an important aspect of surface inspection and planning. Typically viewpoint estimation has been done only with the 3D model and not with the actual object. This, therefore, limits the flexibility of using the actual object in inspection planning. In this work, we present a novel pipeline that can efficiently estimate viewpoints from live camera images. The pipeline can be used for different sized industrial objects to efficiently estimate the viewpoint. The achieved results are real-time and the method is easily generalizable to different objects. The presented method is based on a 3D model of the object, which is self-supervised and requires no manual data annotation or real images to be used as an input. The complete solution, together with documentation and examples is available in the public domain for testing. https://gitlab.itwm.fraunhofer.de/dutta/real-time-pose-est/.

Siddhartha Dutta, Markus Rauhut, Hans Hagen, Petra Gospodnetić
Localisation of Defects in Volumetric Computed Tomography Scans of Valuable Wood Logs

We present a novel pipeline to efficiently localise defects in volumetric Computed Tomography (CT) scans of valuable wood logs. We couple a 2D detector applied independently on each scan slice with a multi-object tracking approach processing detections along the scan direction to localise the defects in 3D. Our solution is designed to meet the real-time requirements of modern production lines, to optimise the wood sawing operations for high-quality final products and to reduce wood waste as well as carbon footprints. We effectively embedded our defect localisation algorithm in the Meccanica del Sarca S.p.A.’s production pipeline achieving a reduction of their economic loss by 7% compared to the previous years.

Davide Boscaini, Fabio Poiesi, Stefano Messelodi, Ayman Younes, Donato A. Grande
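
As a rough illustration of the idea in the abstract above (per-slice 2D detections linked along the scan direction into 3D defect localisations), the following is a minimal sketch and not the authors' implementation. It assumes each slice yields axis-aligned boxes `(x1, y1, x2, y2)` and swaps in a simple greedy IoU tracker in place of whatever multi-object tracker the paper uses; the names `link_detections` and `to_3d_box` and the `min_iou` threshold are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) within one slice


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 2D boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    first_slice: int
    last_slice: int
    boxes: List[Box] = field(default_factory=list)


def link_detections(slices: List[List[Box]], min_iou: float = 0.3) -> List[Track]:
    """Greedily link per-slice boxes into tracks along the scan direction.

    Assumption: a track ends as soon as one slice has no overlapping box.
    """
    finished: List[Track] = []
    active: List[Track] = []
    for z, detections in enumerate(slices):
        unmatched = list(detections)
        still_active: List[Track] = []
        for track in active:
            # Pick the detection that overlaps most with the track's last box.
            best = max(unmatched, key=lambda d: iou(track.boxes[-1], d), default=None)
            if best is not None and iou(track.boxes[-1], best) >= min_iou:
                track.boxes.append(best)
                track.last_slice = z
                unmatched.remove(best)
                still_active.append(track)
            else:
                finished.append(track)
        # Every detection left over starts a new track at this slice.
        still_active.extend(Track(z, z, [d]) for d in unmatched)
        active = still_active
    return finished + active


def to_3d_box(track: Track) -> Tuple[float, float, int, float, float, int]:
    """Bounding volume (x1, y1, z_first, x2, y2, z_last) of one track."""
    x1s, y1s, x2s, y2s = zip(*track.boxes)
    return (min(x1s), min(y1s), track.first_slice, max(x2s), max(y2s), track.last_slice)
```

Calling `link_detections` on a list with one entry per CT slice returns one `Track` per linked defect, and `to_3d_box` turns each track into a 3D extent. A production tracker would typically also tolerate missed detections over a few slices, which this sketch deliberately leaves out.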
Image Anomaly Detection by Aggregating Deep Pyramidal Representations

Anomaly detection consists in identifying, within a dataset, those samples that differ significantly from the majority of the data, which represents the normal class. It has many practical applications, ranging from defective product detection in industrial systems to medical imaging. This paper focuses on image anomaly detection using a deep neural network with multiple pyramid levels to analyze the image features at different scales. We propose a network based on an encoding-decoding scheme, using standard convolutional autoencoders, trained on normal data only in order to build a model of normality. Anomalies can be detected by the inability of the network to reconstruct its input. Experimental results show good accuracy on MNIST, FMNIST and the recent MVTec Anomaly Detection dataset.

Pankaj Mishra, Claudio Piciarelli, Gian Luca Foresti
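
The detection principle stated in the abstract above (a model trained on normal data only reconstructs anomalies poorly) can be sketched as a scoring step. This is a minimal illustration under stated assumptions, not the authors' method: it assumes reconstructions are already produced by some autoencoder, uses mean squared error as the score, and picks the threshold as a quantile of errors on held-out normal data. The function names and the 0.95 quantile are hypothetical choices.

```python
from typing import Sequence


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between a flattened image and its reconstruction."""
    if len(x) == 0 or len(x) != len(x_hat):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def threshold_from_normal(errors: Sequence[float], quantile: float = 0.95) -> float:
    """Pick a threshold from errors measured on normal samples only.

    Assumption: the threshold is the error at the given quantile position.
    """
    if len(errors) == 0:
        raise ValueError("need at least one normal-sample error")
    ordered = sorted(errors)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def is_anomalous(error: float, threshold: float) -> bool:
    """Flag a sample whose reconstruction error exceeds the threshold."""
    return error > threshold
```

In use, one would compute `reconstruction_error` for each held-out normal image, derive a threshold with `threshold_from_normal`, and then flag test images with `is_anomalous`. The paper aggregates features over several pyramid levels, which this single-scale score does not attempt to reproduce.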
Fault Detection in Uni-Directional Tape Production Using Image Processing

The quality of uni-directional tape in its production process is affected by environmental conditions like temperature and production speed. In this paper, computer vision algorithms are applied to scanned images to detect and classify tape damage during the manufacturing procedure. We perform a comparative study of well-known feature descriptors for fault candidate generation, then propose our own features for fault detection. We investigate various machine learning techniques to find the best model for the classification problem. The empirical results demonstrate the high performance of the proposed system and favor random forest as the classifier and Canny edges as the feature generator.

Somesh Devagekar, Ahmad Delforouzi, Paul G. Plöger
Backmatter
Metadata
Title
Pattern Recognition. ICPR International Workshops and Challenges
Editors
Prof. Alberto Del Bimbo
Prof. Rita Cucchiara
Prof. Stan Sclaroff
Dr. Giovanni Maria Farinella
Tao Mei
Prof. Dr. Marco Bertini
Hugo Jair Escalante
Dr. Roberto Vezzani
Copyright Year
2021
Electronic ISBN
978-3-030-68799-1
Print ISBN
978-3-030-68798-4
DOI
https://doi.org/10.1007/978-3-030-68799-1
