2019 | Book

New Trends in Image Analysis and Processing – ICIAP 2019

ICIAP International Workshops, BioFor, PatReCH, e-BADLE, DeepRetail, and Industrial Session, Trento, Italy, September 9–10, 2019, Revised Selected Papers

Editors: Marco Cristani, Andrea Prati, Oswald Lanz, Stefano Messelodi, Nicu Sebe

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science

About this book

This book constitutes the refereed proceedings of five workshops and an industrial session held at the 20th International Conference on Image Analysis and Processing, ICIAP 2019, in Trento, Italy, in September 2019: Second International Workshop on Recent Advances in Digital Security: Biometrics and Forensics (BioFor 2019); First International Workshop on Pattern Recognition for Cultural Heritage (PatReCH 2019); First International Workshop on eHealth in the Big Data and Deep Learning Era (e-BADLE 2019); International Workshop on Deep Understanding Shopper Behaviors and Interactions in Intelligent Retail Environments (DEEPRETAIL 2019); Industrial Session.

Table of Contents

Frontmatter

BioFor: Workshop on Recent Advances in Digital Security: Biometrics and Forensics

Frontmatter
EEG-Based Biometric Verification Using Siamese CNNs

Cognitive biometric characteristics have recently attracted the attention of the scientific community thanks to some of their interesting properties, such as their intrinsic liveness detection capability and their robustness against spoofing attacks. Among the traits belonging to this category, brain signals have been considered in several studies, commonly focusing on the analysis of electroencephalography (EEG) recordings. Unfortunately, a significant intra-class variability affects EEG data acquired at different times, therefore making it hard for current state-of-the-art methods to achieve high recognition rates. To cope with this issue, deep learning techniques have recently been employed to search for discriminative EEG information, yet only identification scenarios have so far been considered in the literature. In this paper a verification context is instead taken into account, and networks based on siamese designs are proposed to extract features able to differentiate subjects not available during network training. The experimental tests, conducted on a longitudinal database comprising EEG acquisitions taken during five sessions spanning a period of one year and a half, show the effectiveness of the proposed approach in achieving high accuracy for brain-based biometric verification.

Emanuele Maiorana
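
A minimal sketch of a siamese setup for EEG verification, in the spirit of the abstract above (not the author's code): two identical 1D-CNN branches share weights, a contrastive loss shapes the embedding space, and verification compares the distance between probe and reference embeddings to a threshold. The encoder architecture, channel count and margin are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EEGEncoder(nn.Module):
    """Shared branch: maps an EEG segment (channels x samples) to an embedding."""
    def __init__(self, n_channels=19, emb_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.fc = nn.Linear(64, emb_dim)

    def forward(self, x):
        return self.fc(self.features(x).squeeze(-1))

def contrastive_loss(z1, z2, same_subject, margin=1.0):
    """Pull genuine pairs together, push impostor pairs beyond the margin."""
    d = F.pairwise_distance(z1, z2)
    return (same_subject * d.pow(2) +
            (1 - same_subject) * F.relu(margin - d).pow(2)).mean()

encoder = EEGEncoder()
probe = torch.randn(8, 19, 512)      # batch of EEG segments (placeholder data)
reference = torch.randn(8, 19, 512)
same = torch.ones(8)                 # 1 = genuine pair, 0 = impostor pair
loss = contrastive_loss(encoder(probe), encoder(reference), same)

# Verification at test time: accept if the embedding distance is below a threshold
dist = F.pairwise_distance(encoder(probe), encoder(reference))
accepted = dist < 0.5                # threshold tuned on a validation set
```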
On the Cross-Finger Similarity of Vein Patterns

Biometric recognition based on finger-vein patterns is gaining more and more attention, as several approaches have recently been proposed to extract discriminative features from vascular structures. In this paper we investigate the similarity between vein patterns of symmetric fingers of the left and right hand of a subject. In more detail, we analyze the performance achievable when using symmetric fingers together with geometry- or deep-learning-based feature extraction methods for recognition. A database with acquisitions from the left and right index, middle, and ring fingers of 106 subjects is exploited for the experimental tests.

Emanuela Piciucco, Ridvan Salih Kuzu, Emanuele Maiorana, Patrizio Campisi
Improving Multi-scale Face Recognition Using VGGFace2

Convolutional neural networks have reached extremely high performance on the face recognition task. These models are commonly trained on high-resolution images and, for this reason, their discrimination ability usually degrades when they are tested against low-resolution images. Thus, low-resolution face recognition remains an open challenge for deep learning models. Such a scenario is of particular interest for surveillance systems, in which a low-resolution probe typically has to be matched against higher-resolution galleries. This task can be especially hard to accomplish since the probe can have resolutions as low as 8, 16 and 24 pixels per side, while the typical input of state-of-the-art neural networks is 224 pixels. In this paper, we describe the training campaign we used to fine-tune a ResNet-50 architecture, with Squeeze-and-Excitation blocks, on the tasks of very-low and mixed-resolution face recognition. For the training process we used the VGGFace2 dataset and then tested the performance of the final model on the IJB-B dataset; in particular, we evaluated the neural network on the 1:1 verification task. In our experiments we considered two different scenarios: (1) probe and gallery with the same resolution; (2) probe and gallery with mixed resolutions. Experimental results show that with our approach it is possible to improve upon the performance of state-of-the-art models on the low- and mixed-resolution face recognition tasks, with a negligible loss at very high resolutions.

Fabio Valerio Massoli, Giuseppe Amato, Fabrizio Falchi, Claudio Gennaro, Claudio Vairo
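
An illustrative sketch of the low-resolution training idea described above: face crops are randomly downscaled (e.g. to 8, 16 or 24 pixels per side) and resized back to the 224-pixel network input before fine-tuning. The transform parameters and normalization values are assumptions, not the paper's exact recipe.

```python
import random
from PIL import Image
from torchvision import transforms

class RandomDownUp:
    """Simulate low-resolution probes while keeping the 224x224 input size."""
    def __init__(self, sides=(8, 16, 24, 224), out_side=224):
        self.sides, self.out_side = sides, out_side

    def __call__(self, img: Image.Image) -> Image.Image:
        side = random.choice(self.sides)
        img = img.resize((side, side), Image.BILINEAR)                      # degrade
        return img.resize((self.out_side, self.out_side), Image.BILINEAR)   # restore size

train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    RandomDownUp(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),
])

img = Image.new("RGB", (256, 256))   # placeholder face crop
x = train_tf(img).unsqueeze(0)       # -> tensor of shape [1, 3, 224, 224]
```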
Blind Print-Cam Data Hiding Exploiting Color Perception

Augmented Reality is becoming a fundamental technique for providing easy access to additional information directly from the surrounding environment. It is however crucial that the means through which the information is accessed is as integrated into the environment as possible. To this end, several data hiding techniques have been devised over the years to encode information in images in an imperceptible way. However, these techniques are frequently strongly affected by the printing and re-acquisition process. This work presents an application implementing a data hiding technique robust to printing and camera acquisition (print-cam), thus allowing the recovery of the embedded data (hundreds or thousands of information bits) from printed images in a robust way (e.g., for different sizes of the printed cover image, in different illumination conditions, and under various geometric distortions). The performance and robustness of the proposed solution are tested with respect to different metrics to prove the feasibility of the technique.

Federico Baldessari, Giulia Boato, Federica Lago
A Database for Face Presentation Attack Using Wax Figure Faces

Compared to 2D face presentation attacks (e.g. printed photos and video replays), 3D attacks are more challenging for face recognition systems (FRS), as they present 3D characteristics or materials similar to real faces. Existing 3D face spoofing databases, however, are mostly based on 3D masks and suffer from small data size or poor authenticity due to the production difficulty and high cost. In this work, we introduce the first wax figure face database, WFFD, as one type of super-realistic 3D presentation attack to spoof FRS. This database consists of 2200 images with both real and wax figure faces (4400 faces in total) with high diversity, gathered from online collections. Experiments on this database first investigate the vulnerability of three popular FRS to this new kind of attack. Further, we evaluate the performance of several face presentation attack detection methods to show the attack ability of this super-realistic face spoofing database.

Shan Jia, Chuanbo Hu, Guodong Guo, Zhengquan Xu
Analysis of “User-Specific Effect” and Impact of Operator Skills on Fingerprint PAD Systems

Fingerprint liveness detection, or presentation attack detection (PAD), i.e., the ability to detect whether a fingerprint submitted to an electronic capture device is authentic or made of artificial materials, has attracted the attention of the scientific community, and machine learning approaches based on deep networks have recently opened novel scenarios. A significant step forward was made possible by the public availability of large sets of data, in particular those released during the International Fingerprint Liveness Detection Competition (LivDet). Among others, the fifth edition, held in 2017, presented the participants with two additional challenges which were not detailed in the official report. In this paper, we extend that report by focusing on them: the first one explores the case in which the PAD is integrated into a fingerprint verification system, where templates of the users are available too and the designer is not constrained to rely only on a generic user population for the PAD settings. The second one deals with the attackers' ability to exploit the provided fakes, and how this ability impacts the final performance. Together, these two challenges may establish to what extent fingerprint presentation attacks are an actual threat and how to exploit additional information to make the PAD more effective.

Giulia Orrù, Pierluigi Tuveri, Luca Ghiani, Gian Luca Marcialis

PatReCH: International Workshop on Pattern Recognition for Cultural Heritage

Frontmatter
Enriching Character-Based Neural Machine Translation with Modern Documents for Achieving an Orthography Consistency in Historical Documents

The nature of human language and the lack of a spelling convention make historical documents hard to handle for natural language processing. Spelling normalization tackles this problem by adapting their spelling to modern standards in order to achieve orthographic consistency. In this work, we compare several character-based machine translation approaches, and propose a method to profit from modern documents to enrich neural machine translation models. We tested our proposal on four different data sets, and observed that the enriched models successfully improved the normalization quality of the neural models. Statistical models, however, yielded better results.

Miguel Domingo, Francisco Casacuberta
A Comparative Analysis of Two Commercial Digital Photogrammetry Software for Cultural Heritage Applications

The paper seeks to evaluate the literature on digital photogrammetry processing software, comparing Agisoft Photoscan (now known as Agisoft Metashape, beginning with version 1.5) and RealityCapture, and to test the two programs in a real-world situation to further examine the differences in the software and the resulting models. Tests were carried out for the evaluation of Photoscan and RealityCapture—two European-based digital photogrammetry programs—using the same artifact and data in all tests. The artifact is part of the Farid Karam Collection of Antiquities, housed at the University of South Florida (USF) Libraries Tampa Special Collections. The chosen artifact for study, the aryballos inv. no. 68, was captured in ideal conditions to better evaluate multiple programs with the same data. The digital artifact was examined based upon knowledge and photographs of the physical artifact and evaluated relatively and comparatively between the programs.

Kaitlyn Kingsland
Segmentation of Multi-temporal UV-Induced Fluorescence Images of Historical Violins

Monitoring the state of conservation of a historical violin is a difficult task. Multiple restorations over the centuries have created a very complex and stratified surface that is hard to interpret correctly. Moreover, the reflectance of the varnishes and the rounded morphology of violins can easily produce noise that can be mistaken for a real alteration. To properly compare multi-temporal images of the same instrument, a robust segmentation is needed. To reach this goal, we adopted a genetic algorithm to evolve our previous segmentation method, based on HSV histogram quantization, in this direction. As a test set we used images of two important violins held in the "Museo del Violino" in Cremona (Italy), periodically acquired over a six-month period, and images of a sample violin altered in the laboratory to reproduce a long-term evolution.

Piercarlo Dondi, Luca Lombardi, Marco Malagodi, Maurizio Licchelli
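
A minimal sketch of HSV histogram quantization for segmentation, the building block named in the abstract above; in the paper a genetic algorithm evolves the quantization, whereas here the bin counts are fixed illustrative values and the file name is a placeholder.

```python
import cv2
import numpy as np

def quantize_hsv(image_bgr, h_bins=12, s_bins=4, v_bins=4):
    """Map each pixel to a coarse HSV bin; pixels sharing a bin form a segment."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    hq = (h.astype(np.int32) * h_bins) // 180    # OpenCV hue range is [0, 180)
    sq = (s.astype(np.int32) * s_bins) // 256
    vq = (v.astype(np.int32) * v_bins) // 256
    labels = (hq * s_bins + sq) * v_bins + vq    # one label per quantized triplet
    return labels.astype(np.int32)

image = cv2.imread("violin_uv.png")              # hypothetical UV-fluorescence image
if image is not None:
    labels = quantize_hsv(image)
    # Comparing label maps of multi-temporal acquisitions highlights altered regions
    print("segments found:", len(np.unique(labels)))
```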
A Study of English Neologisms Through Large-Scale Probabilistic Indexing of Bentham’s Manuscripts

Probabilistic indexes (PIs) are obtained from untranscribed handwritten text images by means of recently introduced lexicon-free, query-by-string, probabilistic keyword spotting techniques. PIs have proven to be a powerful tool that allows efficient, free textual searching in very large collections of handwritten historical documents. PIs convey uncertain information about the textual contents of the document images; however, text uncertainty is accurately modeled by the associated lexical probability distributions, which can be conveniently exploited in many applications. As an example of these applications, here we study the dating of a number of English neologisms in the large collection of Bentham's manuscripts, which encompasses 90,000 images. The statistical techniques used for neologism dating are theoretically motivated and experiments on this collection are reported. Among other interesting contributions, this study provides sound evidence that some commonly assumed neologism introduction dates need to be revised.

Alejandro H. Toselli, Verónica Romero, Enrique Vidal, Joan Andreu Sánchez, Louise Seaward, Philip Schofield
Modern vs Diplomatic Transcripts for Historical Handwritten Text Recognition

The transcription of handwritten documents is useful to make their contents accessible to the general public. However, so far automatic transcription of historical documents has mostly focused on producing diplomatic transcripts, even though such transcripts are often only understandable by experts. The main difficulties come from the heavy use of extremely abridged and tangled abbreviations and of archaic or outdated word forms. Here we study different approaches to train optical models that recognize historical document images containing archaic and abbreviated handwritten text and produce modernized transcripts with expanded abbreviations. Experiments comparing the performance of the different proposed approaches are carried out on a document collection related to Spanish naval commerce during the XV–XIX centuries, which includes extremely difficult handwritten text images.

Verónica Romero, Alejandro H. Toselli, Enrique Vidal, Joan Andreu Sánchez, Carlos Alonso, Lourdes Marqués
Improving Ancient Cham Glyph Recognition from Cham Inscription Images Using Data Augmentation and Transfer Learning

Ancient Cham glyphs have mostly appeared in inscriptions on stones at some museums in Vietnam. Unfortunately, these inscriptions are being worn away over time. To preserve Cham heritage as well as to make it widely accessible and readable by users, the digitization and recognition of ancient Cham glyphs become necessary. In our previous work, we built the first dataset of Cham inscription images, manually segmented into glyphs and annotated by an ancient Cham expert. We adapted some automatic recognition methods and conducted experiments on the manually denoised dataset. The aim of this paper is to extend that earlier research to work on noisy data. To this end, we face two main issues. Firstly, the current pre-built dataset is still small, which is usually a major drawback for deep-learning-based methods. Therefore, some data augmentation techniques are evaluated and investigated to increase the number and variation of samples in the dataset. Secondly, even with the augmented dataset, training a deep model from scratch can take very long and sometimes fails to reach a good local minimum. Therefore, we use a simple transfer learning procedure which inherits knowledge from a similar language or one of the same family. Experiments on both the raw test set and its denoised version show very promising results (64.4% and 88.5% F1-score on the two test sets, respectively).

Minh-Thang Nguyen, Anne-Valérie Schweyer, Thi-Lan Le, Thanh-Hai Tran, Hai Vu
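
A hedged sketch of the data augmentation plus transfer learning idea outlined above: a network pre-trained on a large generic dataset is fine-tuned on augmented glyph crops. The ResNet-18 backbone, the augmentation set and the class count are illustrative assumptions, not the authors' exact configuration.

```python
import torch.nn as nn
from torchvision import models, transforms

N_GLYPH_CLASSES = 60   # hypothetical number of Cham glyph classes

augment = transforms.Compose([
    transforms.RandomRotation(10),
    transforms.RandomAffine(0, translate=(0.05, 0.05), scale=(0.9, 1.1)),
    transforms.ColorJitter(brightness=0.3, contrast=0.3),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():          # freeze the pre-trained feature extractor
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, N_GLYPH_CLASSES)   # new classifier head
# During fine-tuning, only model.fc.parameters() would be passed to the optimizer.
```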
Oracle Bone Inscription Detector Based on SSD

This paper introduces an Oracle Bone Inscription detector based on the Single Shot Multibox Detector (SSD) for segmenting and recognizing Oracle Bone Inscriptions in rubbing images. Oracle Bone Inscription is one of the oldest and most mysterious ancient scripts, used about 3000 years ago in China, and much of this literature is preserved as rubbing images. Because only a few specialists understand Oracle Bone Inscriptions, many of them are still waiting to be deciphered, which would help researchers understand the history, culture, economy, etc. of the period. The SSD currently achieves good performance in segmentation and recognition tasks and may achieve good performance for Oracle Bone Inscription detection as well; however, we found that it is weak at small-object detection. This research extends the SSD for Oracle Bone Inscription detection and analyzes mis-detections in order to achieve better accuracy. The experimental results show that precision, recall and F-value reach 0.95, 0.83 and 0.88, respectively, proving the effectiveness of the extended SSD for Oracle Bone Inscription detection.

Lin Meng, Bing Lyu, Zhiyu Zhang, C. V. Aravinda, Naoto Kamitoku, Katsuhiro Yamazaki
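
Not the authors' extended SSD: a minimal sketch of running an off-the-shelf SSD300 detector from torchvision on an image tensor, as a baseline starting point for this kind of glyph detection. The confidence threshold and the placeholder input are assumptions.

```python
import torch
from torchvision.models.detection import ssd300_vgg16, SSD300_VGG16_Weights

weights = SSD300_VGG16_Weights.DEFAULT
model = ssd300_vgg16(weights=weights).eval()

image = torch.rand(3, 300, 300)          # placeholder rubbing image tensor in [0, 1]
with torch.no_grad():
    detections = model([image])[0]       # dict with 'boxes', 'labels', 'scores'
keep = detections["scores"] > 0.5        # confidence threshold
print(detections["boxes"][keep])
```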
Shot Boundary Detection for Automatic Video Analysis of Historical Films

In automatic video content analysis and film preservation, Shot Boundary Detection (SBD) is a fundamental pre-processing step. While previous research focuses on detecting Abrupt Transitions (AT) as well as Gradual Transitions (GT) in video genres such as sports movies or news clips, only a few studies investigate the detection of shot transitions in historical footage. The main aim of this paper is to create an SBD mechanism inspired by state-of-the-art algorithms, applied and evaluated on a self-generated historical dataset as well as a publicly available dataset called Clipshots. A three-stage pipeline is introduced, consisting of candidate frame range selection based on the DeepSBD network, extraction of Convolutional Neural Network (CNN) features, and similarity calculation. A combination of pre-trained backbone CNNs such as ResNet, VGG19 and SqueezeNet with different similarity metrics such as cosine similarity and Euclidean distance is used and evaluated. The results show that the proposed algorithm reaches promising performance on detecting ATs in historical videos without the need for complex optimization and re-training processes. Furthermore, the paper points out the main challenges of historical footage, such as damaged film reels, scratches or splices. These results provide a significant basis for future research on automatic video analysis of historical videos.

Daniel Helm, Martin Kampel
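
A sketch of the similarity stage only, under simple assumptions: pre-trained CNN features are extracted per frame and a shot boundary is hypothesised where the cosine similarity between consecutive frames drops below a threshold. The ResNet-50 backbone and the 0.7 threshold are illustrative choices, not the paper's tuned values.

```python
import torch
import torch.nn.functional as F
from torchvision import models

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()        # keep the 2048-d pooled features
backbone.eval()

def frame_features(frames):
    """frames: tensor (N, 3, 224, 224), normalised as for ImageNet."""
    with torch.no_grad():
        return backbone(frames)

frames = torch.randn(10, 3, 224, 224)    # placeholder candidate frame range
feats = frame_features(frames)
sim = F.cosine_similarity(feats[:-1], feats[1:], dim=1)
boundaries = (sim < 0.7).nonzero(as_tuple=True)[0]   # indices of abrupt transitions
print(boundaries)
```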
The Epistle to Cangrande Through the Lens of Computational Authorship Verification

The Epistle to Cangrande is one of the most controversial among the works of Italian poet Dante Alighieri. For more than a hundred years now, scholars have been debating over its real paternity, i.e., whether it should be considered a true work by Dante or a forgery by an unnamed author. In this work we address this philological problem through the methodologies of (supervised) Computational Authorship Verification, by training a classifier that predicts whether a given work is by Dante Alighieri or not. We discuss the system we have set up for this endeavour, the training set we have assembled, the experimental results we have obtained, and some issues that this work leaves open.

Silvia Corbara, Alejandro Moreo, Fabrizio Sebastiani, Mirko Tavoni
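
A hedged sketch of a supervised authorship-verification setup in the spirit of the abstract above: texts are turned into word-frequency vectors and a binary classifier predicts "by Dante" vs "not by Dante". The features, the classifier and the toy corpus below are generic illustrative choices, not the authors' system or data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Toy corpus of texts of known authorship (1 = Dante, 0 = other authors)
texts = ["nel mezzo del cammin di nostra vita ...",
         "quel ramo del lago di como ...",
         "per me si va ne la citta dolente ...",
         "la donzelletta vien dalla campagna ..."]
labels = [1, 0, 1, 0]

verifier = make_pipeline(
    TfidfVectorizer(analyzer="word", ngram_range=(1, 1), sublinear_tf=True),
    LinearSVC(),
)
verifier.fit(texts, labels)
print(verifier.predict(["o frati dissi che per cento milia ..."]))  # disputed text
```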
A Cockpit of Measures for Image Quality Assessment in Digital Film Restoration

We present an alternative approach for the quality assessment of the digital restoration of a degraded film. Instead of summarizing the film quality by a single value, we propose a set of basic measures that account for different visual features of the film. These measures describe global and local properties of the film over time, such as brightness, contrast, color distribution entropy, color variations and perceptual intra-frame color changes. They are relevant for estimating the readability of the visual content of the film and for quantifying the perceptual differences between the original and the restored film. The proposed measures can be viewed as the parameters shown in a car or airplane cockpit, which are necessary to monitor the machine's status and performance. The cockpit idea aims to contribute to the automation of the digital restoration process and of its evaluation, which are currently still performed mostly manually by video editors and curators and are thus often biased by subjective factors.

Alice Plutino, Michela Lecca, Alessandro Rizzi
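
Illustrative per-frame measures in the spirit of the "cockpit" described above: brightness, RMS contrast and colour-distribution entropy computed over time. The bin count and the random placeholder frames are assumptions; the paper's exact measure definitions may differ.

```python
import numpy as np

def frame_measures(frame_rgb):
    """frame_rgb: uint8 array (H, W, 3). Returns brightness, contrast, entropy."""
    gray = frame_rgb.mean(axis=2)
    brightness = gray.mean()
    contrast = gray.std()                          # RMS contrast
    hist, _ = np.histogramdd(frame_rgb.reshape(-1, 3),
                             bins=(8, 8, 8), range=[(0, 256)] * 3)
    p = hist.ravel() / hist.sum()
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))   # colour distribution entropy
    return brightness, contrast, entropy

film = np.random.randint(0, 256, size=(5, 240, 320, 3), dtype=np.uint8)  # 5 frames
cockpit = np.array([frame_measures(f) for f in film])
# Comparing the curves of the original vs restored film quantifies perceptual changes.
print(cockpit)
```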
Augmented Reality for the Valorization and Communication of Ruined Architecture

This paper focuses on the valorization and communication of the Mother Church of Santa Maria delle Grazie in Ancient Misterbianco (Catania), one of the rare vestiges that survived the eruption of Mount Etna in 1669 and the Val di Noto earthquake of 1693. The project, starting from 3D digital surveys carried out through reality-based techniques, uses an Augmented Reality approach to propose a virtual re-positioning of some significant elements of the church that were removed during the eruption. This study required a thorough architectural analysis and archival research to exactly identify the original location of the re-positioned artworks. A 3D reconstruction was then carried out to obtain accurate 3D models of them, and an Augmented Reality application allows visitors to experience the current church environment enriched with these original artefacts, in order to achieve a more powerful learning/visiting experience.

Raissa Garozzo, Giovanni Pasqualino, Dario Allegra, Cettina Santagati, Filippo Stanco
Classification of Arabic Poems: from the to the Century

This paper describes a system for the classification of Arabic poems according to the eras in which they were written. We used machine learning techniques, applying a set of filters and classifiers. The best results were achieved with the Multinomial Naïve Bayes (MNB) algorithm, with an accuracy of 70.21%, an F1-score of 68.8% and a Kappa of 0.398, without filtering stop words. We observed that stop words can have a positive impact on accuracy, but also a negative one when used together with word-tokenizer preprocessing.

Mourad Abbas, Mohamed Lichouri, Ahmed Zeggada
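
A minimal sketch of era classification with Multinomial Naive Bayes over word counts (stop words kept, as the abstract notes they can help). The transliterated verses and era labels below are purely illustrative toy data, not the paper's corpus.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

poems = ["qifa nabki min dhikra habibin wa manzili",
         "ana alladhi nazara al-a'ma ila adabi",
         "wa ma nayl al-matalib bi-t-tamanni"]
eras = ["pre-islamic", "abbasid", "modern"]       # hypothetical era labels

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(poems, eras)
print(clf.predict(["wa ma nayl al-matalib"]))
```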
A Page-Based Reject Option for Writer Identification in Medieval Books

One main goal of paleographers is to identify the different writers who wrote a given manuscript. Recently, paleographers have started to use digital tools, which provide new and more objective ways to analyze ancient documents. In the last few years, deep learning techniques have been applied to many domains, and transfer learning has been used to overcome their requirement for large amounts of labeled data. This latter approach uses previously trained large deep networks as starting points to solve specific classification problems. In this paper, we present a novel approach based on deep transfer learning to implement a reject option for the recognition of the writers of medieval documents. The implemented option is page-based and considers the row labels provided by the trained deep network to estimate the class probabilities. The proposed approach has been tested on a set of digital images from a Bible of the XII century. The achieved results confirm the effectiveness of the proposed approach.

Nicole Dalia Cilia, Claudio De Stefano, Francesco Fontanella, Claudio Marrocco, Mario Molinara, Alessandra Scotto di Freca
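
A sketch of a page-based reject rule along the lines described above: row-level class probabilities from the network are averaged over a page, and the page is rejected when the top class is not confident enough. The aggregation by mean and the threshold value are illustrative assumptions.

```python
import numpy as np

def classify_page(row_probs, threshold=0.8):
    """row_probs: array (n_rows, n_writers) of per-row softmax outputs."""
    page_probs = row_probs.mean(axis=0)       # aggregate rows into a page score
    best = int(page_probs.argmax())
    if page_probs[best] < threshold:
        return None                           # reject: leave the page to the expert
    return best                               # accept: predicted writer id

rows = np.array([[0.90, 0.05, 0.05],
                 [0.85, 0.10, 0.05],
                 [0.80, 0.15, 0.05]])
print(classify_page(rows))                    # -> 0 (writer 0) with these toy values
```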
Minimizing Training Data for Reliable Writer Identification in Medieval Manuscripts

Palaeography aims to study ancient documents, and the identification of the people who participated in the handwriting of a given document is one of its most important problems. To this aim, expert paleographers typically analyze handwriting features such as letter heights and widths, distances between characters and angles of inclination. With the aim of achieving more precise measures, and also thanks to the availability of high-quality digital images, paleographers are starting to use digital tools. In this context, in previous studies we proposed a pattern recognition system for distinguishing the writers of medieval books and investigated the minimum amount of training data needed to achieve satisfactory accuracy. In this paper, we present a reject option that allows us to implement a highly reliable system for writer identification, trained on a reduced set of data. The experimental results, obtained on two sets of digital images from medieval Bibles, show that by rejecting only a few samples it is possible to strongly reduce the error rate.

Nicole Dalia Cilia, Claudio De Stefano, Francesco Fontanella, Mario Molinara, Alessandra Scotto di Freca

e-BADLE: First International Workshop on eHealth in the Big Data and Deep Learning Era

Frontmatter
Fusion of Visual and Anamnestic Data for the Classification of Skin Lesions with Deep Learning

Early diagnosis of skin lesions is essential for a positive outcome of the disease, which can only be resolved with surgical treatment. In this manuscript, a deep learning method is proposed for the classification of cutaneous lesions based on their visual appearance and on the patient's anamnestic data, namely the age and gender of the patient and the position of the lesion. The classifier discriminates between benign and malignant lesions, mimicking a typical procedure in dermatological diagnostics. Good preliminary results on the ISIC dataset demonstrate the importance of the information fusion process, which significantly improves the classification accuracy.

Simone Bonechi, Monica Bianchini, Pietro Bongini, Giorgio Ciano, Giorgia Giacomini, Riccardo Rosai, Linda Tognetti, Alberto Rossi, Paolo Andreini
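
A hedged sketch of visual/anamnestic fusion as described above: CNN image features are concatenated with encoded metadata (age, gender, lesion site) before the final benign/malignant classifier. The ResNet-18 backbone, layer sizes and metadata encoding are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

class FusionNet(nn.Module):
    def __init__(self, n_meta=3, n_classes=2):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()                  # 512-d visual features
        self.backbone = backbone
        self.meta = nn.Sequential(nn.Linear(n_meta, 16), nn.ReLU())
        self.head = nn.Linear(512 + 16, n_classes)   # benign vs malignant

    def forward(self, image, metadata):
        z = torch.cat([self.backbone(image), self.meta(metadata)], dim=1)
        return self.head(z)

model = FusionNet()
img = torch.randn(4, 3, 224, 224)
meta = torch.tensor([[55.0, 1.0, 3.0]] * 4)          # age, gender code, site code
print(model(img, meta).shape)                        # torch.Size([4, 2])
```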
Slide Screening of Metastases in Lymph Nodes via Conditional, Fully Convolutional Segmentation

We assess the viability of applying a conditional algorithm to the segmentation of Whole Slide Images (WSIs) for human histopathology. Our objective is to design a deep network for the automatic screening of large sets of sentinel lymph node WSIs, to detect those worth inspecting by a pathologist. Ideally, such a system should modify and correct its behavior based on a limited set of examples, to foster interactivity and incremental tuning to specific diagnostic pipelines and clinical practices and, not least, to alleviate the task of collecting a suitable annotated dataset for training. In contrast, 'classical' supervised techniques require a vast dataset upfront and their behavior cannot be adapted except through extensive retraining. The approach presented here is based on conditional, fully convolutional networks, which can segment a query image by conditioning on a support set of sparsely annotated images fed at inference time. We describe the target scenario and the architecture used, and we present some preliminary results of segmentation experiments conducted on the publicly available Camelyon16 dataset.

Gianluca Gerard, Marco Piastra
A Learning Approach for Informative-Frame Selection in US Rheumatology Images

Rheumatoid arthritis (RA) is an autoimmune disorder that causes pain, swelling and stiffness in the joints. Ultrasound (US) has taken on an increasing role in RA screening, since it is a powerful tool to assess disease activity. However, obtaining a good-quality US frame is a tricky, operator-dependent procedure. For this reason, the purpose of this paper is to present a strategy for the automatic selection of informative US rheumatology images by means of Convolutional Neural Networks (CNNs). The proposed method is based on the VGG16 and Inception V3 CNNs, which are fine-tuned to classify 214 balanced metacarpal head US images (75% used for training and 25% for testing). A repeated 3-fold cross-validation was performed for each CNN. The best results were achieved with VGG16 (area under the curve = 90%). These results support the possibility of applying this method in actual clinical practice to support the diagnostic process and help train young residents.

Maria Chiara Fiorentino, Sara Moccia, Edoardo Cipolletta, Emilio Filippucci, Emanuele Frontoni
A Serious Game to Support Decision Making in Medical Education

Patient safety is one of the most important elements for guaranteeing good-quality healthcare and satisfying the required standards. As shown in several recent studies, technological development has facilitated the growth and diffusion of simulation in healthcare education. In particular, many different serious games have been developed to educate medical professionals and to improve the learning of medical procedures. In this paper we present the design of an educational game to train medical students to deal with cardiology cases. A multidisciplinary methodology was adopted in order to bring together medical knowledge, biomedical technical skills and a mathematical approach. The serious game was designed to support the decision-making process, formulated as an integer programming model that also evaluates the game performance. Moreover, the serious game was developed in a 3D environment and implemented using the Scrum framework.

Ersilia Vallefuoco, Michele Mele, Alessandro Pepino
Nerve Contour Tracking for Ultrasound-Guided Regional Anesthesia

Ultrasound-Guided Regional Anesthesia is a technique to provide regional anesthesia aided by ultrasound visualization of the region on which the anesthesia will be applied. A proper detection and tracking of the nerve contour is necessary to decide where the anesthesia should be applied: if the needle is too far from the nerve contour the anesthesia could be ineffective, but if it touches the nerve it could harm the patient. In this paper we propose a model to track nerve contours in ultrasound videos to assist doctors during Ultrasound-Guided Regional Anesthesia procedures. The experimental results show that our model performs well, within an acceptable margin of error.

Xavier Cortés, Donatello Conte, Pascal Makris
Skin Lesions Classification: A Radiomics Approach with Deep CNN

Supporting the early diagnosis of skin cancer is crucial for the success of any kind of treatment or surgery. This work proposes to improve the outcome of automatic diagnosis approaches by using an ensemble of pre-trained deep convolutional neural networks together with a suitable voting strategy. Moreover, a novel patching approach has been deployed. The proposal has been fairly evaluated against literature approaches, demonstrating good preliminary results.

Gabriele Piantadosi, Giampaolo Bovenzi, Giuseppe Argenziano, Elvira Moscarella, Domenico Parmeggiani, Ludovico Docimo, Carlo Sansone
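
A generic soft-voting sketch related to the ensemble idea above (the paper's exact voting strategy may differ): class probabilities from several pre-trained CNNs, possibly computed on multiple patches of the same lesion, are averaged before the final decision. The toy values and class labels are assumptions.

```python
import numpy as np

# probs[m, p, c]: probability of class c from model m on patch p (toy values)
probs = np.array([[[0.2, 0.8], [0.4, 0.6]],      # model 1
                  [[0.3, 0.7], [0.1, 0.9]],      # model 2
                  [[0.5, 0.5], [0.2, 0.8]]])     # model 3

fused = probs.mean(axis=(0, 1))                  # average over models and patches
prediction = int(fused.argmax())                 # 0 = benign, 1 = malignant (assumed)
print(fused, "->", prediction)
```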

DeepRetail: Deep Understanding of Shopper Behaviours and Interactions in Intelligent Retail Environment

Frontmatter
Semantic 3D Object Maps for Everyday Robotic Retail Inspection

In the retail field, customer culture is shifting towards in-store research, and retailers need to re-evaluate their location services to better assist customers. In-store mapping helps retailers learn how their employees are interacting, and it satisfies the users' intent to search for products, something that is often ignored by retailers, especially for secondary placements, which contain offers and promotions that change very often. In this paper, we describe a retail robot that moves autonomously inside a store and gathers point cloud data for semantic store mapping. With all the data collected, it is possible to build a 3D map of the store with the exact product locations. This retail robot combines features of both Robotics and Artificial Intelligence. Three classification approaches have been compared in order to achieve the best performance: a machine learning technique, PointNet++ and a novel Reflectance PointNet++ especially designed for this task. Experiments were performed in a real retail environment, an Italian supermarket, during business hours. A dataset has been built and made publicly available. Our approach yields good results in terms of precision, recall and F1-score, demonstrating its effectiveness.

Marina Paolanti, Roberto Pierdicca, Massimo Martini, Francesco Di Stefano, Christian Morbidoni, Adriano Mancini, Eva Savina Malinverni, Emanuele Frontoni, Primo Zingaretti
Collecting Retail Data Using a Deep Learning Identification Experience

The aim of this paper is to present part of an architecture realized by Huawei for the first Christmas tree endowed with artificial intelligence. The system identifies facial expressions from images acquired by a mobile application and then recognizes the sentiment of the subject; based on the prevailing sentiment, the tree lights up with different special effects. Our task in the project was testing the performance of the neural networks employed in the mobile application for the recognition of facial emotion. We used a convolutional neural network based model and created a purposely dedicated dataset of images for testing the recognition performance.

Salvatore La Porta, Fabrizio Marconi, Isabella Lazzini
A Large Scale Trajectory Dataset for Shopper Behaviour Understanding

In intelligent retail environments, Ultra Wideband (UWB) is suitable for applications where positioning accuracy is a critical parameter. This technology relies on several UWB antennas properly positioned inside a predetermined area and on battery-powered tags free to move inside the area. It has been used to deploy a Real Time Locating System (RTLS), which gives complete oversight of the customers and employees in the store and improves the customer experience. This paper describes a tracking system based on UWB technology. Its installation in stores in Germany and Indonesia became the basis for a trajectory dataset, and the results presented in this paper are based on a two-year experience that measured 10.4 million shoppers. The analysis of the collected tracking data allows deriving several pieces of information on shopper behaviour inside a store: walking flows, the most visited areas of the shopping space and average travel times. The collection of this large quantity of data is very important for future marketing research aimed at attracting people to purchase.

Patrizia Gabellini, Mauro D’Aloisio, Matteo Fabiani, Valerio Placidi
An IOT Edge-Fog-Cloud Architecture for Vision Based Pallet Integrity

Improving the availability of products in a store in order to avoid the out-of-stock (OOS) problem is a crucial topic nowadays. The reduction of OOS events leads to a series of consequences, including an increase in customer satisfaction and loyalty to the store and brand, positive advertising with a consequent growth in sales, and finally an increase in profitability and sales for a specific category. In this context, we propose the Pallet Integrity system for the automatic, real-time detection of OOS on promo pallets and for promo forecasting using computer vision. The system uses two cameras placed in a top-view configuration: one equipped with a depth sensor used to determine the number of pieces on the pallet, and the other, a very high-resolution webcam, used for facing recognition. The depth-based computer vision processing takes place on the edge, while product recognition and promo OOS alarms run on the fog, with one processing unit per store; the multi-promo forecasting service and the data aggregation and visualization are on the cloud. The system was extensively tested in different real stores worldwide, with accurate OOS detection and forecasting results.

Raffaele Vaira, Rocco Pietrini, Roberto Pierdicca, Primo Zingaretti, Adriano Mancini, Emanuele Frontoni
The Vending Shopper Science Lab: Deep Learning for Consumer Research

To understand human behavior, a fundamental aspect is the analysis of the face and of movement. This aspect is particularly important in the context of sales, where knowing the shopper also means guiding purchases. A major challenge for the vending environment is to predict shopper behavior, with the aim of influencing and increasing purchases. In this respect, the vending machine industry is an interesting and growing area of data-driven marketing research. In this context, the aim of this paper is to propose an innovative architecture able to integrate face and movement understanding in a common strategy for real-time consumer modeling. The vending machine and the decision support system process multimedia data to smartly respond with dynamic pricing and product proposals to the particular shopper standing in front of the vending machine. The aim is to build an intelligent vending machine which, in real time, is able to suitably propose products to a labelled shopper. The results come from a real-environment vending lab with 30 locations and about 1 million consumers in Italy, and demonstrate the good performance and high efficiency of our solution in recognizing the age and gender of consumers and their different interactions with the vending machine.

Fioravante Allegrino, Patrizia Gabellini, Luigi Di Bello, Marco Contigiani, Valerio Placidi

Industrial Session

Frontmatter
Boosting Object Recognition in Point Clouds by Saliency Detection

Object recognition in 3D point clouds is a challenging task, especially when time is an important factor to deal with, such as in industrial applications. Local descriptors are an amenable choice whenever the 6 DoF pose of recognized objects should also be estimated. However, the pipeline for this kind of descriptor is highly time-consuming. In this work, we propose an update to the traditional pipeline by adding a preliminary filtering stage referred to as saliency boost. We perform tests on a standard object recognition benchmark by considering four keypoint detectors and four local descriptors, in order to compare time and recognition performance between the traditional pipeline and the boosted one. Timing results show that the boosted pipeline can be up to 5 times faster, with the recognition rate improving in most cases and exhibiting only a slight decrease in the others. These results suggest that the boosted pipeline can speed up processing substantially with limited impact, or even benefits, on recognition accuracy.

Marlon Marcon, Riccardo Spezialetti, Samuele Salti, Luciano Silva, Luigi Di Stefano
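
A sketch of the saliency-boost filtering idea under simple assumptions: points with a low saliency score are discarded before keypoint detection and description, shrinking the cloud that the costly stages of the pipeline must process. The keep ratio and the random placeholder scores are illustrative.

```python
import numpy as np

def saliency_boost(points, saliency, keep_ratio=0.3):
    """points: (N, 3) cloud; saliency: (N,) scores from any saliency detector."""
    k = max(1, int(len(points) * keep_ratio))
    keep = np.argsort(saliency)[-k:]          # indices of the most salient points
    return points[keep]

cloud = np.random.rand(10000, 3)
scores = np.random.rand(10000)                # placeholder saliency scores
boosted = saliency_boost(cloud, scores)
print(boosted.shape)                          # (3000, 3): smaller input for the pipeline
```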
Hand Gesture Recognition for Collaborative Workstations: A Smart Command System Prototype

Human-machine collaboration is a key aspect of modern industries, which must comply with the Industry 4.0 paradigm. Although collaboration can be achieved using a collaborative robot in a purposely designed workstation, this solution is not always feasible or affordable for the specific task to be carried out in the workstation. On the other hand, using a smart HMI to make an industrial robot a "smart" robot can be a better and more affordable solution, depending on the task. In this work we present the preliminary development and characteristics of an experimental HMI for smart manufacturing developed in MATLAB and ROS Industrial. The collaboration between humans and robots is achieved by leveraging the Faster R-CNN object detector to robustly detect and recognize hand gestures performed in real time. The system is based on a state machine that carries out simple tasks such as the repeated movement of the robot along a given trajectory, a pick-and-place task where the robot interactively reaches a given point, and a jog modality.

Cristina Nuzzi, Simone Pasinetti, Roberto Pagani, Franco Docchio, Giovanna Sansoni
In-Line Burr Inspection Through Backlight Vision

This paper presents a vision-based quality control system for detecting burrs (miniature metal filaments) in the transverse holes of high-precision turned hollow cylinders. The system performs 100% in-line quality control at the turning station. It exploits a camera with telecentric optics framing the sample from the outside in back-light conditions. A specifically developed cylindrical illuminator provides radial diffuse back-light illumination over 360° and can be inserted within the part to be inspected. The ability to detect burrs on both the outer and the inner surface of the target holes is achieved by exploiting a customized rotating device integrated with a commercial gripping device. Overall, the system mimics the manual inspection normally performed by operators. Burrs are detected as modifications of the circular shape of each hole, through algorithms that identify the holes on grayscale images, perform circle identification by geometric matching, and identify burrs through analysis of local deviations of the edge from circularity. Results acquired in a real production line over a batch of 2000 parts showed no false-positive or false-negative diagnoses.

Matteo Fitti, Paolo Castellini, Nicola Paone, Marco Zannini, Saverio Zitti, Marco Gambini, Paolo Chiariotti
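
An illustrative check of the "deviation from circularity" idea described above: a circle is fitted to the hole edge and the hole is flagged when edge points deviate beyond a tolerance. The Hough parameters, thresholds and file name are assumptions, not the deployed system's values.

```python
import cv2
import numpy as np

def hole_has_burr(gray, tol_px=2.0):
    """gray: backlit grayscale image containing one hole (dark on bright)."""
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    circles = cv2.HoughCircles(blur, cv2.HOUGH_GRADIENT, dp=1.2, minDist=50,
                               param1=100, param2=30, minRadius=10, maxRadius=0)
    if circles is None:
        return True                                    # no clean circle found
    cx, cy, r = circles[0, 0]
    edges = cv2.Canny(blur, 50, 150)
    ys, xs = np.nonzero(edges)
    dist = np.abs(np.hypot(xs - cx, ys - cy) - r)      # radial deviation of edge points
    near = dist[dist < 10]                             # points belonging to the hole edge
    if near.size == 0:
        return True
    return bool((near > tol_px).mean() > 0.05)         # too many deviating points -> burr

image = cv2.imread("hole.png", cv2.IMREAD_GRAYSCALE)   # hypothetical inspection frame
if image is not None:
    print("burr detected:", hole_has_burr(image))
```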
Segmentation Guided Scoring of Pathological Lesions in Swine Through CNNs

The slaughterhouse is widely recognised as a useful checkpoint for assessing the health status of livestock. At the moment, this is implemented through the application of scoring systems by human experts. The automation of this process would be extremely helpful for veterinarians to enable a systematic examination of all slaughtered livestock, positively influencing herd management. However, such systems are not yet available, mainly because of a critical lack of annotated data. In this work we: (i) introduce a large scale dataset to enable the development and benchmarking of these systems, featuring more than 4000 high-resolution swine carcass images annotated by domain experts with pixel-level segmentation; (ii) exploit part of this annotation to train a deep learning model in the task of pleural lesion scoring. In this setting, we propose a segmentation-guided framework which stacks together a fully convolutional neural network performing semantic segmentation with a rule-based classifier integrating a-priori veterinary knowledge in the process. Thorough experimental analysis against state-of-the-art baselines proves our method to be superior both in terms of accuracy and in terms of model interpretability. Code and dataset are publicly available here: https://github.com/lucabergamini/swine-lesion-scoring

Luca Bergamini, Abigail Rose Trachtman, Andrea Palazzi, Ercole Del Negro, Andrea Capobianco Dondona, Giuseppe Marruchella, Simone Calderara
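
A sketch of the rule-based scoring stage only, under simple assumptions: given the semantic segmentation output (lung vs pleural-lesion pixels), a score is derived from the lesion extent. The class ids and the score bands are illustrative, not the paper's actual veterinary grid.

```python
import numpy as np

LUNG, LESION = 1, 2        # hypothetical class ids in the predicted mask

def pleural_score(mask):
    """mask: (H, W) int array produced by the segmentation network."""
    lung_px = np.count_nonzero(mask == LUNG) + np.count_nonzero(mask == LESION)
    if lung_px == 0:
        return None
    ratio = np.count_nonzero(mask == LESION) / lung_px
    if ratio < 0.01:       # assumed bands mimicking an a-priori scoring grid
        return 0
    if ratio < 0.10:
        return 1
    return 2

mask = np.random.randint(0, 3, size=(256, 256))   # placeholder segmentation output
print("pleurisy score:", pleural_score(mask))
```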
Intelligent Recognition of TCP Intrusions for Embedded Micro-controllers

IoT end-user devices are attractive and sometimes easy targets for attackers, because they are often vulnerable in several respects. Cyberattacks launched from those devices can easily disrupt the availability of services offered by major internet companies, and people across the world who commonly access those services may experience abrupt interruptions. In this context, this paper describes an embedded prototype to classify intrusions affecting TCP packets. The proposed solution adopts an Artificial Neural Network (ANN) executed on resource-constrained, low-cost embedded microcontrollers. The prototype operates without the need for remote intelligence assistance. The adoption of an on-the-edge artificial intelligence architecture brings advantages such as responsiveness, promptness and low power consumption. The embedded intelligence is trained using the well-known KDD Cup 1999 dataset, properly balanced over 5 types of labelled intrusion patterns. A pre-trained ANN classifies features extracted from TCP packets. The results reported in this paper refer to the application running on the low-cost, widely available Nucleo STM32 microcontroller boards from STMicroelectronics, featuring an F3 chip running at 72 MHz and an F4 chip running at 84 MHz, with small embedded RAM and Flash memory.

Remi Varenne, Jean Michel Delorme, Emanuele Plebani, Danilo Pau, Valeria Tomaselli
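
A minimal sketch of the classification stage described above: a small dense network over numeric features of TCP records, trained on KDD-Cup-99-style data. The placeholder data, layer sizes and use of scikit-learn are assumptions made for illustration; on the microcontroller an equivalent pre-trained network would run through an embedded inference runtime.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Placeholder data: 41 KDD-like features, 5 classes (normal + 4 attack families)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 41))
y = rng.integers(0, 5, size=500)

model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=300, random_state=0),
)
model.fit(X, y)
print(model.predict(X[:3]))
```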
6D Pose Estimation for Industrial Applications

Object pose estimation is important for systems and robots to interact with the environment, where the main challenge of this task is the complexity of the scene caused by occlusions and clutter. A key challenge is performing pose estimation by leveraging both RGB and depth information: prior works either extract information from the RGB image and the depth separately or use costly post-processing steps, limiting their performance in highly cluttered scenes and real-time applications. Traditionally, the pose estimation problem is tackled by matching feature points between 3D models and images. However, these methods require richly textured models. In recent years, the rise of deep learning has offered an increasing number of methods based on neural networks, such as DSAC++, PoseCNN, DenseFusion and SingleShotPose. In this work, we present a comparison between two recent algorithms, DSAC++ and DenseFusion, focusing on computational cost, performance and applicability in industry.

Federico Cunico, Marco Carletti, Marco Cristani, Fabio Masci, Davide Conigliaro
Grain Segmentation in Atomic Force Microscopy for Thin-Film Deposition Quality Control

In this paper we propose an image segmentation method specifically designed to detect crystalline grains in microscopic images. We build on the watershed segmentation approach; we propose a preprocessing pipeline to generate a topographic map exploiting the physical nature of the incoming data (i.e. Atomic Force Microscopy) to emphasize grain boundaries and generate seeds for basins. Experimental results show the effectiveness of the proposed method against grain segmentation implementations available in commercial software on a new labelled dataset with an average improvement of over 20% in precision and recall over the standard implementation of watershed segmentation.

Nicolò Lanza, Alessandro Romeo, Marco Cristani, Francesco Setti
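
A generic watershed sketch in the spirit of the method above: the AFM height map is smoothed, local maxima seed the basins, and watershed runs on the inverted height so that grain boundaries (valleys) separate the regions. The smoothing sigma, peak distance and random placeholder map are illustrative, not the paper's tuned pipeline.

```python
import numpy as np
from skimage.filters import gaussian
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

height = np.random.rand(256, 256)                    # placeholder AFM height map
smooth = gaussian(height, sigma=2)

peaks = peak_local_max(smooth, min_distance=5)       # one seed per grain summit
markers = np.zeros_like(smooth, dtype=int)
markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)

labels = watershed(-smooth, markers)                 # flood from summits downwards
print("grains found:", labels.max())
```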
Advanced Moving Camera Object Detection

Assuming a moving camera, the detection of moving objects is a challenging task, mainly due to the difficulty of distinguishing between object motion and the background motion introduced by the camera. The proposed real-time system, based on previous work without camera movement, is able to discriminate well between the two kinds of motion, thanks to a robust global motion vector removal which preserves objects identified in the previous steps. The system reaches high performance using only the input optical flow, without any assumptions about environmental conditions or camera motion.

Giuseppe Spampinato, Arcangelo Bruna, Salvatore Curti, Davide Giacalone
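
A sketch of the core idea under simple assumptions (not the authors' exact algorithm): the dominant camera motion is estimated as the median flow vector and subtracted, so that pixels whose residual motion is large are kept as moving-object candidates. The Farneback parameters and the residual threshold are illustrative.

```python
import cv2
import numpy as np

def moving_object_mask(prev_gray, curr_gray, residual_thresh=2.0):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    global_motion = np.median(flow.reshape(-1, 2), axis=0)   # camera motion estimate
    residual = np.linalg.norm(flow - global_motion, axis=2)  # leftover object motion
    return (residual > residual_thresh).astype(np.uint8) * 255

prev_f = np.random.randint(0, 256, (240, 320), dtype=np.uint8)  # placeholder frames
curr_f = np.random.randint(0, 256, (240, 320), dtype=np.uint8)
mask = moving_object_mask(prev_f, curr_f)
print("moving pixels:", int(np.count_nonzero(mask)))
```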
Backmatter
Metadata
Title
New Trends in Image Analysis and Processing – ICIAP 2019
Editors
Marco Cristani
Andrea Prati
Oswald Lanz
Stefano Messelodi
Nicu Sebe
Copyright Year
2019
Electronic ISBN
978-3-030-30754-7
Print ISBN
978-3-030-30753-0
DOI
https://doi.org/10.1007/978-3-030-30754-7
