About this book

This 8-volume set constitutes the refereed proceedings of the 25th International Conference on Pattern Recognition Workshops, ICPR 2020, held virtually in Milan, Italy, and rescheduled to January 10-11, 2021 due to the Covid-19 pandemic. The 416 full papers presented in these 8 volumes were carefully reviewed and selected from about 700 submissions. The 46 workshops cover a wide range of areas including machine learning, pattern analysis, healthcare, human behavior, environment, surveillance, forensics and biometrics, robotics and egovision, cultural heritage and document analysis, retrieval, and women at ICPR2020.

Table of Contents


IMTA VII - Workshop on Image Mining Theory and Applications


The Study of Improving the Accuracy of Convolutional Neural Networks in Face Recognition Tasks

The article discusses the efficiency of convolutional neural networks in solving the problem of face recognition of tennis players. The training characteristics and test-set accuracy of networks with various architectures are compared. The use of dropout and data augmentation to mitigate overfitting is also considered. Finally, transfer learning from other known networks is used. It is shown how, for the initial data, recognition accuracy can be increased by 25% compared to a typical convolutional neural network.

Nikita Andriyanov, Vitaly Dementev, Alexandr Tashlinskiy, Konstantin Vasiliev

Estimate of the Neural Network Dimension Using Algebraic Topology and Lie Theory

In this paper we present an approach to determine the smallest possible number of neurons in a layer of a neural network in such a way that the topology of the input space can be learned sufficiently well. We introduce a general procedure based on persistent homology to investigate topological invariants of the manifold on which we suspect the data set. We specify the required dimensions precisely, assuming that there is a smooth manifold on or near which the data are located. Furthermore, we require that this space is connected and has a commutative group structure in the mathematical sense. These assumptions allow us to derive a decomposition of the underlying space whose topology is well known. We use the representatives of the k-dimensional homology groups from the persistence landscape to determine an integer dimension for this decomposition. This number is the dimension of the embedding that is capable of capturing the topology of the data manifold. We derive the theory and validate it experimentally on toy data sets.

Luciano Melodia, Richard Lenz

On the Depth of Gestalt Hierarchies in Common Imagery

Apart from machine learning and knowledge engineering, there is a third way of approaching machine vision – the Gestalt law school. In an interdisciplinary effort between psychology and cybernetics, compositionality in perception has been studied for at least a century along these lines. Hierarchical compositions of parts and aggregates are possible in this approach. This is particularly required for high-quality, high-resolution imagery, which is becoming more and more common, because tiny details may be as important as large-scale interdependencies over distances of several thousand pixels. The contribution at hand studies, by way of example, the depth of Gestalt hierarchies in a typical image genre – the group picture – and outlines technical means for their automatic extraction. The practical part applies bottom-up hierarchical Gestalt grouping as well as top-down search focusing, listing successes as well as failures. In doing so, the paper discusses the depth and nature of such compositions in imagery relevant to human beings.

Eckart Michaelsen

Image Recognition Algorithms Based on the Representation of Classes by Convex Hulls

Various approaches to the construction of pattern recognition algorithms based on the representation of classes as convex hulls in a multidimensional feature space are considered. This trend is well suited for biometrics problems with a large number of classes and small volumes of learning samples by class, for example, for problems of recognizing people by faces or fingerprints. In addition to simple algorithms for a point hitting a convex hull, algorithms of the nearest convex hull with different approaches to assessing the proximity of a test point to the convex hull of classes are investigated. Comparative experimental results are given and the advantages and disadvantages of the proposed approach are formulated.

Anatoly Nemirko

Tire Surface Segmentation in Infrared Imaging with Convolutional Neural Networks

Tire modeling is a fundamental task that experts must carry out to ensure optimal tire performance in terms of stability, grip, and fuel consumption. In addition to the major forces that act on the tire, the temperature changes that occur during test handling provide meaningful information for an accurate model. However, the analysis of the temperature in a rolling tire is not a trivial task due to the interactions of the tire and the pavement. A non-invasive technique, such as thermal infrared inspection, allows analyzing temperature changes on the surface of the tire under dynamic rolling conditions. Thus, the accurate segmentation of the tire is the first objective towards a better understanding of its performance. To this aim, we propose a novel approach that combines image processing techniques with convolutional neural networks. First, the handcrafted features extracted from the infrared images are used to build a dataset; then, a convolutional neural network is trained with the labeled images. Finally, the network makes predictions of the tire surface under different test conditions. The results have shown that our proposal achieves a segmentation accuracy > 0.98 and a validation error < 0.05.

Rodrigo Nava, Duc Fehr, Frank Petry, Thomas Tamisier

Human Action Recognition Using Recurrent Bag-of-Features Pooling

Bag-of-Features (BoF)-based models have been traditionally used for various computer vision tasks, due to their ability to provide compact semantic representations of complex objects, e.g., images, videos, etc. Indeed, BoF has been successfully combined with various feature extraction methods, ranging from handcrafted feature extractors to powerful deep learning models. However, BoF, along with most of the pooling approaches employed in deep learning, fails to capture the temporal dynamics of the input sequences. This leads to significant information loss, especially when the informative content of the data is sequentially distributed over the temporal dimension, e.g., videos. In this paper we propose a novel stateful recurrent quantization and aggregation approach in order to overcome the aforementioned limitation. The proposed method is inspired by the well-known Bag-of-Features (BoF) model, but employs a stateful trainable recurrent quantizer, instead of plain static quantization, allowing for effectively encoding the temporal dimension of the data. The effectiveness of the proposed approach is demonstrated using three video action recognition datasets.

Marios Krestenitis, Nikolaos Passalis, Alexandros Iosifidis, Moncef Gabbouj, Anastasios Tefas

Algorithms Based on Maximization of the Mutual Information for Measuring Parameters of Canvas Texture from Images

This work deals with the problem of canvas threads counting in images of paintings. Counting of threads is necessary for measuring canvas density and a number of other parameters used by art historians for dating the artworks. We propose to use raking light in the image acquisition process in order to emphasize canvas texture. We improve known techniques developed for inspecting fabrics in the textile industry. Two new threads counting algorithms based on filtering in the Fourier domain and mutual information maximization thresholding techniques are proposed and tested. These algorithms for measuring the canvas density from images taken in raking light are efficient in cases when the analysis of canvas images acquired in X-rays and transmitted light is ineffective. The results of the experiment show that the accuracy of the proposed threads counting algorithms is comparable to the accuracy of known techniques. The analysis of the characteristics of canvases of paintings by F.S. Rokotov allowed obtaining an informative feature that can be used by art historians and experts for dating the artworks.

Dmitry M. Murashov, Aleksey V. Berezin, Ekaterina Yu. Ivanova

Machine Learning Approach for Contactless Photoplethysmographic Measurement Verification

Contactless heart rate measurement techniques can be applied in medical and biometric tasks such as vital signs measurement and vitality detection. An incorrect measurement result can have serious consequences. In this paper, a method for verifying contactless heart rate measurement results is proposed. A binary classifier is used to identify whether a contactless photoplethysmogram (PPG) signal is reliable. The experimental setup used for acquiring the signal dataset consists of a contact plethysmograph, a web camera, and a contactless plethysmography device. A feature vector containing various metrics of the signal and of its spectral density is used as input to the classification algorithms. The highest classification accuracy (99.94%) is achieved by a classifier based on logistic regression. The classification results demonstrate that the proposed method can be used in further research on contactless methods.

Ivan Semchuk, Natalia Muravskaya, Konstantin Zlobin, Andrey Samorodov

On New Kemeny’s Medians

The Kemeny median is a well-known method for obtaining a coordinated decision that represents a group opinion. This ranking is the one that differs least from the experts’ orderings of alternatives. If rankings are immersed in a metric space, the median should be an average ranking from the mathematical point of view. As a result, the correct median should be the center of the set of rankings regarded as points in a metric space. In this case it is referred to as the Kemeny metric median. In this paper we also propose a new median, the Kemeny weighted median, as another type of metric median. A new procedure is developed for the linear combination of experts’ rankings to build the weighted median.

Sergey Dvoenko, Denis Pshenichny
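
For orientation, the classical Kemeny median mentioned in the abstract above can be sketched as a brute-force search. This is a minimal illustration, not the authors' metric or weighted median: it assumes strict rankings over the same alternatives and the Kendall tau distance, and the names `kendall_tau_distance`, `kemeny_median`, and `experts` are hypothetical.

```python
from itertools import combinations, permutations


def kendall_tau_distance(r1, r2):
    """Count the pairs of alternatives that the two rankings order differently."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )


def kemeny_median(rankings):
    """Return a ranking minimizing the total distance to all expert rankings.

    Brute force over all permutations, so only usable for a handful of
    alternatives (assumption: every expert ranks the same set, without ties).
    """
    alternatives = sorted(rankings[0])
    return min(
        permutations(alternatives),
        key=lambda cand: sum(kendall_tau_distance(cand, r) for r in rankings),
    )


# Three hypothetical expert orderings of alternatives a, b, c.
experts = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
median = kemeny_median(experts)
```

The factorial search space is the reason practical methods avoid enumeration; the sketch only makes the "least different ranking" criterion concrete.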

Image Decomposition Based on Region-Constrained Smoothing

The task of image decomposition is to split an image into piecewise smooth structural and difference texture-noise components. It is used in many tasks of video information processing and analysis. The problem of decomposition is to provide independent smoothing in each of the structural regions of the image and to preserve the signal structure. Most of the known methods of decomposition and smoothing are based on analysis of measurable parameters of the local image area, for example, the distribution of signal values. These data do not reflect image area characteristics well enough. An obvious criterion for spatially limiting the analyzed area is that the target point and the surrounding points belong to the same spatial area of the image. A sufficient criterion for the connectivity of the points in a region is the absence of contour edges between them. The article proposes an approach to the construction of a decomposition algorithm based on a preliminary delineation of image areas by detecting contours between them and subsequent contour-limited smoothing inside each of the areas. The concept of similarity of points in the image is introduced, on the basis of which the smoothing algorithm is built. Experimental comparisons of the proposed algorithm with other well-known smoothing algorithms are carried out.

Pavel A. Chochia

Machine Learning Based on Minimizing Robust Mean Estimates

The article considers an approach to the construction of robust methods and machine learning algorithms based on the principle of minimizing estimates of average values that are insensitive to outliers. The proposed machine learning algorithms are based on the principle of iterative reweighting. Illustrative examples show the ability of the proposed approach and algorithms to overcome the influence of outliers.

Zaur M. Shibzukhov, Timofey A. Semenov

The Use of Machine Learning Methods to Detect Defects in Images of Metal Structures

The work is devoted to the study of the possibilities provided by modern neural networks in image processing for solving the problem of monitoring the state of steel and reinforced concrete structures. The paper presents a method for solving problems of such monitoring based on the use of a combination of several neural networks focused on recognizing a fragment of a structure and parts of a structure. Methods for training neural networks on small training samples are proposed. The results of algorithms on real images that show the consistency and efficiency of the proposed solution are presented.

Vitalii E. Dementev, Maria A. Gaponova, Marat R. Suetin, Anastasia S. Streltzova

Multiregion Multiscale Image Segmentation with Anisotropic Diffusion

We present a multiregion image segmentation approach which utilizes multiscale anisotropic diffusion based scale spaces. By combining powerful edge preserving anisotropic diffusion smoothing with isolevel set linking and merging, we obtain coherent segments which are tracked across multiple scales. A hierarchical tree representation of the given input image with progressively simplified regions is used with intra-scale splitting and inter-scale merging for obtaining multiregion segmentations. Experimental results on natural and medical images indicate that multiregion, multiscale image segmentation (MMIS) approach obtains coherent segmentation results.

V. B. Surya Prasath, Dang Ngoc Hoang Thanh, Nguyen Hoang Hai, Sergey Dvoenko

The Test of Covariation Functions of Cylindrical and Circular Images

Nowadays, image processing problems are becoming increasingly important due to the development of aerospace Earth monitoring systems, radio and sonar systems, medical devices for early disease diagnosis, etc. However, most image processing works deal with images defined on rectangular two-dimensional grids or grids of higher dimension. In some practical situations, images are defined on a cylinder (for example, images of pipelines, blood vessels, parts during turning) or on a circle (for example, images of the facies (thin film) of dried biological fluid, an eye, a cut of a tree trunk). The peculiarity of the domain on which such images are defined has to be taken into account in their models and processing algorithms. In the present paper, autoregressive models of cylindrical and circular images are considered, and expressions of the correlation function depending on the autoregression parameters are given. The spiral scan of a cylindrical image can be considered as a quasiperiodic process due to the correlation of image rows. To represent inhomogeneous images with random heterogeneity, "doubly stochastic" models are used in which one or more controlling images control the parameters of the resulting image. Given the resulting image, it is possible to estimate the parameters of the model of the controlling images. However, this is not enough to identify the hidden images completely. It is also necessary to investigate whether the covariation function of the given image matches the hypothetical one. A test for covariation functions of cylindrical and circular images is proposed, and its power is investigated relative to the parameters of the image model.

Victor Krasheninnikov, Yuliya Kuvayskova, Olga Malenova, Alexey Subbotin

One-Class Classification Criterion Robust to Anomalies in Training Dataset

A new version of one-class classification criterion robust to anomalies in the training dataset is proposed based on support vector data description (SVDD). The original formulation of the problem is not geometrically correct, since the value of the penalty for the admissible escape of the training sample objects outside the describing hypersphere is incommensurable with the distance to its center in the optimization problem and the presence of outliers can greatly affect the decision boundary. The proposed criterion is intended to eliminate this inconsistency. The equivalent form of criterion without constraints lets us use a kernel-based approach without transition to the dual form to make a flexible description of the training dataset. The substitution of the non-differentiable objective function by the smooth one allows us to apply an algorithm of sequential optimizations to solve the problem. We apply the Jaccard measure for a quantitative assessment of the robustness of a decision rule to the presence of outliers. A comparative experimental study of existing one-class methods shows the superiority of the proposed criterion in anomaly detection.

Aleksandr O. Larin, Oleg S. Seredin, Andrey V. Kopylov

Recognition of Tomographic Images in the Diagnosis of Stroke

In this paper, a method for automatic recognition of acute stroke model using non-contrast computed tomography brain images is presented. The complexity of the task lies in the fact that the dataset consists of a very small number of images. To solve the problem, we used the traditional computer vision methods and a convolutional neural network consisting of a segmentator and classifier. To increase the dataset, augmentations and sub images were used. Experiments with real CT images using validation and test samples showed that even on an extremely small dataset it is possible to train a model that will successfully cope with the classification and segmentation of images. We also proposed a way to increase the interpretability of the model.

Kirill Kalmutskiy, Andrey Tulupov, Vladimir Berikov

Two-Stage Classification Model for Feather Images Identification

The paper explores the usage of neural networks for bird species identification based on feather images. The taxonomic identification of birds’ feathers is widely used in aviation ornithology to analyze collisions with aircraft and develop methods to prevent them. This article presents a novel dataset consisting of 28,272 images of the plumage of 595 bird species. We compare models trained on four subsets of the initial dataset. We propose a method of identifying bird species based on YoloV4 and DenseNet models. The experimental evaluation showed that the resulting method makes it possible to identify the bird from a photograph of a single feather with an accuracy of up to 81.03% for precise classification and 97.09% for the first five predictions of the classifier.

Alina Belko, Konstantin Dobratulin, Andrey Kunznetsov

An Objective Comparison of Ridge/Valley Detectors by Image Filtering

Ridges and valleys are principal geometric features with diverse applications, especially in image analysis problems such as segmentation, object detection, etc. Numerous characterizations have contributed to formalizing the ridge and valley theory. The significance of each characterization relies, however, on its practical usefulness in a particular application. The objective comparison and evaluation of ridgeness/valleyness characterized as thin and complex image structure is thus crucially important for choosing which parameter values correspond to the optimal configuration to obtain accurate results and the best performance. This paper presents a supervised and objective comparison of different filtering-based ridge detectors. Moreover, the optimal parameter configuration of each filtering technique has been objectively investigated.

Ghulam-Sakhi Shokouh, Baptiste Magnier, Binbin Xu, Philippe Montesinos

High-Performance Algorithms Application for Retinal Image Segmentation Based on Texture Features

Diabetic retinopathy is a dangerous disease of the eye fundus. If the treatment is untimely or inadequate, people affected by the disease may lose their eyesight for a variety of reasons. Laser photocoagulation is an advanced technique for treating diabetic retinopathy, with an eye surgeon extracting certain retinal areas to be exposed to laser pulses based on their expertise. Laser light parameters and pulse repetition rate are also chosen based on previous experience of surgical interventions. An automated mapping of a preliminary coagulation pattern enables a number of challenges associated with the surgical procedure on the retina to be addressed. The manual mapping of the coagulation pattern is a highly demanding job that requires a high level of concentration. It would be much more convenient if a doctor were able to slightly adjust an automatically mapped preliminary coagulation pattern rather than mapping it themselves. In this way, both the possibility of human error and the preparatory phase of the surgical procedure are substantially reduced. Of great interest is an algorithm for extracting a laser coagulation zone, which is based on an algorithm for retinal image segmentation. The algorithm performs segmentation using texture features but takes a long time to run. Because of this, we propose a high-performance algorithm for retinal image segmentation, which enables a sequential version to be made substantially faster, while outperforming a parallel algorithm.

Nataly Ilyasova, Alexandr Shirokanev, Nikita Demin, Andrey Zolotarev

Learning Topology: Bridging Computational Topology and Machine Learning

Topology is a classical branch of mathematics, born essentially from Euler’s studies in the 18th century, which deals with the abstract notion of shape and geometry. Recent decades have been characterised by a renewed interest in topology and topology-based tools, due to the birth of computational topology and Topological Data Analysis (TDA). A large and novel family of methods and algorithms computing topological features and descriptors (e.g. persistent homology) has proved to be effective for the analysis of graphs, 3D objects, 2D images, and even heterogeneous datasets. This survey is intended to be a concise but complete compendium that, offering the essential basic references, allows readers to orient themselves among the recent advances in TDA and its applications, with an eye to those related to machine learning and deep learning.

Davide Moroni, Maria Antonietta Pascali

Library of Sample Image Instances for the Cutting Path Problem

The Cutting Path Problem (CPP) is a complex continuous and combinatorial optimization problem that is about finding an optimal tool path for CNC equipment. The problem has many valuable industrial applications arising from the Industry 4.0 strategy, such as those related to tool path routing for sheet metal cutting machines. The CPP is strongly NP-hard, enclosing variants of the well-known Traveling Salesman Problem (TSP) as sub-problems. In this paper, we propose for the first time an open access library of sample instances (CPPLib) for the CPP to facilitate the benchmarking of optimization algorithms, most of which are heuristics or metaheuristics. Each instance is obtained as an image of a finite set of mutually nested industrial parts on a metal sheet and is presented in the DXF vector format that is induced by a solution of the well-known 2D nesting problem. For the first time we propose geometric and quantitative principles for constructing different groups (classes) of such image instances. Along with continuous CPP settings, the library contains their discrete counterparts presented in the form of instances of the Precedence Constraints Generalized Traveling Salesman Problem (PCGTSP), since the solution processes for the CPP are mostly based on discretizing the boundary contours of parts. In addition, the paper presents examples of testing some optimization algorithms for solving the cutting path problem on test instances from the developed CPPLib library.

Alexander Petunin, Alexander Khalyavka, Michael Khachay, Andrei Kudriavtsev, Pavel Chentsov, Efim Polishchuk, Stanislav Ukolov

Interest Points Detection Based on Sign Representations of Digital Images

In this work, we present a method for detecting interest points in digital images that is robust under a certain class of brightness transformations. The importance of such a method is due to the fact that current video surveillance systems perform well in controlled environments but tend to suffer when variations in illumination are present. The novelty of the method lies in the use of the so-called sign representation of images. In contrast to the representation of a digital image by its brightness function, the sign representation associates with an image a graph of the brightness-increasing relation on pixels. As a result, the sign representation determines not a single image but a class of images whose brightness functions differ by monotonic transforms. Another feature of the method is its interpretation of interest points. This concept is not rigorously defined in image processing theory; in general, a point of interest can be characterized by increased “complexity” of the image structure in its vicinity. Since the sign representation associates with an image a directed graph, we consider interest points as “concentrators” of paths from/to vertices of the graph. The results of experiments confirm the efficiency of the method.

Alexander Karkishchenko, Valeriy Mnukhin

Memory Consumption and Computation Efficiency Improvements of Viola-Jones Object Detection Method for UAVs

In this paper, we consider object classification and detection problems for autonomous UAVs. We propose an algorithm that is effective from the point of view of computational complexity and memory consumption. The proposed algorithm can be successfully used as a basic tool for building an autonomous UAV control system. The algorithm is based on the Viola-Jones method. It is shown in the paper, that the Viola-Jones method is the most preferable approach to detect objects on-board UAVs because it needs the least amount of memory and the number of computational operations to solve the object detection problem. To ensure sufficient accuracy, we use a modified feature: rectangular Haar-like features, calculated over the magnitude of the image gradient. To increase computational efficiency, the L1 norm was used to calculate the magnitude of the image gradient. The PSN-10 inflatable life raft (an example of an object that is detected during rescue operations using UAVs) and oil tank storage (such kind of objects are usually detected during the inspection of industrial infrastructure) are considered as target objects in this work. The performance of the trained detectors was estimated on real data (including data obtained during the real rescue operation of the trawler “Dalniy Vostok” in 2015).

Sergey A. Usilin, Oleg A. Slavin, Vladimir V. Arlazarov
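
The two ingredients named in the abstract above, an L1-norm gradient magnitude and rectangular Haar-like features, can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: it assumes central-difference gradients, a plain list-of-lists grayscale image, and a two-rectangle feature evaluated through an integral image; all function names are hypothetical.

```python
def l1_gradient_magnitude(img):
    """Approximate the gradient magnitude as |dI/dx| + |dI/dy| (L1 norm).

    Central differences are used; border pixels are left at 0. The L1 norm
    avoids the square root needed for the Euclidean magnitude.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            out[y][x] = abs(gx) + abs(gy)
    return out


def integral_image(img):
    """Summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_haar(ii, x, y, w, h):
    """Left half minus right half of a w-by-h window (w assumed even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Toy 4x4 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 10, 10] for _ in range(4)]
grad = l1_gradient_magnitude(image)
feature_on_gradient = two_rect_haar(integral_image(grad), 0, 0, 4, 4)
```

Once the integral image is built, every rectangle sum costs four lookups regardless of its size, which is what keeps Haar-like features cheap on constrained on-board hardware.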

Automation of the Detection of Pathological Changes in the Morphometric Characteristics of the Human Eye Fundus Based on the Data of Optical Coherence Tomography Angiography

This paper presents the results of the joint work of image analysis specialists and ophthalmologists on the task of analyzing images obtained by the method of optical coherence tomography angiography. A method was developed to automate the detection of pathological changes in the morphometric characteristics of the fundus. The solution of the image recognition problem assumes the presence of certain image representations, the presence of effective recognition algorithms, and the compliance of the image representations used with the requirements that the recognition algorithms place on the source data. To reduce images to a form that is easy to recognize, we considered sets of features that met all the necessary requirements of specialists. The chosen feature model was applied to the problem of classifying images of patients with and without pathologies. The developed method makes it possible to classify pathological changes in the vascular bed of the human eye with high accuracy.

Igor Gurevich, Maria Budzinskaya, Vera Yashina, Adil Tleubaev, Vladislav Pavlov, Denis Petrachkov

MobileEmotiFace: Efficient Facial Image Representations in Video-Based Emotion Recognition on Mobile Devices

In this paper, we address the emotion classification problem in videos using a two-stage approach. At the first stage, deep features are extracted from facial regions detected in each video frame using a MobileNet-based image model. This network has been preliminarily trained to identify the age, gender, and identity of a person, and further fine-tuned on the AffectNet dataset to classify emotions in static images. At the second stage, the features of each frame are aggregated using multiple statistical functions (mean, standard deviation, min, max) into a single MobileEmotiFace descriptor of the whole video. The proposed approach is experimentally studied on the AFEW dataset from the EmotiW 2019 challenge. It was shown that our image mining technique leads to more accurate and much faster decision-making in video-based emotion recognition when compared to conventional feature extractors.

Polina Demochkina, Andrey V. Savchenko
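
The second stage described in the abstract above, aggregating per-frame features with several statistical functions, might look as follows. This is a rough sketch under stated assumptions: the frame features are plain lists of floats, the four statistics are concatenated in the order mean, standard deviation, min, max, and the population standard deviation is used; the function name `video_descriptor` and the toy data are hypothetical.

```python
from statistics import fmean, pstdev


def video_descriptor(frame_features):
    """Concatenate per-dimension mean, std, min and max over all frames.

    frame_features: one feature vector per frame, all of equal length D.
    Returns a single vector of length 4 * D describing the whole video.
    """
    columns = list(zip(*frame_features))  # one tuple per feature dimension
    stats = (fmean, pstdev, min, max)
    return [stat(col) for stat in stats for col in columns]


# Three hypothetical frames with two-dimensional features each.
frames = [[1.0, 4.0], [3.0, 4.0], [5.0, 10.0]]
descriptor = video_descriptor(frames)
```

The descriptor length depends only on the feature dimension, not on the number of frames, so videos of different durations map to vectors of the same size.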

Basic Models of Descriptive Image Analysis

This paper is devoted to the basic models of descriptive image analysis, which is the leading branch of the modern mathematical theory of image analysis and recognition. Descriptive analysis provides for the implementation of image analysis processes in the image formalization space, the elements of which are various forms (states, phases) of the image representation that is transformed from the original form into a form that is convenient for recognition (i.e., into a model), and models for converting data representations. Image analysis processes are considered as sequences of transformations that are implemented in the phase space and provide the construction of phase states of the image, which form a phase trajectory of the image translation from the original view to the model. Two types of image analysis models are considered: 1) models that reflect the general properties of the process of image recognition and analysis – the setting of the task, the mathematical and heuristic methods used, and the algorithmic content of the process: a) a model based on a reverse algebraic closure; b) a model based on the equivalence property of images; c) a model based on multiple image models and multiple classifiers; 2) models that characterize the architecture and structure of the recognition process: a) a multilevel model for combining algorithms and source data in image recognition; b) an information structure for generating descriptive algorithmic schemes for image recognition. A brief description, a comparative analysis of the relationships and specifics of these models are given. Directions for further research are discussed.

Igor Gurevich, Vera Yashina

Evaluation of Spectral Similarity Measures and Dimensionality Reduction Techniques for Hyperspectral Images

Hyperspectral data is becoming more and more in demand these days. However, its effective use is hindered by significant redundancy. In this paper, we analyze the effectiveness of using common dimensionality reduction methods together with known measures of spectral similarity. In particular, we use Euclidean distance, spectral angle mapper, and spectral divergence to measure dissimilarity in hyperspectral space. For the mapping to lower-dimensional space, we use nonlinear methods, namely, Nonlinear Mapping, Isomap, Locally Linear Embedding, Laplacian Eigenmaps, and UMAP. Quality assessment is performed using known hyperspectral scenes based on the results provided by the nearest neighbor classifier and support vector machine.

Evgeny Myasnikov
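
The three dissimilarity measures listed in the abstract above can be written down compactly. This is a minimal sketch, assuming that "spectral divergence" refers to the spectral information divergence (a symmetric Kullback-Leibler divergence between spectra normalized to sum to one) and that spectra are non-negative with a positive sum; the function names and the `eps` guard are illustrative choices.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def spectral_angle(a, b):
    """Spectral angle mapper: angle (radians) between the two spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def spectral_information_divergence(a, b, eps=1e-12):
    """Symmetric KL divergence between spectra normalized to sum to one."""
    sa, sb = sum(a), sum(b)
    p = [x / sa + eps for x in a]
    q = [y / sb + eps for y in b]
    return sum(
        pi * math.log(pi / qi) + qi * math.log(qi / pi) for pi, qi in zip(p, q)
    )


# A spectrum and a brighter copy of it (same shape, doubled intensity).
s1 = [1.0, 2.0, 3.0]
s2 = [2.0, 4.0, 6.0]
d_euclid = euclidean(s1, s2)
d_angle = spectral_angle(s1, s2)
d_sid = spectral_information_divergence(s1, s2)
```

In this toy case the angle and the divergence are close to zero while the Euclidean distance is not, which illustrates why the choice of measure matters when the same material is observed under different illumination.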

Maximum Similarity Method for Image Mining

The paper discusses a new Image Mining approach to extracting and exploring relations in image repositories. The proposed approach, called the Maximum Similarity Method, is based on the identification of characteristic fragments in images by a set of predefined patterns. Such an identification is basically carried out as a comparison of the fragment intensity shape with the shapes of already registered patterns (precedents). Mathematically (statistically), such a comparison implies the selection of some measure of similarity and the optimization (maximization) of that measure on a set of precedents. In the paper, based on the principles of machine learning, a special type of similarity measure is proposed, and its reliability is discussed. In fact, this measure represents the conditional probability distribution of the registered data (the counts of the fragment being tested) given the analogous data for the patterns. Thus, the search for the optimal precedent pattern that maximizes the chosen similarity measure constitutes the proposed method.

Viacheslav Antsiperov

First Step Towards Creating a Software Package for Detecting the Dangerous States During Driver Eye Monitoring

The problem of detecting human fatigue from the state of the eyes is considered. A program for detecting whether the eyes are open or closed has been developed. Haar cascades were used to search for faces. The eyes were then detected in video from a simple web camera, which allowed us to accumulate a sufficient dataset. Training was performed using convolutional neural networks, and because of different lighting conditions, different accuracy characteristics were obtained for the left and right eyes. Using the Python programming language with Jupyter Notebook functionality and the OpenCV library, a software package has been developed that allows us to highlight closed eyes when testing on a learning subject (the particular person on whose images the model was trained) with an accuracy of about 90% on a low-resolution camera (640 by 480 pixels). The proposed solution can be used in driver state monitoring tasks, because driver fatigue is one of the most frequent causes of road accidents.

Nikita Andriyanov

IUC 2020 - The 1st International Workshop on Human and Vehicle Analysis for Intelligent Urban Computing


Unbalanced Optimal Transport in Multi-camera Tracking Applications

Multi-view multi-object tracking algorithms are expected to resolve the persistent issues of multi-object tracking within a single camera. However, the inconsistency of camera videos in most surveillance systems obstructs the ability to re-identify and jointly track targets across different views. As a crucial task in multi-camera tracking, assigning targets from one view to another is treated as an assignment problem. This paper presents an alternative approach based on Unbalanced Optimal Transport for the unbalanced assignment problem. On each view, the targets’ position and appearance are projected onto a learned metric space, and then an Unbalanced Optimal Transport algorithm is applied to find the optimal assignment of targets between pairs of views. Experiments on common multi-camera databases show the superiority of our proposal over the heuristic approach on MOT metrics.

Quoc Cuong Le, Donatello Conte, Moncef Hidane

Arithmetic Evaluation System Based on MixNet-YOLOv3 and CRNN Neural Networks

In the traditional teaching procedure, the repetitive labor of correcting arithmetic exercises incurs huge human costs. To reduce these costs and improve teaching efficiency, we propose a novel intelligent arithmetic evaluation system, which can automatically identify the meaning of each arithmetic question and make a reasonable judgment or decision. The designed evaluation system is divided into two modules: detection and recognition. In the detection module, because of the dense distribution and varied formats of arithmetic questions in test papers, we adopt the scale-balanced, lightweight MixNet-YOLOv3 network to achieve a speed-accuracy trade-off, with an mAP of up to 0.989. In the recognition module, considering that the format of each arithmetic problem is mostly fixed, we employ a CRNN network based on the CTC decoding mechanism to achieve an accuracy of up to 0.971. By combining the two networks, the proposed system is capable of intelligently evaluating arithmetic exercises on mobile devices.

Tianliang Liu, Congcong Liang, Xiubin Dai, Jiebo Luo

HSS-GCN: A Hierarchical Spatial Structural Graph Convolutional Network for Vehicle Re-identification

Vehicle re-identification (Re-ID) is the task of identifying the same vehicle in images captured by different cameras. Recent years have seen various appearance-based approaches that focus only on global features or explore local features to obtain more subtle details, which can alleviate the subtle inter-instance problem. However, few emphasize the spatial geometrical structure relationship among local regions or between the global region and local regions. To explore the above-mentioned spatial structure relationship, this paper proposes a hierarchical spatial structural graph convolutional network (HSS-GCN) for vehicle Re-ID, in which we first construct a hierarchical spatial structural graph with the global region and local regions as nodes and a two-hierarchy relationship as edges, and then learn discriminative structure features with a GCN module under the constraints of metric learning. To improve the performance of our proposed network, we combine the classification loss with the metric learning loss. Extensive experiments conducted on the public VehicleID and VeRi-776 datasets validate the effectiveness of our approach in comparison with recent works.

Zheming Xu, Lili Wei, Congyan Lang, Songhe Feng, Tao Wang, Adrian G. Bors

A Novel Multi-feature Skeleton Representation for 3D Action Recognition

Deep-learning-based methods have been used for 3D action recognition in recent years. Methods based on recurrent neural networks (RNNs) have the advantage of modeling long-term context, but they focus mainly on temporal information and ignore the spatial relationships in each skeleton frame. In addition, it is difficult to handle a very long skeleton sequence using an RNN. Compared with an RNN, a convolutional neural network (CNN) is better able to extract spatial information. To model the temporal information of skeleton sequences and incorporate the spatial relationship in each frame efficiently using a CNN, this paper proposes a multi-feature skeleton representation for encoding features from original skeleton sequences. The relative distances between joints in each skeleton frame are computed from the original skeleton sequence, and several relative angles between the skeleton structures are computed. This useful information from the original skeleton sequence is encoded as pixels in grayscale images. To preserve more spatial relationships between input skeleton joints in these images, the skeleton joints are divided into five groups: one for the trunk and one for each arm and each leg. Relationships between joints in the same group are more relevant than those between joints in different groups. By rearranging pixels in encoded images, the joints that are mutually related in the spatial structure are adjacent in the images. The skeleton representations, composed of several grayscale images, are input to CNNs for action recognition. Experimental results demonstrate the effectiveness of the proposed method on three public 3D skeleton-based action datasets.

Lian Chen, Ke Lu, Pengcheng Gao, Jian Xue, Jinbao Wang

R2SN: Refined Semantic Segmentation Network of City Remote Sensing Image

Semantic segmentation is a key problem in remote sensing image analysis and is especially useful for city-scale vehicle detection. However, the multiple objects and imbalanced data classes in remote sensing images pose a huge challenge, so many traditional segmentation approaches are often unsatisfactory. In this paper, we propose a novel Refined Semantic Segmentation Network (R2SN), which applies the classic encoder-decoder framework to the segmentation problem. We add convolution layers to the encoder and decoder so that the network can capture more local information in the training step. The design is more suitable for high-resolution remote sensing images. More specifically, the classic Focal loss is introduced in this network, which guides the model to focus on the difficult objects in remote sensing images and effectively handles the multi-object segmentation problem. Meanwhile, the classic Hinge loss is also used to increase the distinction between classes, which can yield more refined segmentation results. We validate our approach on the International Society for Photogrammetry and Remote Sensing (ISPRS) semantic segmentation benchmark dataset. The evaluation and comparison results show that our method exceeds the state-of-the-art remote sensing image segmentation methods in terms of mean intersection over union (MIoU), pixel accuracy, and F1-score.

Chenglong Wang, Dong Wu, Jie Nie, Lei Huang
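
To illustrate why the Focal loss mentioned in the abstract above steers training toward difficult pixels, here is a minimal per-pixel sketch of the standard binary form. It is not taken from the R2SN paper: the function name is hypothetical, and the defaults `gamma=2.0` and `alpha=0.25` are commonly used values, assumed here for illustration only.

```python
import math

def focal_loss(p, target, gamma=2.0, alpha=0.25):
    """Binary focal loss for one pixel.

    p      -- predicted probability of the positive class
    target -- ground-truth label, 1 or 0
    gamma  -- focusing parameter (assumed default)
    alpha  -- class-balance weight for the positive class (assumed default)
    """
    p_t = p if target == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)                # avoid log(0)
    # (1 - p_t)^gamma shrinks the loss of well-classified pixels.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` the expression reduces to class-weighted cross-entropy; increasing `gamma` progressively down-weights pixels the model already classifies confidently.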

Light-Weight Distilled HRNet for Facial Landmark Detection

A light-weight facial landmark detection model, which we name “LDHRNet”, is proposed in this paper. It can be trained in an end-to-end fashion and can perform precise facial landmark detection in various conditions, including large poses, exaggerated expressions, non-uniform lighting, and occlusions. First, to deal with these challenging cases, a light-weight HRNet (LHRNet) structure is proposed as the backbone, in which the bottleneck block replaces the standard residual block of the original HRNet and group convolution replaces the standard convolution of the original HRNet. Then, to prevent the accuracy loss caused by coordinate quantization, we use a function named dual soft argmax (DSA) to map the heatmap response to the final coordinates. Next, we propose a Similarity-FeatureMap knowledge distillation model, which guides the training of a student network such that input pairs that produce similar (dissimilar) feature maps in the pre-trained teacher network produce similar (dissimilar) feature maps in the student network. Finally, we combine the distillation loss and the NME loss to train our model. The best AUC result, 79.10%, is achieved on the validation set.

Ziye Tong, Shenqi Lai, Zhenhua Chai
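
The abstract above does not define the dual soft argmax (DSA) function, so the following sketch shows only the plain soft-argmax idea it presumably builds on: coordinates are read off a heatmap as a softmax-weighted average, which avoids rounding to the integer grid. The function name and the `beta` sharpness parameter are assumptions made for illustration.

```python
import math

def soft_argmax_2d(heatmap, beta=1.0):
    """Return sub-pixel (x, y) as the softmax-weighted mean of grid positions.

    heatmap -- list of rows, each a list of response values
    beta    -- sharpness; larger values approach the hard argmax (assumed parameter)
    """
    peak = max(max(row) for row in heatmap)  # subtract the peak for numerical stability
    total = x_acc = y_acc = 0.0
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            weight = math.exp(beta * (value - peak))
            total += weight
            x_acc += weight * x
            y_acc += weight * y
    return x_acc / total, y_acc / total
```

When two neighbouring cells respond equally, the estimate falls halfway between them, which a hard argmax cannot express.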

DeepFM-Based Taxi Pick-Up Area Recommendation

Accurately recommending pick-up areas from sparse GPS data is valuable for increasing taxi drivers’ profits and reducing fuel consumption, but it remains challenging. In recent years, recommendation approaches based on matrix factorization have been proposed to deal with sparsity. However, they are not accurate enough because they disregard the interaction effects between features. Therefore, this paper proposes a DeepFM-based taxi pick-up area recommendation method. First, the research area is divided into grid cells of equal size, the pick-up point information is extracted from the original GPS trajectory data, the pick-up point information and POI data are mapped to the grid, the corresponding grid attributes are calculated, and the grid feature matrix is constructed. Then, DeepFM is used to mine the combined relationships between the grid features, incorporating spatial information to recommend the most suitable grid area for drivers. Finally, the performance evaluation is carried out using DiDi's public data. The experimental results show that this method can significantly improve the quality of the recommended results and is superior to some existing recommendation methods.

Xuesong Wang, Yizhi Liu, Zhuhua Liao, Yijiang Zhao

IWBDAF 2020 - International Workshop on Biometric Data Analysis and Forensics


Blockchain-Based Iris Authentication in Order to Secure IoT Access and Digital Money Spending

In this work we use two approaches to prevent double spending and secure IoT transactions through Blockchain and information fusion techniques: the first method is based on Smart Contracts, while the second is implemented by a novel Blockchain system. Both methods use a fusion of hybrid RSA and biometric codes in order to obtain an encrypted identity of the spender that takes privacy into account. In the Blockchain, a contract between the parties is stipulated and the fused biometric code is kept. The coin pack used is tagged, so any future double spending or illegal action on the IoT network or Smart city becomes forbidden and legally prosecutable. The second part of the work presents a novel Blockchain in which a transaction can be signed only if the user is successfully authenticated through his biometric fused key.

Gerardo Iovane, Antonio Rapuano, Patrizia Di Gironimo

DeepFakes Evolution: Analysis of Facial Regions and Fake Detection Performance

Media forensics has attracted a lot of attention in recent years, in part due to the increasing concerns around DeepFakes. Since the initial DeepFake databases from the 1st generation such as UADFV and FaceForensics++ up to the latest databases of the 2nd generation such as Celeb-DF and DFDC, many visual improvements have been carried out, making fake videos almost indistinguishable to the human eye. This study provides an exhaustive analysis of both 1st and 2nd DeepFake generations in terms of facial regions and fake detection performance. Two different methods are considered in our experimental framework: i) the traditional one followed in the literature and based on selecting the entire face as input to the fake detection system, and ii) a novel approach based on the selection of specific facial regions as input to the fake detection system. Among all the findings resulting from our experiments, we highlight the poor fake detection results achieved even by the strongest state-of-the-art fake detectors in the latest DeepFake databases of the 2nd generation, with Equal Error Rate results ranging from 15% to 30%. These results highlight the necessity of further research to develop more sophisticated fake detectors.

Ruben Tolosana, Sergio Romero-Tapiador, Julian Fierrez, Ruben Vera-Rodriguez
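
For readers unfamiliar with the Equal Error Rate quoted in the abstract above, the sketch below shows one simple way to estimate it from detector scores: sweep a threshold and report the point where the two error rates are closest. The function name and the score convention (higher means "more likely fake") are assumptions for illustration, not details from the paper.

```python
def equal_error_rate(real_scores, fake_scores):
    """Estimate the EER by scanning every observed score as a threshold.

    real_scores -- detector scores for genuine videos
    fake_scores -- detector scores for fake videos (higher = more likely fake)
    """
    best_gap, best_rate = None, None
    for threshold in sorted(set(real_scores) | set(fake_scores)):
        # Genuine samples wrongly flagged as fake.
        false_alarm = sum(s >= threshold for s in real_scores) / len(real_scores)
        # Fake samples that slip below the threshold.
        miss = sum(s < threshold for s in fake_scores) / len(fake_scores)
        gap = abs(false_alarm - miss)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (false_alarm + miss) / 2
    return best_rate
```

An EER of 0 means the two score populations are perfectly separable, while 0.5 corresponds to chance-level detection.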

Large Scale Graph Based Network Forensics Analysis

In this paper we tackle the problem of performing graph based network forensics analysis at a large scale. To this end, we propose a novel distributed version of a popular network forensics analysis algorithm, the one by Wang and Daniels [18]. Our version of the Wang and Daniels algorithm has been formulated according to the MapReduce paradigm and implemented using the Apache Spark framework. The resulting code is able to analyze in a scalable way graphs of arbitrary size thanks to its distributed nature. We also present the results of an experimental study where we assessed both the time performance and the scalability of our algorithm when run on a distributed system of increasing size.

Lorenzo Di Rocco, Umberto Ferraro Petrillo, Francesco Palini

Deep Iris Compression

Lossy image compression can reduce the space and bandwidth required for image storage and transmission, which is increasingly in demand among developers of iris recognition systems. Deep learning techniques (i.e., CNNs and GANs) are quickly becoming a tool of choice for general image compression tasks. However, some key quality criteria, such as high perceptual quality and the spatial precision of the images, need to be satisfied when applying such techniques to iris image compression tasks. We investigate and evaluate the expediency of a deep learning based compression model for iris data compression. In particular, we relate rate-distortion performance, as measured in PSNR and Multi-scale Structural Similarity Index (MS-SSIM), to the recognition scores obtained by a concrete recognition system. We further compare the model performance against a state-of-the-art deep learning based image compression technique as well as some lossy compression algorithms currently used for iris compression (namely, the current ISO standard JPEG2000, JPEG, the H.265 derivative BPG, and WEBP), to determine the most suitable compression algorithm for this purpose. The experimental results show superior compression and promising recognition performance of the model over all other techniques on different iris data.

Ehsaneddin Jalilian, Heinz Hofbauer, Andreas Uhl

IFEPE: On the Impact of Facial Expression in Head Pose Estimation

Head Pose Estimation (HPE) is the study of the angular rotations of the head along the Pitch, Yaw, and Roll axes. Widely used in face-related methods such as face frontalization, driver attention, and best surveillance frame selection, it is strongly related to facial features. In this study we examine the impact of facial expressions (FE) on HPE and, in particular, we relate each specific facial expression to the axis most affected by error when that expression is observable. The HPE method chosen for this purpose is based on the Partitioned Iterated Function System (PIFS). By construction, this method depends on facial appearance and self-similarity. Based on this, and using a FER network, we observed that there is an evident relation between facial expressions and pose errors. This relation goes through the distances between facial keypoints and can be discriminated along the three axes, by providing an estimate of the percentage variation in errors related to a percentage variation in distances.

Carmen Bisogni, Chiara Pero

Analysing and Exploiting Complexity Information in On-Line Signature Verification

This paper proposes an in-depth analysis on how the complexity of signatures affects the performance in on-line signature verification. In signature verification there is a very wide range of signatures from some based on a simple flourish to very complex ones. In this work we consider three different complexity groups: low, medium and high. We carry out an analysis of performance evaluation for each complexity group for both random and skilled forgeries. Two verification systems are used for this analysis, a traditional one based on the popular DTW and a state-of-the-art one based on time aligned recurrent neural networks (TA-RNN) recently proposed. The experiments are carried out over the largest database available to date for on-line signature verification (DeepSignDB). Then, we propose several approaches in order to exploit the information related to the signature complexity with the final aim of improving the signature verification system performance. Our best proposed approach is based on training a system with a balanced number of subjects regarding their type of signature complexity.

Miguel Caruana, Ruben Vera-Rodriguez, Ruben Tolosana
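
The traditional verification system in the abstract above relies on dynamic time warping (DTW). As a reminder of how DTW compares two sequences of different lengths, here is a minimal textbook sketch for one-dimensional sequences. Real on-line signatures are multi-channel time series and the paper's system is certainly more elaborate, so the function name and the absolute-difference local cost are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two 1-D sequences using |a_i - b_j| as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b stays
                                     cost[i][j - 1],      # b advances, a stays
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

Because samples may be repeated during alignment, a signature written slightly slower (for example, with one sample duplicated) still matches its reference at zero cost.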

A Novel Ensemble Framework for Face Search

This paper proposes an ensemble of different state-of-the-art algorithms for realizing a face search system that aims to achieve higher accuracy than any single algorithm. This is achieved by leveraging the most promising deep networks (Facenet, OpenFace, DeepFace, and VGGFace – originally trained for face recognition) and different Approximate Nearest Neighbor Search (ANNS) algorithms (Annoy and LSHash). Face images in the database are subjected to feature extraction (embeddings computed by deep networks) and indexing (in set structure for faster search) by ANNS algorithms. An input face query image is processed in the following four stages. First, the face region is detected from the query image and appropriately aligned for further processing. Second, the facial features are extracted using multiple deep networks. Third, the ANNS algorithms perform fast search by efficiently shrinking the gallery size from millions to a few hundred faces. Fourth, a fine matching is performed using two different methods separately to produce the final search results. These are (a) cosine similarity and (b) score-based matching and re-ranking of results. The experimental results demonstrate the diversity in results obtained by using multiple feature extractors and ANNS techniques and the accuracy achieved by using the proposed ensemble framework.

Shashank Vats, Sankalp Jain, Prithwijit Guha
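
The fourth stage described in the abstract above, fine matching by cosine similarity on the shortlist returned by the approximate search, might look roughly like this. The sketch assumes embeddings are plain lists of floats and the shortlist is a dictionary keyed by a hypothetical face identifier. It does not reproduce the paper's score-based re-ranking.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_candidates(query, candidates):
    """Sort shortlisted face ids by descending similarity to the query.

    candidates -- dict mapping a (hypothetical) face id to its embedding
    """
    return sorted(candidates,
                  key=lambda face_id: cosine_similarity(query, candidates[face_id]),
                  reverse=True)
```

For example, `rank_candidates([1, 0], {"a": [0, 1], "b": [1, 0.1], "c": [-1, 0]})` places `"b"` first, since its embedding points almost in the same direction as the query.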

Real-Time Thermal Face Identification System for Low Memory Vision Applications Using CNN

Image based face identification systems have attained optimal performance. However, the design of such systems often involves issues related to extreme light conditions and privacy protection, among others. For several years, Face Identification (FI) based on thermal images using deep neural networks (DNNs) has received significant attention. Yet the majority of FI systems developed with DNNs need huge computational power, so those systems are not suitable for devices with memory limitations. In this paper, we propose a new CNN framework based on depthwise separable convolutions for real-time face identification in low memory vision applications. The lack of publicly available thermal datasets makes the research and development of new techniques very hard. In this work, we further present a new large-scale thermal face database called “ST_UNICT_Thermal_Face”. According to our analysis, a model evaluated only on data acquired in a single day (without temporal variations) might not be stable over time. One of the main reasons for developing this database for the real-time evaluation of the proposed model is that most thermal face identification systems are not stable over time and climate because of insufficient temporal data. The evaluation results show that the proposed framework is suitable for devices with limited memory and is stable over time and across different indoor environmental conditions.

Rami Reddy Devaram, Alessandro Ortis, Sebastiano Battiato, Arcangelo R. Bruna, Valeria Tomaselli
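
The memory saving that motivates depthwise separable convolutions in the abstract above follows from a simple parameter count. The sketch below compares a standard convolution with its depthwise separable counterpart, ignoring bias terms. The function names and the layer sizes used in the example are illustrative and are not taken from the paper.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = kernel * kernel * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out             # 1 x 1 convolution mixing the channels
    return depthwise + pointwise
```

For a 3 x 3 layer with 32 input and 64 output channels, this gives 18,432 weights for the standard convolution against 2,336 for the separable version, roughly an eightfold reduction.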

MADiMa 2020 - 6th International Workshop on Multimedia Assisted Dietary Management


Assessing Individual Dietary Intake in Food Sharing Scenarios with Food and Human Pose Detection

Food sharing and communal eating are very common in some countries. To assess individual dietary intake in food sharing scenarios, this work proposes a vision-based approach to first capturing the food sharing scenario with a 360-degree camera, and then using a neural network to infer different eating states of each individual based on their body pose and relative positions to the dishes. The number of bites each individual has taken of each dish is then deduced by analyzing the inferred eating states. A new dataset with 14 panoramic food sharing videos was constructed to validate our approach. The results show that our approach is able to reliably predict different eating states as well as individual’s bite count with respect to each dish in food sharing scenarios.

Jiabao Lei, Jianing Qiu, Frank P.-W. Lo, Benny Lo

Recognition of Food-Texture Attributes Using an In-Ear Microphone

Food texture is a complex property; various sensory attributes such as perceived crispiness and wetness have been identified as ways to quantify it. Objective and automatic recognition of these attributes has applications in multiple fields, including health sciences and food engineering. In this work we use an in-ear microphone, commonly used for chewing detection, and propose algorithms for recognizing three food-texture attributes, specifically crispiness, wetness (moisture), and chewiness. We use binary SVMs, one for each attribute, and propose two algorithms: one that recognizes each texture attribute at the chew level and one at the chewing-bout level. We evaluate the proposed algorithms using leave-one-subject-out cross-validation on a dataset with 9 subjects. We also evaluate them using leave-one-food-type-out cross-validation, in order to examine the generalization of our approach to new, unknown food types. Our approach performs very well in recognizing crispiness (0.95 weighted accuracy on new subjects and 0.93 on new food types) and demonstrates promising results for objective and automatic recognition of wetness and chewiness.

Vasileios Papapanagiotou, Christos Diou, Janet van den Boer, Monica Mars, Anastasios Delopoulos

Visual Aware Hierarchy Based Food Recognition

Food recognition is one of the most important components in image-based dietary assessment. However, due to the different complexity levels of food images and the inter-class similarity of food categories, it is challenging for an image-based food recognition system to achieve high accuracy across a variety of publicly available datasets. In this work, we propose a new two-step food recognition system that includes food localization and hierarchical food classification using Convolutional Neural Networks (CNNs) as the backbone architecture. The food localization step is based on an implementation of the Faster R-CNN method to identify food regions. In the food classification step, visually similar food categories can be clustered together automatically to generate a hierarchical structure that represents the semantic visual relations among food categories; a multi-task CNN model is then proposed to perform the classification task based on the visual aware hierarchical structure. Since the size and quality of the dataset are key components of data driven methods, we introduce a new food image dataset, the VIPER-FoodNet (VFN) dataset, which consists of 82 food categories with 15k images based on the most commonly consumed foods in the United States. A semi-automatic crowdsourcing tool is used to provide the ground-truth information for this dataset, including food object bounding boxes and food object labels. Experimental results demonstrate that our system can significantly improve both classification and recognition performance on 4 publicly available datasets and the new VFN dataset.

Runyu Mao, Jiangpeng He, Zeman Shao, Sri Kalyan Yarlagadda, Fengqing Zhu

Analysis of Chewing Signals Based on Chewing Detection Using Proximity Sensor for Diet Monitoring

This paper presents chewing data analysis based on a new approach to chewing detection for diet monitoring applications. The proposed approach uses a proximity sensor to capture the temporalis muscle movement during chewing. The aim is to support the development of non-contact chewing detection. A wearable eyeglass device is used, with the sensor mounted on the right side of the eyeglass temple using a 3D-printed housing. The main activities involved in this study are resting and eating three test foods that represent different food hardness (carrot, banana, and apple), each in a portion of one spoonful. Several upper cut-off frequencies (fc2) of bandpass filters were used during the analyses. The signals were evaluated using accuracy and F1-score for classification and the absolute mean of error for chewing count estimation. In the classification stage, using 10-fold cross-validation and a 3 s segmented window, an fc2 of 6 Hz gives the highest accuracy, 97.6%, while 2.5 Hz gives the lowest accuracy, 92.6%. However, in the chewing count estimation stage, which is based on a 240 s segmented window, 2.4 Hz gives a smaller percentage absolute error of 2.69%, compared to 12.11% for 6 Hz. It can be concluded that the chewing frequency was under 2.5 Hz, but the self-reporting labeling approach used in this study reduced the accuracy of the system when an fc2 of 2.5 Hz is used. Further analysis of the chewing count suggests that the total chewing count may be related to food hardness.

Nur Asmiza Selamat, Sawal Hamid Md. Ali

Food Recognition in the Presence of Label Noise

The objective of multi-label image classification is to recognise several objects that appear within a single image. In the current paper, we consider the task of multi-label food recognition, where the images contain foods for which the labels in the training set are noisy, as they are annotated by inexperienced annotators. We now propose that a noise adaptation layer should be appended to a pretrained baseline model, in order to make it possible to learn from these noisy labels. From the baseline model, predictions are made on the training set and a confusion matrix is created from these predictions and the noisy labels. This confusion matrix is used to initialise the weights of the noise layer and the full model is retrained on the training set. The final predictions for the testing set are made from the baseline model, after its weights have been readjusted by the noise layer. We show that the final model significantly improves performance on noisy datasets.

Ioannis Papathanail, Ya Lu, Arindam Ghosh, Stavroula Mougiakakou

S2ML-TL Framework for Multi-label Food Recognition

Several recent breakthroughs in deep learning can be attributed to transfer learning. It has shown promising performance improvements, but most transfer learning applications are confined to fine-tuning. Transfer learning facilitates the learnability of networks on domains with less data. However, learning becomes difficult in complex domains, such as multi-label food recognition, owing to the sheer number of food classes as well as the fine-grained nature of food images. For this purpose, we propose S2ML-TL, a new transfer learning framework that transfers the knowledge learnt on a simpler single-label food recognition task to multi-label food recognition. The framework is further enhanced using class priors to tackle the dataset bias that exists between single-label and multi-label food domains. We validate the proposed scheme with two multi-label datasets on different backbone architectures, and the results show improved performance compared to the conventional transfer learning approach.

Bhalaji Nagarajan, Eduardo Aguilar, Petia Radeva

UEC-FoodPix Complete: A Large-Scale Food Image Segmentation Dataset

Currently, many image segmentation datasets are open to the public. However, only a few open segmentation datasets of food images exist. Among them, UEC-FoodPix is a large-scale food image segmentation dataset which consists of 10,000 food images with segmentation masks. However, it contains some incomplete mask images, because most of the segmentation masks were generated automatically based on the bounding boxes. To enable accurate food segmentation, complete segmentation masks are required for training. Therefore, in this work, we created “UEC-FoodPix Complete” by manually refining the 9,000 segmentation masks that were automatically generated in the previous UEC-FoodPix. As a result, the segmentation performance was much improved compared to the segmentation model trained with the original UEC-FoodPix. In addition, as applications of the new food segmentation dataset, we performed food calorie estimation using the food segmentation models trained with “UEC-FoodPix Complete”, and food image synthesis from segmentation masks.

Kaimu Okamoto, Keiji Yanai

Event Mining Driven Context-Aware Personal Food Preference Modelling

A personal food model (PFM) is essential for high-quality food recommendation systems to enhance health and enjoyment. We can build such models using food logging platforms that capture the users’ food events. As proposed in the Westermann and Jain event model, capturing six facets of multi-modal data provides a holistic view of any event. Five of these facets are captured during the event (temporal, structural, informational, experiential, spatial), while the sixth facet is related to the causality of the event. This causal facet is needed to build a robust PFM if all the other relevant information in the aforementioned five facets is captured. Any food logger and subsequent processing should collect all this data in the food event. Ultimately, we want to know what caused this person to eat this food and what changes this food event causes in the person’s health state. In this paper, we identify details of the food event model that may help build a causal understanding in PFM to address the first aspect of causality: which contextual factors may cause a certain food event to occur for a user. We utilize an event mining approach to determine the causal relationships and build a contextual understanding of the PFM. We generate data using a food event simulator that can generate the needed food event data for a person with a known PFM. The event mining results uncover this hidden PFM and demonstrate the greater efficacy of this approach compared with a traditionally designed PFM.

Vaibhav Pandey, Ali Rostami, Nitish Nag, Ramesh Jain

Analysis of Traditional Italian Food Recipes: Experiments and Results

Traditional recipes are among those elements that UNESCO included in its Intangible Cultural Heritage for safeguarding. Traditional recipes are passed down from one generation to the next and offer strong links with a particular territory. Driven by the important role of food recipes in the cultural heritage domain, we have created CookIt, a web portal that aims to collect, disseminate and safeguard the knowledge of typical Italian recipes and the Mediterranean diet, which is a significant part of Italian cuisine. In this paper we present some preliminary results in recipe analysis to be used within our web portal to support innovative ways to navigate and browse the recipes. We developed some processing and visualization tools to support the analysis and the presentation of the recipes. Our tools are tailored for the Italian language, although they can be generalized.

Maria Teresa Artese, Gianluigi Ciocca, Isabella Gagliardi

