
Pattern Recognition and Image Analysis

9th Iberian Conference, IbPRIA 2019, Madrid, Spain, July 1–4, 2019, Proceedings, Part II

  • 2019
  • Book

About this book

This 2-volume set constitutes the refereed proceedings of the 9th Iberian Conference on Pattern Recognition and Image Analysis, IbPRIA 2019, held in Madrid, Spain, in July 2019.

The 99 papers in these volumes were carefully reviewed and selected from 137 submissions. They are organized in topical sections named:

Part I: best ranked papers; machine learning; pattern recognition; image processing and representation.

Part II: biometrics; handwriting and document analysis; other applications.

Table of Contents

Frontmatter

Biometrics

Frontmatter
What Is the Role of Annotations in the Detection of Dermoscopic Structures?

There has been an increasing demand for computer-aided diagnosis systems to become self-explainable. However, in fields such as dermoscopy image analysis this comes at the cost of asking physicians to annotate datasets in a detailed way, such that they simultaneously identify and manually segment regions of medical interest (dermoscopic criteria) in the images. The segmentations are then used to train an automatic detection system to reproduce the procedure. Unfortunately, providing manual segmentations is a cumbersome and time-consuming task that may not generalize to large amounts of data. Thus, this work aims to understand how much information is really needed for a system to learn to detect dermoscopic criteria. In particular, we will show that, given sufficient data, it is possible to train a model to detect dermoscopic criteria using only global annotations at the image level, and achieve performance similar to that of a fully supervised approach, where the model has access to local annotations at the pixel level (segmentations).

Bárbara Ferreira, Catarina Barata, Jorge S. Marques
Keystroke Mobile Authentication: Performance of Long-Term Approaches and Fusion with Behavioral Profiling

In this paper we evaluate the performance of mobile keystroke authentication according to: (1) the amount of data available to model the user; and (2) its combination with behavioral profiling techniques. We have developed an ensemble of three behavioral profile-based authentication techniques (WiFi, GPS location, and app usage) and a state-of-the-art keystroke recognition approach. Algorithms based on template update are employed for the profiling systems, while a bidirectional recurrent neural network with a Siamese training setup is used for the keystroke system. Our experiments are conducted on the semi-uncontrolled UMDAA-02 database, which comprises smartphone sensor signals acquired during natural human-mobile interaction. Our results show that 6 days of stored usage data are necessary to achieve the best performance on average. Template update improves the equal error rate of the keystroke system by a relative 20%–30%.

Alejandro Acien, Aythami Morales, Ruben Vera-Rodriguez, Julian Fierrez
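As an illustration of the template-update idea described in this abstract, the following minimal Python sketch accumulates per-day user embeddings into a running template and verifies queries by distance to it. This is not the authors' system: the Siamese-RNN embeddings are replaced by synthetic vectors, and the update rate and decision threshold are assumed values.

```python
import numpy as np

def update_template(template, new_embedding, alpha=0.9):
    """Exponentially weighted template update (alpha is an assumed rate)."""
    if template is None:
        return new_embedding
    return alpha * template + (1 - alpha) * new_embedding

def verify(template, query, threshold=0.5):
    """Accept the query if its distance to the running template is small."""
    return bool(np.linalg.norm(template - query) < threshold)

rng = np.random.default_rng(0)
genuine = rng.normal(0.0, 0.01, size=(6, 32))   # six days of genuine embeddings
impostor = rng.normal(1.0, 0.01, size=32)       # an embedding far from the user

template = None
for day_embedding in genuine:                    # accumulate stored usage data
    template = update_template(template, day_embedding)

print(verify(template, genuine[-1]))             # genuine query: accepted
print(verify(template, impostor))                # impostor query: rejected
```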
Incremental Learning Techniques Within a Self-updating Approach for Face Verification in Video-Surveillance

Data labelling is still a crucial task which precedes the training of a face verification system. In contexts where training data are obtained online during operational stages, and/or the genuine identity changes over time, supervised approaches are less suitable. This work proposes a face verification system capable of autonomously generating a robust model of a target identity (genuine) from a very limited amount of labelled data (one or a few video frames). A self-updating approach is used to wrap two well-known incremental learning techniques, namely Incremental SVM and Online Sequential ELM. The performance of both strategies is compared by measuring their ability to unsupervisedly improve the model of the genuine identity over time, as the system is queried by both genuine and impostor identities. Results confirm the feasibility and potential of the self-updating approach in a video-surveillance context.

Eric Lopez-Lopez, Carlos V. Regueiro, Xosé M. Pardo, Annalisa Franco, Alessandra Lumini
Don’t You Forget About Me: A Study on Long-Term Performance in ECG Biometrics

The performance of biometric systems is known to decay over time, eventually rendering them ineffective. Focused on ECG-based biometrics, this work aims to study the permanence of these signals for biometric identification in state-of-the-art methods, and measure the effect of template update on their long-term performance. Ensuring realistic testing settings, four literature methods based on autocorrelation, autoencoders, and discrete wavelet and cosine transforms, were evaluated with and without template update, using Holter signals from THEW’s E-HOL 24 h database. The results reveal ECG signals are unreliable for long-term biometric applications, and template update techniques offer considerable improvements over the state-of-the-art results. Nevertheless, further efforts are required to ensure long-term effectiveness in real applications.

Gabriel Lopes, João Ribeiro Pinto, Jaime S. Cardoso
Face Identification Using Local Ternary Tree Pattern Based Spatial Structural Components

This paper reports a face identification system which makes use of a novel local descriptor called Local Ternary Tree Pattern (LTTP). Exploiting and extracting a distinctive local descriptor from a face image plays a crucial role in the face identification task in the presence of a variety of face images, including constrained, unconstrained and plastic-surgery images. LTTP has been used to extract robust and useful spatial features which are used to describe the various structural components of a face. To extract the features, a ternary tree is formed for each pixel with its eight neighbors in each block. The LTTP pattern can be generated in four forms: LTTP–Left Depth (LTTP-LD), LTTP–Left Breadth (LTTP-LB), LTTP–Right Depth (LTTP-RD) and LTTP–Right Breadth (LTTP-RB). The encoding schemes of these patterns are very simple and efficient in terms of computational and time complexity. The proposed face identification system is tested on six face databases, namely, the UMIST, the JAFFE, the extended Yale face B, the Plastic Surgery, the LFW and the UFI. The experimental evaluation demonstrates highly promising results for a variety of faces captured under different environments. The proposed LTTP-based system is also compared with some local descriptors under identical conditions.

Rinku Datta Rakshit, Dakshina Ranjan Kisku, Massimo Tistarelli, Phalguni Gupta
Catastrophic Interference in Disguised Face Recognition

Artificial neural networks have a well-known natural tendency to completely and abruptly forget previously learned information when learning new information. We explore this behaviour in the context of face verification on the recently proposed Disguised Faces in the Wild (DFW) dataset. We empirically evaluate several commonly used DCNN architectures for face recognition and distill some insights about the effect of sequential learning on distinct identities from different datasets, showing that the catastrophic forgetting phenomenon is present even in feature embeddings fine-tuned on tasks different from the original domain.

Parichehr B. Ardakani, Diego Velazquez, Josep M. Gonfaus, Pau Rodríguez, F. Xavier Roca, Jordi Gonzàlez
Iris Center Localization Using Geodesic Distance and CNN

In this paper, we propose a new eye iris center localization method for remote tracking scenarios. The method combines the geodesic distance with CNN-based classification. Firstly, the geodesic distance is used for fast preliminary localization of the regions possibly containing the iris. Then a convolutional neural network is used to make the final decision and to refine the final position of the iris center. In the first step, the areas that do not appear to contain the eyeball are quickly filtered out, which makes the whole algorithm fast even on less powerful computers. The proposed method is evaluated and compared with the state-of-the-art methods on two publicly available datasets focused on remote tracking scenarios (namely BioID [9] and GI4E [15]).

Radovan Fusek, Eduard Sojka
Low-Light Face Image Enhancement Based on Dynamic Face Part Selection

A common challenge faced by the face recognition community is coping with face images acquired under low-light conditions. The present work couples the power of the popular CLAHE algorithm for face preprocessing with a fuzzy inference system, so as to correct non-uniform illumination of face images in a targeted and precise manner. To handle the particularities of the low-light illumination problem, the input face image is first divided into two equal sub-regions. Subsequently, the degree of brightness in each sub-region and in the whole face is used to dynamically decide whether to normalize; the CLAHE-Fuzzy approach is applied only to the regions that require it. The left and right face regions are then merged back, followed by further processing such as blur removal and contrast enhancement (smoothing). Visual results showed that more facial features appeared in comparison with other enhancement approaches. Besides, we quantitatively validate the accuracy of the developed Partial Fuzzy Enhancement Approach (PFEA) with four different metrics. The effectiveness of the PFEA technique has been demonstrated by presenting extensive experimental results using the Extended Yale-B, CMU-PIE, Mobio, and CAS-PEAL databases.

Adel Oulefki, Mustapha Aouache, Messaoud Bengherabi
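The per-region decision described in this abstract can be illustrated with a simplified sketch: the face is split into two halves and only under-lit halves are normalized. Plain histogram equalization stands in for the CLAHE-Fuzzy step here, and the brightness threshold is an assumed value.

```python
import numpy as np

def equalize(channel):
    """Plain histogram equalization (a simplified stand-in for CLAHE)."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())
    return (cdf[channel] * 255).astype(np.uint8)

def enhance_face(face, dark_threshold=80):
    """Split the face into left/right halves and equalize only dark halves."""
    h, w = face.shape
    halves = [face[:, : w // 2].copy(), face[:, w // 2 :].copy()]
    out = []
    for half in halves:
        if half.mean() < dark_threshold:   # dynamic per-region decision
            half = equalize(half)
        out.append(half)
    return np.hstack(out)

# Synthetic face: dark left half, well-lit right half.
face = np.hstack([np.full((64, 32), 30, np.uint8),
                  np.full((64, 32), 150, np.uint8)])
enhanced = enhance_face(face)
print(enhanced[:, :32].mean() > face[:, :32].mean())   # left half brightened
print(np.array_equal(enhanced[:, 32:], face[:, 32:]))  # right half untouched
```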
Retinal Blood Vessel Segmentation: A Semi-supervised Approach

Segmentation of retinal blood vessels is an important step in several retinal image analysis tasks. State-of-the-art methods are still unable to segment retinal vessels correctly, especially in the presence of pathology. In this paper, an innovative descriptor named Robust Feature Descriptor (RFD) is proposed to describe vessel pixels more distinctively in the presence of pathology. For accurate segmentation of blood vessels, the method combines both supervised and unsupervised approaches. Extensive experiments have been conducted on three publicly available datasets, namely DRIVE, STARE and CHASE_DB1, and the method has been compared with other state-of-the-art methods. The proposed method achieves an overall segmentation accuracy of 0.961, 0.960 and 0.955 on the DRIVE, STARE and CHASE_DB1 datasets respectively, which is better than the compared state-of-the-art methods. The sensitivity, specificity and area under curve (AUC) of the method are respectively 0.737, 0.981, 0.859 on the DRIVE dataset; 0.805, 0.972, 0.889 on the STARE dataset; and 0.763, 0.969, 0.866 on the CHASE_DB1 dataset.

Tanmai K. Ghosh, Sajib Saha, G. M. Atiqur Rahaman, Md. Abu Sayed, Yogesan Kanagasingam
Quality-Based Pulse Estimation from NIR Face Video with Application to Driver Monitoring

In this paper we develop a robust heart rate (HR) estimation method using face video for challenging scenarios with high-variability sources such as head movement, illumination changes, vibration, blur, etc. Our method employs a quality measure Q to extract a remote photoplethysmography (rPPG) signal as clean as possible from a specific face video segment. Our main motivation is developing robust technology for driver monitoring. Therefore, for our experiments we use a self-collected dataset consisting of Near Infrared (NIR) videos acquired with a camera mounted in the dashboard of a real moving car. We compare the performance of a classic rPPG algorithm with that of the same method using Q to select the video segments presenting a lower amount of variability. Our results show that using the video segments with the highest quality in a realistic driving setup improves the HR estimation, with a relative accuracy improvement larger than 20%.

Javier Hernandez-Ortega, Shigenori Nagae, Julian Fierrez, Aythami Morales
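Quality-driven segment selection of the kind described here can be sketched as follows. The quality measure used below (fraction of spectral power inside a plausible heart-rate band) and all parameters are illustrative assumptions, not the authors' definition of Q.

```python
import numpy as np

def segment_quality(segment, fs, band=(0.7, 3.0)):
    """Q: fraction of spectral power inside the HR band (42-180 bpm)."""
    spec = np.abs(np.fft.rfft(segment - segment.mean())) ** 2
    freqs = np.fft.rfftfreq(len(segment), 1 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return spec[in_band].sum() / (spec.sum() + 1e-12)

def estimate_hr(segment, fs):
    """HR in bpm from the dominant frequency of the segment."""
    spec = np.abs(np.fft.rfft(segment - segment.mean()))
    freqs = np.fft.rfftfreq(len(segment), 1 / fs)
    return 60.0 * freqs[np.argmax(spec)]

fs = 30.0                                   # camera frame rate
t = np.arange(0, 10, 1 / fs)
clean = np.sin(2 * np.pi * 1.2 * t)         # 1.2 Hz pulse, i.e. 72 bpm
rng = np.random.default_rng(1)
noisy = clean + 3.0 * rng.normal(size=t.size)   # e.g. head movement, vibration

segments = [clean, noisy]
best = max(segments, key=lambda s: segment_quality(s, fs))
print(round(estimate_hr(best, fs)))         # HR from the highest-quality segment
```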

Handwriting and Document Analysis

Frontmatter
Multi-task Layout Analysis of Handwritten Musical Scores

Document Layout Analysis (DLA) is a process that must be performed before attempting to recognize the content of handwritten musical scores with a modern automatic or semiautomatic system. DLA should provide the segmentation of the document image into semantically useful region types such as staff, lyrics, etc. In this paper we extend our previous work on DLA of handwritten text documents to also address complex handwritten music scores. This system is able to perform region segmentation, region classification and baseline detection in an integrated manner. Several experiments were performed on two different datasets in order to validate this approach and assess it in different scenarios. Results show high accuracy on such complex manuscripts and very competitive computation time, which is a good indicator of the scalability of the method to very large collections.

Lorenzo Quirós, Alejandro H. Toselli, Enrique Vidal
Domain Adaptation for Handwritten Symbol Recognition: A Case of Study in Old Music Manuscripts

The existence of a large number of untranscribed music manuscripts has spurred initiatives that use Machine Learning (ML) for Optical Music Recognition, in order to efficiently transcribe music sources into a machine-readable format. Although most music manuscripts are similar in nature, they inevitably vary from one another. This fact can increase the complexity of the classification task, because most ML models fail to transfer their knowledge from one domain to another, thereby requiring learning from scratch on new domains after manually labeling new data. This work studies the ability of a Domain Adversarial Neural Network to perform domain adaptation in the context of classifying handwritten music symbols. The main idea is to exploit the knowledge of a specific manuscript to classify symbols from different (unlabeled) manuscripts. The reported results are promising, obtaining a substantial improvement over a conventional Convolutional Neural Network approach, which can be used as a basis for future research.

Tudor N. Mateiu, Antonio-Javier Gallego, Jorge Calvo-Zaragoza
Approaching End-to-End Optical Music Recognition for Homophonic Scores

The recognition of patterns that have a time dependency is common in areas like speech recognition or natural language processing. The equivalent situation in image analysis is present in tasks like text or video recognition. Recently, Recurrent Neural Networks (RNN) have been broadly applied to solve these tasks in an end-to-end fashion with good results. However, their application to Optical Music Recognition (OMR) is not so straightforward due to the presence of different elements at the same horizontal position, which disrupts the linear flow of the timeline. In this paper we study the ability of RNNs to learn codes that represent this disruption in homophonic scores. The results prove that our serialized ways of encoding the music content are appropriate for Deep Learning-based OMR and deserve further study.

María Alfaro-Contreras, Jorge Calvo-Zaragoza, José M. Iñesta
Glyph and Position Classification of Music Symbols in Early Music Manuscripts

Optical Music Recognition is a field of research that automates the reading of musical scores so as to transcribe their content into a structured digital format. When dealing with music manuscripts, the traditional workflow establishes separate stages of detection and classification of musical symbols. In the latter, most of the research has focused on detecting musical glyphs, ignoring that the meaning of a musical symbol is defined by two components: its glyph and its position within the staff. In this paper we study how to perform both glyph and position classification of handwritten musical symbols in early music manuscripts written in white Mensural notation, a notation system in common use during most of the sixteenth and seventeenth centuries. We make use of Convolutional Neural Networks as the classification method, and we test several alternatives, such as using independent models for each component, combining label spaces, or using multi-input and multi-output models. Our results on early music manuscripts provide insights into the effectiveness and efficiency of each approach.

Alicia Nuñez-Alcover, Pedro J. Ponce de León, Jorge Calvo-Zaragoza
Recognition of Arabic Handwritten Literal Amounts Using Deep Convolutional Neural Networks

In this paper, we propose using a convolutional neural network (CNN) for the recognition of Arabic handwritten literal amounts. Deep convolutional neural networks have achieved excellent performance in various computer vision and document recognition tasks, and have received increased attention in the last few years. The domain of Arabic-script handwriting poses particular technical challenges. In this work we focus on the recognition of handwritten Arabic literal amounts with a limited lexicon. Our experimental results demonstrate the high performance of the proposed CNN recognition system compared to traditional methods.

Moumen El-Melegy, Asmaa Abdelbaset, Alaa Abdel-Hakim, Gamal El-Sayed
Offline Signature Verification Using Textural Descriptors

Offline signature verification has been the most commonly employed modality for authentication of an individual, and it enjoys global acceptance in legal, banking and official documents. Verifying the authenticity of a signature (genuine or forged) remains a challenging problem from the perspective of computerized solutions. This paper presents a signature verification technique that exploits the textural information of a signature image to discriminate between genuine and forged signatures. Signature images are characterized using two textural descriptors, the local ternary patterns (LTP) and the oriented basic image features (oBIFs). Signature images are projected into the feature space and the distances between pairs of genuine and forged signatures are used to train SVM classifiers (a separate SVM for each of the two descriptors). When presented with a questioned signature, the decision on its authenticity is made by combining the decisions of the two classifiers. The technique is evaluated on Dutch and Chinese signature images of the ICDAR 2011 benchmark dataset, and high accuracies are reported.

Ismail Hadjadj, Abdeljalil Gattal, Chawki Djeddi, Mouloud Ayad, Imran Siddiqi, Faycel Abass
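The local ternary pattern (LTP) descriptor mentioned in this abstract is a standard texture operator; a minimal NumPy sketch of its usual upper/lower-code formulation is shown below. The tolerance `tau` and the histogram layout are assumed choices, not necessarily those of the paper.

```python
import numpy as np

def ltp_codes(img, tau=5):
    """Upper/lower LTP codes for interior pixels (3x3 neighborhood)."""
    c = img[1:-1, 1:-1].astype(np.int32)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    upper = np.zeros_like(c)
    lower = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view: neighbor at (dy, dx) of every interior pixel.
        n = img[1 + dy : img.shape[0] - 1 + dy,
                1 + dx : img.shape[1] - 1 + dx].astype(np.int32)
        upper |= (n >= c + tau).astype(np.int32) << bit   # ternary code +1
        lower |= (n <= c - tau).astype(np.int32) << bit   # ternary code -1
    return upper, lower

def ltp_histogram(img, tau=5):
    """Concatenated upper/lower 256-bin histograms as the texture descriptor."""
    upper, lower = ltp_codes(img, tau)
    hu = np.bincount(upper.ravel(), minlength=256)
    hl = np.bincount(lower.ravel(), minlength=256)
    return np.concatenate([hu, hl]).astype(np.float64)

rng = np.random.default_rng(2)
signature = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
descriptor = ltp_histogram(signature)
print(descriptor.shape)          # one 512-dimensional vector per signature image
```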
Pencil Drawing of Microscopic Images Through Edge Preserving Filtering

Automatic diatom identification approaches have revealed remarkable abilities to tackle the challenges of water quality assessment and other environmental issues. Scientists often analyze the taxonomic characters of the target taxa for automatic identification. In this process, digital photographs, sketches or drawings are recorded to analyze the shape and size of the frustule, the arrangement of striae, the raphe endings, and the striae density. In this paper, we describe two new methods for producing drawings of different diatom species at any stage of their life cycle development, which can also be useful for future reference and comparisons. We attempt to produce drawings of diatom species using Edge-preserving Multi-scale Decomposition (EMD). The edge-preserving smoothing property of the Weighted Least Squares (WLS) optimization framework is used to extract high-frequency details. The details extracted from the two-scale decomposition are transformed into drawings which help in identifying possible striae patterns from diatom images. To analyze the salient local features preserved in the drawings, the Scale Invariant Feature Transform (SIFT) model is adopted for feature extraction. The generated drawings help to identify certain unique taxonomic and morphological features that are necessary for the identification of the diatoms. The new methods have been compared with two alternative pencil drawing techniques, showing better detail preservation.

Harbinder Singh, Carlos Sánchez, Gabriel Cristóbal, Gloria Bueno
Line Segmentation Free Probabilistic Keyword Spotting and Indexing

Probabilistic Keyword Spotting and Indexing (PKWSI) allows effective search through large collections of untranscribed images. However, when text-line detection fails to detect foreground text, PKWSI techniques also fail dramatically. In this paper, we develop a new line segmentation-free approach using uniform line-sized image slicing instead of prior text-line detection. As a result, new issues arise due to overlapping slices, leading to several spot hypotheses for the same word. We develop solutions to take advantage of multiple spots and to consolidate them into single hypotheses. We test our approach on a difficult historical handwritten dataset and it yields promising results.

Killian Barrere, Alejandro H. Toselli, Enrique Vidal
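Consolidating overlapping spot hypotheses, as described in this abstract, can be illustrated with a greedy non-maximum-suppression sketch over 1-D (start, end, probability) spots. The overlap threshold and the greedy strategy are illustrative assumptions, not the authors' exact consolidation method.

```python
def iou(a, b):
    """Overlap ratio of two (start, end) intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

def consolidate(spots, iou_threshold=0.5):
    """Greedy NMS: keep the most confident spot of each overlapping group."""
    spots = sorted(spots, key=lambda s: -s[2])   # sort by probability, descending
    kept = []
    for s in spots:
        if all(iou(s[:2], k[:2]) < iou_threshold for k in kept):
            kept.append(s)
    return kept

# The same word spotted in three overlapping slices, plus one distinct spot.
spots = [(10, 50, 0.9), (12, 52, 0.7), (8, 48, 0.6), (100, 140, 0.8)]
print(consolidate(spots))   # two consolidated hypotheses remain
```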

Other Applications

Frontmatter
Incremental Learning for Football Match Outcomes Prediction

Generating predictions for football match results is an expanding research area due to the commercial assets involved in the betting process. Traditionally, the results of the matches are predicted using statistical models verified by domain experts. Nowadays, this approach is challenged by the increasing amount of diverse football-related information that needs to be processed. In this paper, we propose an incremental learning method to predict the football match outcome category (home win, draw, away win) based on publicly available pre-game information. The proposed framework is illustrated with data for the Portuguese first division football teams for the 2017/2018 season. Factor Analysis was applied to extract the most discriminating features, which allowed gradual convergence of the prediction error to 32.4% after accumulation of about one third of the season's games. Our approach outperforms traditional models in the gambling industry today and implies potential financial opportunities. The proposed prediction model is useful for researchers, the football betting community, bookmakers and sport managers.

José Domingues, Bernardo Lopes, Petya Mihaylova, Petia Georgieva
Frame by Frame Pain Estimation Using Locally Spatial Attention Learning

Estimating pain intensity for patients is a challenging area in clinical treatment and medical diagnosis. Painful facial expression relates only to certain areas of the face. Inspired by this fact, we introduce end-to-end locally spatial attention learning for pain estimation. By focusing on the important regions of the face with a 1 × 1 locally convolutional layer, the local features related to pain intensity can be captured. Furthermore, facial expression is the dynamic deformation of the face in the time domain. In order to model this information, a long short-term memory network (LSTM) is incorporated into our architecture. The feature extracted by the convolutional neural network (CNN) with the locally spatial attention learning is fed to the LSTM network. The results show that our locally spatial attention learning can provide the fine-grained variation on the face region for pain intensity assessment.

Jun Yu, Toru Kurihara, Shu Zhan
Mosquito Larvae Image Classification Based on DenseNet and Guided Grad-CAM

Surveillance of the Aedes aegypti and Aedes albopictus mosquitoes, to avoid the spread of the arboviruses that cause Dengue, Zika and Chikungunya, has become increasingly important, because these diseases have major public health repercussions across large parts of the world. Mosquito larvae identification methods require special equipment, skillful entomologists and tedious work with considerable time consumption. In comparison with the short mosquito lifecycle, which is less than 2 weeks, the time required for the whole surveillance process is too long. In this paper, we propose a novel technological approach based on Deep Neural Networks (DNNs) and visualization techniques to classify mosquito larvae images using the comb-like figure that appears in the eighth segment of the larva's abdomen. We present the DNN and the visualization technique employed in this work, and the results achieved after training the DNN to classify an input image into two classes: Aedes and non-Aedes mosquito. Based on the proposed scheme, we obtain the accuracy, sensitivity and specificity, and compare this performance with existing technological approaches to demonstrate that the automatic identification process offered by the proposed scheme provides better identification performance.

Zaira García, Keiji Yanai, Mariko Nakano, Antonio Arista, Laura Cleofas Sanchez, Hector Perez
Towards Automatic Rat’s Gait Analysis Under Suboptimal Illumination Conditions

Rat’s gait analysis plays an important role in the assessment of the impact of certain drugs on the treatment of osteoarthritis. Since movement-evoked pain is an early characteristic of this degenerative joint disease, the affected animal modifies its behavior to protect the injured joint from load while walking, altering its gait’s parameters, which can be detected through a video analysis. Because commercially available video-based gait systems still present many limitations, researchers often choose to develop a customized system for the acquisition of the videos and analyze them manually, a laborious and time-consuming task prone to high user variability. Therefore, and bearing in mind the recent advances in machine learning and computer vision fields, as well as their presence in many tracking and recognition applications, this work is driven by the need to find a solution to automate the detection and quantification of the animal’s gait changes making it an easier, faster, simpler and more robust task. Thus, a comparison between different methodologies to detect and segment the animal under degraded luminance conditions is presented in this paper as well as an algorithm to detect, segment and classify the animal’s paws.

Ana F. Adonias, Joana Ferreira-Gomes, Raquel Alonso, Fani Neto, Jaime S. Cardoso
Impact of Enhancement for Coronary Artery Segmentation Based on Deep Learning Neural Network

X-ray coronary angiograms are intended to specify the global state of the artery system and therefore to detect and locate zones of narrowing. Accurate coronary artery segmentation is a fundamental step in the computer-aided diagnosis of many diseases. In this paper, a deep neural network based on the U-Net architecture is proposed in order to improve the segmentation task for coronary images. In this context, various enhancement methods, such as adaptive histogram equalization and a multiscale technique with a Frangi filter, are tested not only under normal conditions but also in the presence of noise, to improve the system performance and to ensure its robustness under real conditions. Promising results are obtained and discussed for different performance criteria. This work will serve as a reference and motivation for researchers interested in the field of blood vessel segmentation by deep learning neural networks.

Ahmed Ghazi Blaiech, Asma Mansour, Asma Kerkeni, Mohamed Hédi Bedoui, Asma Ben Abdallah
Real-Time Traffic Monitoring with Occlusion Handling

Traffic surveillance through vision systems is a highly demanded task. To solve it, it is necessary to combine detection and tracking in a way that meets the requirements of operating in real time while being robust against occlusions. This paper proposes a traffic monitoring system that meets these requirements. It is formed by a deep learning-based detector, tracking through a combination of a Discriminative Correlation Filter and a Kalman Filter, and data association based on the Hungarian method. The viability of the system has been proved for roundabout input/output analysis with nearly 1,000 vehicles in real-life scenarios.

Mauro Fernández-Sanjurjo, Manuel Mucientes, Víctor M. Brea
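The Kalman filter component of such a tracker can be sketched with a standard constant-velocity model; the detector, the Discriminative Correlation Filter and the Hungarian association are omitted here, and the noise parameters are assumed values.

```python
import numpy as np

def make_cv_kalman(dt=1.0):
    """Constant-velocity model: state [x, y, vx, vy], measurement [x, y]."""
    F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                  [0, 0, 1, 0], [0, 0, 0, 1]], float)
    H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)
    return F, H

def predict(x, P, F, q=1e-2):
    x = F @ x
    P = F @ P @ F.T + q * np.eye(4)      # process noise (assumed)
    return x, P

def update(x, P, z, H, r=1.0):
    S = H @ P @ H.T + r * np.eye(2)      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

F, H = make_cv_kalman()
x = np.zeros(4)                           # unknown initial position/velocity
P = 100.0 * np.eye(4)                     # large initial uncertainty
for t in range(1, 11):                    # vehicle moving 2 px/frame along x
    x, P = predict(x, P, F)
    x, P = update(x, P, np.array([2.0 * t, 0.0]), H)
print(x[:2])                              # position estimate near (20, 0)
```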
Image Based Estimation of Fruit Phytopathogenic Lesions Area

A method was developed to measure the surface area of walnut fruit phytopathogenic lesions from images acquired with a basic calibration target. The fruit is modelled by a spheroid, established from the 2D view ellipse using an iterative process. The method was tested with images of colour circular marks placed on a wooden spheroid. It proved effective in the estimation of the spheroid semi-diameters (average relative errors of 0.8% and 1.0%), spheroid surface (1.77%) and volume (2.71%). The computation of the colour mark surface area was within the expected error, considering the image resolution (up to about 4%), for 22 out of 28 images tested.

André R. S. Marcal, Elisabete M. D. S. Santos, Fernando Tavares
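The spheroid surface area underlying the lesion-area computation can be illustrated with the closed-form prolate spheroid formula; the semi-diameters below are illustrative values, not measurements from the paper.

```python
import math

def prolate_spheroid_area(a, c):
    """Surface area of a prolate spheroid with semi-axes a = b <= c."""
    if a == c:                                   # degenerate case: a sphere
        return 4.0 * math.pi * a * a
    e = math.sqrt(1.0 - (a * a) / (c * c))       # eccentricity
    return 2.0 * math.pi * a * a * (1.0 + (c / (a * e)) * math.asin(e))

def lesion_fraction(lesion_area, a, c):
    """Fraction of the fruit surface covered by a measured lesion area."""
    return lesion_area / prolate_spheroid_area(a, c)

# Walnut-like spheroid (illustrative semi-diameters, in mm).
a, c = 15.0, 20.0
total = prolate_spheroid_area(a, c)
print(total)                           # lies between the 15 mm and 20 mm sphere areas
print(lesion_fraction(50.0, a, c))     # lesion as a fraction of the fruit surface
```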
A Weakly-Supervised Approach for Discovering Common Objects in Airport Video Surveillance Footage

Object detection in video is a relevant task in computer vision. Standard current detectors are typically trained in a strongly supervised way, which requires a huge amount of labelled data. In contrast, in this paper we focus on object discovery in video sequences using sets of unlabelled data. Thus, we present an approach based on the use of two region proposal algorithms (a pretrained Region Proposal Network and an Optical Flow Proposal) to produce regions of interest that are then grouped using a clustering algorithm. Therefore, our system does not require the collaboration of a human, except for assigning human-understandable labels to the discovered clusters. We evaluate our approach on a set of videos recorded at the apron area, where the aeroplanes park to load passengers and luggage. Our experimental results suggest that the use of an unsupervised approach is valid for automatic object discovery in video sequences, obtaining a CorLoc of 86.8 and a mAP of 0.374, compared to a CorLoc of 70.4 and mAP of 0.683 achieved by a supervised Faster R-CNN trained and tested on the same dataset.

Francisco Manuel Castro, Rubén Delgado-Escaño, Nicolás Guil, Manuel Jesús Marín-Jiménez
Standard Plenoptic Camera Calibration for a Range of Zoom and Focus Levels

Plenoptic cameras have a complex optical geometry combining a main lens, a microlens array and an image sensor to capture the radiance of the light rays in the scene in its spatial and directional dimensions. As with conventional cameras, changing the zoom and focus settings gives rise to different parameters describing the camera, and consequently a new calibration is needed. Current calibration procedures for these cameras require the acquisition of a dataset with a calibration pattern for the specific zoom and focus settings. Complementarily, standard plenoptic cameras (SPCs) provide metadata parameters with the acquired images that are not considered in the calibration procedures. In this work, we establish the relationships between the camera model parameters of an SPC obtained by calibration and the metadata parameters provided by the camera manufacturer. These relationships are used to obtain an estimate of the camera model parameters for a given zoom and focus setting without having to acquire a calibration dataset. Experiments show that the parameters estimated by acquiring a calibration dataset and applying a calibration procedure are similar to the parameters obtained based on the metadata.

Nuno Barroso Monteiro, José António Gaspar
Going Back to Basics on Volumetric Segmentation of the Lungs in CT: A Fully Image Processing Based Technique

Radiotherapy planning is a crucial task in the management of cancer patients. This task is, however, very time consuming and prone to high intra- and inter-subject variance and human error. The present line of work thus aims at developing a tool to help specialists in this task. The developed tool will consider the delimitation of anatomical regions of interest, since it is crucial to identify the organs at risk and minimize their exposure to radiation. This paper, in particular, presents a lung segmentation algorithm for Computed Tomography volumes, based on image processing techniques such as intensity projection and region growing. Our pipeline first splits the volume into two halves to isolate each lung. Then, three techniques for seed placement are developed. Finally, a traditional region growing algorithm has been changed in order to automatically derive the value of the threshold parameter. The results obtained for the three seed placement techniques were, respectively, DICE scores of 74%, 74% and 92% with the Iterative Region Growing algorithm. Although the presented results have Hodgkin Lymphoma as their use case, we believe that the developed method is generalizable to other pathologies.

Ana Catarina Oliveira, Inês Domingues, Hugo Duarte, João Santos, Pedro H. Abreu
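The region growing with an automatically derived threshold described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the starting threshold, the step, and the "explosion" fraction used to stop the iteration are assumed values.

```python
from collections import deque
import numpy as np

def region_grow(volume, seed, threshold):
    """Grow a region from `seed`, accepting 6-connected voxels whose
    intensity differs from the running region mean by less than `threshold`."""
    region = np.zeros(volume.shape, dtype=bool)
    region[seed] = True
    mean, count = float(volume[seed]), 1
    queue = deque([seed])
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in offsets:
            n = (z + dz, y + dy, x + dx)
            if (all(0 <= c < s for c, s in zip(n, volume.shape))
                    and not region[n]
                    and abs(float(volume[n]) - mean) < threshold):
                region[n] = True
                mean = (mean * count + float(volume[n])) / (count + 1)
                count += 1
                queue.append(n)
    return region

def iterative_region_grow(volume, seed, start=10.0, step=10.0, max_frac=0.3):
    """Raise the threshold until the region 'explodes' past a fraction of
    the volume, then keep the last stable segmentation (assumed schedule)."""
    prev = region_grow(volume, seed, start)
    t = start + step
    while True:
        cur = region_grow(volume, seed, t)
        if cur.sum() > max_frac * volume.size:
            return prev
        prev, t = cur, t + step
```

On a toy volume with a bright cube on a dark background, the iterative variant stops just before the region leaks into the background.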
Radiogenomics: Lung Cancer-Related Genes Mutation Status Prediction

Advances in genomics have led to the recognition that tumours are populated by different minor subclones of malignant cells that control the way the tumour progresses. However, the spatial and temporal genomic heterogeneity of tumours has been a hurdle in clinical oncology. This is mainly because the standard methodology for genomic analysis is the biopsy, which, besides being an invasive technique, does not capture the entire tumour spatial state in a single exam. Radiographic medical imaging opens new opportunities for genomic analysis by providing full state visualisation of a tumour at a macroscopic level, in a non-invasive way. Having in mind that mutational testing of EGFR and KRAS is routine in lung cancer treatment, we studied whether clinical and imaging data are valuable for predicting EGFR and KRAS mutations in a cohort of NSCLC patients. A reliable predictive model was found for EGFR (AUC = 0.96) using both a Multi-layer Perceptron model and a Random Forest model, but not for KRAS (AUC = 0.56). A feature importance analysis using Random Forest reported that the presence of emphysema and lung parenchymal features have the highest correlation with EGFR mutation status. This study opens new opportunities for radiogenomics on predicting molecular properties in a more readily available and non-invasive way.

Catarina Dias, Gil Pinheiro, António Cunha, Hélder P. Oliveira
Learning to Perform Visual Tasks from Human Demonstrations

The human visual system makes extensive use of saccadic eye movements to cope with decaying resolution towards the periphery of the visual field, and point the highest acuity region of the retina (i.e. the fovea) to regions of interest, while searching for objects in natural scenes. Experimental evidence has shown that, when searching for objects, humans exploit the advantage of a priori known relations between object classes (i.e. context) to prune the search space, which results in oculo-motor behaviours that are optimal both in terms of effectiveness (i.e. success rate) and efficiency (i.e. energy and time consumed) during search tasks execution. In this work we propose a biologically plausible system that learns from human demonstrations provided by eye tracking data to perform visual search tasks. The proposed framework leverages the recognition accuracy of state-of-the-art pre-trained CNNs with the sequential predictive power of RNNs, trained in an end-to-end manner to predict saliency maps (task-independent) and locate objects of interest (task-dependent).

Afonso Nunes, Rui Figueiredo, Plinio Moreno
Serious Game Controlled by a Human-Computer Interface for Upper Limb Motor Rehabilitation: A Feasibility Study

Stroke affects the population worldwide, with a prevalence of 0.58%. One of its possible consequences is a negative impact on the motor function of the patient, limiting their quality of life. For this reason, Brain-Computer Interfaces are studied as a tool for improving rehabilitation processes. Nevertheless, to the best of our knowledge, there are no Brain-Computer Interface systems that use video-games for upper limb motor rehabilitation. This study aimed to design and assess, with healthy subjects, a Human-Computer Interface that includes electroencephalography, forearm motion and postural analysis. This assessment was made by designing two scenarios in which the participant carried out exercises involving the mouth and the symmetry of hand and forearm trajectories. Results show that the system is ready to be tested on patients, since the participants were comfortable using it. Also, the quantitative results, particularly the metrics used in the video-game, are an important starting point for health professionals to characterize motor rehabilitation in stroke patients, paving the way for the use of the designed system in motor rehabilitation therapies.

Sergio David Pulido, Álvaro José Bocanegra, Sandra Liliana Cancino, Juan Manuel López
Weapon Detection for Particular Scenarios Using Deep Learning

The development of object detection systems is normally driven to achieve both high detection and low false positive rates on a certain public dataset. However, when put into a real scenario, the result is generally an unacceptable rate of false alarms. In this context we propose to add an additional step that models and filters the typical false alarms of the new scenario while roughly maintaining the ability to detect the objects of interest. We propose to train a deep autoencoder on the false alarms of the new scenario in order to model them. The autoencoder then acts as a filter that checks whether the output of the detector is one of its typical false positives, based on the reconstruction error measured with the Mean Squared Error (MSE) and the Peak Signal-to-Noise Ratio (PSNR). We test the system using an entirely synthetic novel dataset, generated with Unreal Engine 4, for training and testing the autoencoder. Results show a reduction in the number of FPs of up to 37.9% with the PSNR error while maintaining the same detection capability.

Noelia Vallez, Alberto Velasco-Mata, Juan Jose Corroto, Oscar Deniz
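The reconstruction-error filter described above can be sketched as follows. This is a hypothetical illustration, not the paper's code: `autoencoder` stands for any callable mapping a detection crop to its reconstruction, and the 30 dB PSNR threshold is an assumed value, not the one used in the paper.

```python
import numpy as np

def psnr(original, reconstruction, peak=255.0):
    """Peak Signal-to-Noise Ratio (dB) between a detection crop and its
    autoencoder reconstruction; infinite for a perfect reconstruction."""
    mse = np.mean((np.asarray(original, float) - np.asarray(reconstruction, float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

def is_known_false_positive(crop, autoencoder, psnr_threshold=30.0):
    """The autoencoder is trained only on the scenario's typical false
    alarms, so it reconstructs them well (high PSNR) and such detections
    are suppressed; crops it reconstructs poorly are kept as real objects."""
    return psnr(crop, autoencoder(crop)) >= psnr_threshold
```

A detection is discarded only when the autoencoder "recognizes" it, which is what lets the filter reduce false positives without touching the detector itself.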
Hierarchical Deep Learning Approach for Plant Disease Detection

In this paper we propose a hierarchical deep learning approach for plant disease detection. The detection of diseases in plants using image-based deep learning approaches is attracting researchers as a way of taking advantage of cutting-edge learning techniques in scenarios where major benefits can be achieved for mankind. In this work, we focus on diseases of three major agricultural crops: apple, peach and tomato. Using a real-world dataset composed of nearly 24,000 images, including healthy examples, we propose a hierarchical deep learning approach for plant disease detection and compare it with standard deep learning approaches. Results permit the conclusion that hierarchical approaches can outperform standard approaches in terms of detection performance.

Joana Costa, Catarina Silva, Bernardete Ribeiro
An Artificial Vision Based Method for Vehicle Detection and Classification in Urban Traffic

This paper proposes a system to analyze urban traffic through the use of artificial vision, in order to obtain reliable information about the traffic flow in cities with severe traffic jams, such as Bogotá, Colombia. A method efficient enough to be implemented in an embedded system was proposed, in order to process the images captured by a local camera and send the synthesized information to the cloud. This approach would reduce data transfer, because it would not be necessary to stream the video of each camera in the city; instead, each camera would send only the relevant traffic information. The system is able to calculate traffic flow, classified into motorbikes, buses, microbuses, minivans, sedans, SUVs and trucks. The detection was implemented using a cascade classifier that evaluates Haar features, providing a detection rate of 74.9% and a false positive rate of 1.4%. A Kalman filter was used to track and count the detected vehicles. Finally, a Convolutional Neural Network performs the classification, with accuracies around 88%. The complete system presented errors around 2.5% in contrast with manual counting in the traffic of Bogotá, Colombia.

Camilo Camacho, César Pedraza, Carolina Higuera
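The Kalman-filter tracking stage mentioned above is commonly realized with a constant-velocity model on the detected centroids; a minimal sketch follows. The state layout and the process/measurement noise magnitudes (`q`, `r`) are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np

class CentroidKalman:
    """Constant-velocity Kalman filter for one vehicle centroid.
    State: [x, y, vx, vy]; measurement: detected centroid [x, y]."""

    def __init__(self, x, y, q=1.0, r=5.0):
        self.s = np.array([x, y, 0.0, 0.0])          # initial state
        self.P = np.eye(4) * 100.0                   # large initial uncertainty
        self.F = np.array([[1, 0, 1, 0],             # x += vx each frame
                           [0, 1, 0, 1],             # y += vy each frame
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], float)     # we observe position only
        self.Q = np.eye(4) * q                       # process noise (assumed)
        self.R = np.eye(2) * r                       # measurement noise (assumed)

    def predict(self):
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:2]

    def update(self, z):
        y = np.asarray(z, float) - self.H @ self.s   # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)     # Kalman gain
        self.s = self.s + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.s[:2]
```

In a counting pipeline, one such filter per vehicle smooths the detections frame to frame, and the flow count is incremented when a track crosses a virtual counting line.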
Breaking Text-Based CAPTCHA with Sparse Convolutional Neural Networks

CAPTCHA is an automated test designed to check if the user is human. Though other approaches are explored (such as object recognition), the text-based CAPTCHA is still the main test used by many web service providers to separate human users from bots. In this paper, a sparse Convolutional Neural Network (CNN) to break text-based CAPTCHAs is proposed. Unlike previous CNN solutions, which mainly use fine-tuning and transfer learning from pre-trained models, the proposed framework does not require a pre-trained model. The sparsity constraint deactivates connections between neurons in the CNN fully connected layers, which leads to improved accuracy compared to the baseline approach. Visualization of the spatial distribution of neuron activity sheds light on the internal learning and the effect of the sparsity constraint.

Diogo Daniel Ferreira, Luís Leira, Petya Mihaylova, Petia Georgieva
Image Processing Method for Epidermal Cells Detection and Measurement in Arabidopsis Thaliana Leaves

Arabidopsis thaliana is the most important model species employed for genetic analysis in plants. As has been extensively proven, the first pair of extended leaves and its cellular and morphological changes during Arabidopsis development is an accurate model to understand the molecular and physiological events that control cell cycle progression in plants. Nevertheless, cell analysis on leaves depends significantly on images acquired from a microscope coupled to a drawing tube, where cells are traced by hand for posterior digitalization and analysis. This process is tedious, inaccurate and highly time-inefficient. A new image processing method for cell detection in leaves of Arabidopsis thaliana is presented. Using complementary image processing techniques, we introduce an effective way to obtain the original cell contour shapes, surpassing the limitations imposed by factors like noise, stomata, blurred edges, and non-uniform illumination. Results show the new methodology considerably reduces the time of cell detection compared with the microscope and drawing tube method, and produces matching percentages over 80%.

Manuel G. Forero, Sammy A. Perdomo, Mauricio A. Quimbaya, Guillermo F. Perez
User Modeling on Mobile Device Based on Facial Clustering and Object Detection in Photos and Videos

The article describes an approach for extraction of user preferences based on the analysis of a gallery of photos and videos on a mobile device. It is proposed to first use fast SSD-based methods in order to detect objects of interest in offline mode directly on the mobile device. Next, we perform facial analysis of all visual data: we extract feature vectors from detected facial regions, cluster them, and select public photos and videos that do not contain faces from the large clusters of the owner of the mobile device and his or her friends and relatives. At the second stage, these public images are processed on a remote server using very accurate but rather slow object detectors. An experimental study of several contemporary detectors is presented on a specially designed subset of the MS COCO, ImageNet and Open Images datasets.

Ivan Grechikhin, Andrey V. Savchenko
Gun and Knife Detection Based on Faster R-CNN for Video Surveillance

Public safety in public areas is nowadays one of the main concerns for governments and companies around the world. Video surveillance systems can take advantage of emerging deep learning techniques to improve their performance and accuracy in detecting possible threats. This paper presents a system for gun and knife detection based on the Faster R-CNN methodology. Two approaches have been compared, taking as CNN base a GoogleNet and a SqueezeNet architecture respectively. The best result for gun detection was obtained using a SqueezeNet architecture, achieving an AP50 of 85.44%. For knife detection, the GoogleNet approach achieved an AP50 of 46.68%. Both results improve upon previous literature results, evidencing the effectiveness of our detectors.

M. Milagro Fernandez-Carrobles, Oscar Deniz, Fernando Maroto
A Method for the Evaluation and Classification of the Orange Peel Effect on Painted Injection Moulded Part Surfaces

Orange peel effect, a wavy appearance of the surface, is one of the most frequently encountered defects on painted injection moulded parts. In this work, a method for evaluation and classification of the orange peel effect on a black-painted high-gloss surface using image processing and frequency analysis is presented. A monochrome camera is used to acquire images from the surface of the part while an LED-bar illuminates it. Because the part is complex-shaped, a robotic manipulator is used to handle the part in front of the camera in order to inspect the whole surface. After image acquisition, the region of interest (the region in the image illuminated by the light source) is selected manually. Based on the result of the image processing and frequency analysis assessing the waviness of the ROI, a score to evaluate and classify the intensity of the orange peel effect is calculated. Finally, the outcomes of this method are compared with the subjective evaluation of experts, in order to examine the reliability of the evaluation method.

Atae Jafari-Tabrizi, Hannah Luise Lichtenegger, Dieter P. Gruber
A New Automatic Cancer Colony Forming Units Counting Method

Clonogenic assays are an essential tool to evaluate the survival of cancer cells that have been exposed to a certain dose of radiation. Their result can be used in the generation of strategies for the optimization of radiotherapy treatments. The analysis of this type of data requires the specialist to perform manual counting of colony forming units (CFU), i.e., to find every cell that retains the ability to produce a large progeny. This task is time consuming, prone to errors, and the results are not reproducible due to the specialist's subjective assessment. Digital image processing tools can deal with the flaws described above. This article presents a new technique for automatic CFU counting. The proposed technique extracts the regions of interest (ROIs), where a local segmentation algorithm finds and labels the CFUs in order to quantify them. Results show good sensitivity and specificity performance compared to state-of-the-art software used for CFU detection and counting.

Nicolás Roldán, Lizeth Rodriguez, Andrea Hernandez, Karen Cepeda, Alejandro Ondo-Méndez, Sandra Liliana Cancino Suárez, Manuel G. Forero, Juan M. Lopéz
Deep Vesselness Measure from Scale-Space Analysis of Hessian Matrix Eigenvalues

The enhancement of tubular structures such as vessels in medical images has been addressed in the past, aiming for easier extraction and/or visualization of such structures by professionals. Some literature methodologies propose vesselness measures whose design is motivated by local properties of vascular networks and how these influence the eigenvalues of the Hessian matrix. However, past work fails to properly combine the scale-space and neighborhood information, thus leading to suboptimal vesselness measures. In this paper, we show that a shallow convolutional neural network is able to learn better embedding spaces from the eigenvalue analysis at different scales, thus leading to a stronger vessel enhancement. Additionally, we also show that such a system maintains one of the biggest advantages of Hessian-based vesselness measures, which is the robustness to data with varying statistics.

Ricardo J. Araújo, Jaime S. Cardoso, Hélder P. Oliveira
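The handcrafted Hessian-based measures that this chapter takes as its starting point can be sketched in 2D as follows. This is a classic Frangi-style baseline, not the paper's learned method; the box blur stands in for Gaussian smoothing to stay dependency-free, and the `beta` and `c` values are the usual illustrative defaults.

```python
import numpy as np

def hessian_eigenvalues(image, sigma=1.0):
    """Per-pixel eigenvalues of the smoothed Hessian, sorted so that
    |l1| <= |l2| (l2 carries the cross-vessel curvature)."""
    k = max(1, int(3 * sigma))
    kernel = np.ones(2 * k + 1) / (2 * k + 1)        # box blur as cheap smoothing
    sm = np.apply_along_axis(lambda r: np.convolve(r, kernel, "same"), 0, image)
    sm = np.apply_along_axis(lambda r: np.convolve(r, kernel, "same"), 1, sm)
    gy, gx = np.gradient(sm)
    hyy, _ = np.gradient(gy)
    hxy, hxx = np.gradient(gx)
    # closed-form eigenvalues of [[hxx, hxy], [hxy, hyy]]
    tmp = np.sqrt((hxx - hyy) ** 2 + 4 * hxy ** 2)
    l1, l2 = (hxx + hyy - tmp) / 2, (hxx + hyy + tmp) / 2
    swap = np.abs(l1) > np.abs(l2)
    l1[swap], l2[swap] = l2[swap], l1[swap]
    return l1, l2

def frangi_vesselness(image, sigma=1.0, beta=0.5, c=0.5):
    """High where |l2| >> |l1| and l2 < 0, i.e. bright tube-like pixels."""
    l1, l2 = hessian_eigenvalues(image, sigma)
    rb2 = (l1 / (np.abs(l2) + 1e-12)) ** 2           # blobness ratio squared
    s2 = l1 ** 2 + l2 ** 2                           # second-order structure
    v = np.exp(-rb2 / (2 * beta ** 2)) * (1 - np.exp(-s2 / (2 * c ** 2)))
    v[l2 > 0] = 0.0                                  # bright-vessel convention
    return v
```

On a smooth bright ridge, the measure peaks on the ridge and vanishes in flat or dark-curved regions, which is exactly the eigenvalue behaviour the paper's shallow CNN learns to combine across scales.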
Segmentation in Corridor Environments: Combining Floor and Ceiling Detection

Automatic segmentation from indoor images has several applications for mobile platforms. We address the problem of corridor segmentation and propose an approach combining floor and ceiling detection. However, various difficulties may limit the accuracy of the system. To overcome them, we evaluate the degree of consistency of the ceiling and floor guidelines. The method is based on computing the disparity between the hypothesized vanishing points obtained by intersecting the boundaries pair-wise. The approach is evaluated on a novel dataset. Our experimental validation confirms that the integration of floor and ceiling detection with the consistency model performs effectively and robustly. Because of the simplicity of the method, the image processing is quite fast.

Sergio Lafuente-Arroyo, Saturnino Maldonado-Bascón, Hilario Gómez-Moreno, Cristina Alén-Cordero
Development of a Fire Detection Based on the Analysis of Video Data by Means of Convolutional Neural Networks

Convolutional Neural Networks (CNNs) have proven their worth in the field of image-based object recognition and localization. In the context of this work, a fire detector based on CNNs has been developed that detects fire by analyzing video sequences. The main contribution of this work is the use of the temporal information contained in video sequences depicting fire. In contrast to state-of-the-art fire detectors, a large image database of 160,000 images with an even distribution of positive and negative samples has been created. To compare image-based and video-based approaches as objectively as possible, different image-based CNNs are trained under the same conditions as the video-based networks within the scope of this work. It is shown that video-based networks offer an advantage over conventional image-based networks and therefore benefit from the temporal information of fire. We achieved a prediction accuracy of 96.82%.

Jan Lehr, Christian Gerson, Mohamad Ajami, Jörg Krüger
Towards Automatic and Robust Particle Tracking in Microrheology Studies

Particle tracking applied to video passive microrheology is conventionally done through methods that are far from being automatic. Creating mechanisms that decode the image set properties and correctly detect the tracer beads, to find their trajectories, is fundamental to facilitate microrheology studies. In this work, the adequacy of two particle detection methods - a Radial Symmetry-based approach and Gaussian fitting - for microrheology setups is tested, both on a synthetic database and on real data. Results show that it is possible to automate the particle tracking process in this scope, while ensuring high detection accuracy and sub-pixel precision, crucial for an adequate characterization of microrheology studies.

Marina Castro, Ricardo J. Araújo, Laura Campo-Deaño, Hélder P. Oliveira
Study of the Impact of Pre-processing Applied to Images Acquired by the Cygno Experiment

This work proposes to evaluate the effect of digital filters when applied to images acquired by the ORANGE prototype of the Cygno experiment. A preliminary analysis is presented in order to understand if filtering techniques can produce results that justify investing efforts in the pre-processing stage of those images. Such images come from a camera sensor based on CMOS technology installed in an appropriate gas detector. To perform the proposed work, a simulation environment was created and used to evaluate some of the classical filtering techniques known in the literature. The results showed that the signal-to-noise ratio of the images can be considerably improved, which may help in subsequent processing steps such as clustering and particle identification.

G. S. P. Lopes, E. Baracchini, F. Bellini, L. Benussi, S. Bianco, G. Cavoto, I. A. Costa, E. Di Marco, G. Maccarrone, M. Marafini, G. Mazzitelli, A. Messina, R. A. Nobrega, D. Piccolo, D. Pinci, F. Renga, F. Rosatelli, D. M. Souza, S. Tomassini
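The kind of evaluation described above, applying a classical filter to a simulated noisy sensor image and comparing signal-to-noise before and after, can be sketched as follows. This is an assumed toy setup (a bright blob with Gaussian sensor noise and a 3x3 median filter), not the Cygno simulation environment or the specific filters studied in the chapter.

```python
import numpy as np

def median3x3(img):
    """3x3 median filter built from nine shifted views (edge-padded)."""
    p = np.pad(img, 1, mode="edge")
    stack = [p[i:i + img.shape[0], j:j + img.shape[1]]
             for i in range(3) for j in range(3)]
    return np.median(np.stack(stack), axis=0)

def psnr(clean, noisy, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB against a known clean image."""
    mse = np.mean((clean - noisy) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
clean = np.zeros((64, 64))
clean[20:40, 20:40] = 200.0                      # a bright "cluster" signal
noisy = clean + rng.normal(0.0, 25.0, clean.shape)  # simulated sensor noise
before, after = psnr(clean, noisy), psnr(clean, median3x3(noisy))
```

Comparing `before` and `after` against the known clean image quantifies how much a candidate filter helps, the same logic the chapter applies to its simulated detector images.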
Backmatter
Title
Pattern Recognition and Image Analysis
Edited by
Aythami Morales
Julian Fierrez
José Salvador Sánchez
Prof. Dr. Bernardete Ribeiro
Copyright Year
2019
Electronic ISBN
978-3-030-31321-0
Print ISBN
978-3-030-31320-3
DOI
https://doi.org/10.1007/978-3-030-31321-0

