Computer Analysis of Images and Patterns
18th International Conference, CAIP 2019, Salerno, Italy, September 3–5, 2019, Proceedings, Part II
- 2019
- Book
- Editors
- Mario Vento
- Gennaro Percannella
- Book Series
- Lecture Notes in Computer Science
- Publisher
- Springer International Publishing
About this book
The two volume set LNCS 11678 and 11679 constitutes the refereed proceedings of the 18th International Conference on Computer Analysis of Images and Patterns, CAIP 2019, held in Salerno, Italy, in September 2019.
The 106 papers presented were carefully reviewed and selected from 176 submissions. The papers are organized in the following topical sections: Intelligent Systems; Real-time and GPU Processing; Image Segmentation; Image and Texture Analysis; Machine Learning for Image and Pattern Analysis; Data Sets and Benchmarks; Structural and Computational Pattern Recognition; Posters.
Table of Contents
- Poster Session
- Frontmatter
3D Color CLUT Compression by Multi-scale Anisotropic Diffusion
David Tschumperlé, Christine Porquet, Amal Mahboubi
Abstract: 3D CLUTs (Color Look-Up Tables) are popular digital models used in image and video processing for color grading, the simulation of analog films, and more generally for the application of various color transformations. The large size of these models leads to data storage issues when distributing them on a large scale. Here, a highly effective lossy compression technique for 3D CLUTs is proposed, based on a multi-scale anisotropic diffusion scheme. Our method achieves an average compression rate of more than 99%, while ensuring visually indistinguishable differences from the application of the original CLUTs.
Analysis of Skill Improvement Process Based on Movement of Gaze and Hand in Assembly Task
Yohei Kawase, Manabu Hashimoto
Abstract: In this paper, we propose a method to analyze the characteristics of the movements of workers at each skill level in an assembly task. First, the method quantizes the positional information of the gaze and hands into eighteen areas and converts the positional information into a code. Second, it forms pairs of gaze and hand codes in each frame. Third, it calculates the frequency of those pairs and generates co-occurrence histograms of the gaze and hand codes. In this research, we clearly distinguish the dominant hand from the non-dominant hand because the degree of skill improvement differs between them. The method therefore generates co-occurrence histograms for the gaze and the dominant hand, as well as for the gaze and the non-dominant hand. These histograms are proposed as features for analyzing the characteristics of the movements. The results of the analysis of the skill improvement process show that the non-dominant hand at the elementary level stays in two areas, while at the intermediate level it moves across five areas. This suggests that workers can move their non-dominant hand more efficiently at the intermediate level than at the elementary level. In addition, we found that the gaze at the intermediate level moves across eight areas, whereas at the expert level it moves across only three, indicating that the expert's gaze remains at the center of the workbench.
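The quantize-and-pair scheme described in this abstract can be sketched in a few lines of Python. The 6×3 layout of the eighteen areas and the frame size are illustrative assumptions, not the paper's actual configuration:

```python
from collections import Counter

def quantize(pos, grid=(6, 3), size=(1920, 1080)):
    """Map an (x, y) position to one of grid[0] * grid[1] area codes.
    The 6x3 layout of the eighteen areas is an assumption for illustration."""
    x, y = pos
    col = min(int(x * grid[0] / size[0]), grid[0] - 1)
    row = min(int(y * grid[1] / size[1]), grid[1] - 1)
    return row * grid[0] + col

def cooccurrence_histogram(gaze_track, hand_track):
    """Count (gaze_code, hand_code) pairs over all frames."""
    pairs = [(quantize(g), quantize(h)) for g, h in zip(gaze_track, hand_track)]
    return Counter(pairs)
```

Two such histograms (gaze vs. dominant hand, gaze vs. non-dominant hand) would then serve as the movement features the abstract mentions.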
Hybrid Function Sparse Representation Towards Image Super Resolution
Junyi Bian, Baojun Lin, Ke Zhang
Abstract: Sparse representation with a training-based dictionary has been shown to be successful for super resolution (SR) but still has some limitations. Based on the idea of magnifying a function curve without losing its fidelity, we propose a function-based dictionary for sparse representation in super resolution, called hybrid function sparse representation (HFSR). The dictionary is generated directly from preset hybrid functions without additional training and, thanks to its scalable property, can be scaled to any required size. We mix the approximated Heaviside function (AHF), the sine function and the DCT function as the dictionary. A multi-scale refinement is then proposed that exploits the scalability of the dictionary to improve the results. In addition, a reconstruction strategy is adopted to deal with the overlaps. Experiments on the ‘Set14’ SR dataset show that our method performs excellently, particularly on images containing rich details and contexts, compared with non-learning-based state-of-the-art methods.
Toward New Spherical Harmonic Shannon Entropy for Surface Modeling
Malika Jallouli, Wafa Belhadj Khalifa, Anouar Ben Mabrouk, Mohamed Ali Mahjoub
Abstract: Genus-zero surfaces are widespread forms in real life, and it is important to have adequate mathematical tools to represent them. Spherical harmonics are special bases able to model them in a compact and relevant way. The main problem of the spherical harmonics modeling process is how to define the optimal reconstruction order that best represents the initial surface. This paper proposes a new spherical harmonic Shannon-type entropy to optimize reconstruction and to provide an accurate and efficient evaluation method for the reconstruction order.
Over Time RF Fitting for Jitter Free 3D Vertebra Reconstruction from Video Fluoroscopy
Ioannis Ioannidis, Hammadi Nait-Charif
Abstract: Over the past decades, there has been increasing interest in spine kinematics, and various approaches to its analysis have been proposed. Emphasis has been given both to the shape of individual vertebrae and to the overall spine curvature as a means of providing accurate and valid diagnoses of spinal conditions. Traditional invasive methods cannot accurately delineate the intersegmental motion of the spine vertebrae. In contrast, capturing and measuring spinal motion via non-invasive fluoroscopy has been a popular choice because it incurs low patient radiation exposure. In general, image-based 3D reconstruction methods focus on static spine instances; even those analysing sequences yield unstable and jittery animations of the reconstructed spine. In this paper, we address this issue with a novel approach that robustly reconstructs a rigid shape with no inter-frame variations, producing jitter-free animations across a fluoroscopy video sequence. Our main contributions are (1) retaining the shape of the solid vertebrae across the frame range, and (2) helping towards more accurate image segmentation even with a limited training set. We demonstrate the success of our pipeline by reconstructing and comparing 3D animations of the lumbar spine from a corresponding fluoroscopic video.
Challenges and Methods of Violence Detection in Surveillance Video: A Survey
Wafa Lejmi, Anouar Ben Khalifa, Mohamed Ali Mahjoub
Abstract: This article presents a survey of the latest methods of violence detection in video sequences. Although many studies have described approaches to detecting violence, few surveys provide an exhaustive review of the available methods. We expose the main challenges in this area and classify the methods into five broad categories. We discuss each category and present the main techniques that have proposed improvements, as well as performance measures on public datasets used to evaluate the different existing violence detection techniques.
Uncertainty Based Adaptive Projection Selection Strategy for Binary Tomographic Reconstruction
Gábor Lékó, Szilveszter Domány, Péter Balázs
Abstract: The goal of binary tomography is to examine the inner structure of homogeneous objects based on their projections. The 2D slices of the objects can be represented by binary matrices, and the aim is to recreate these matrices from a collection of their line sums. For cost-effectiveness and speed, it is worthwhile to perform the reconstruction from as few projections as possible while still maintaining acceptable image quality. The key is to specify the most informative projection angles. In this paper, we propose a reconstruction-uncertainty-based adaptive (online) projection selection method for binary tomographic reconstruction, and we compare our algorithm to other previously published methods.
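For readers unfamiliar with the term, the "line sums" of a binary matrix are simply its projections along a set of directions; a minimal Python illustration for the two axis-aligned directions:

```python
def line_sums(matrix):
    """Horizontal and vertical projections (row and column sums)
    of a binary matrix representing a 2D slice."""
    rows = [sum(r) for r in matrix]
    cols = [sum(c) for c in zip(*matrix)]
    return rows, cols
```

The projection selection problem the abstract addresses is then to choose, online, the set of angles (here only 0° and 90°) whose line sums best constrain the reconstruction.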
Non-contact Heart Rate Monitoring Using Multiple RGB Cameras
Hamideh Ghanadian, Hussein Al Osman
Abstract: Recent advances in computer vision and signal processing are enabling researchers to realize mechanisms for the remote monitoring of vital signs. The remote measurement of vital signs, including heart rate (HR), heart rate variability (HRV), and respiratory rate, presents important advantages for patients. For instance, continuous remote monitoring alleviates the discomfort due to skin irritation and/or mobility limitation associated with contact-based measurement techniques. Recently, several studies have presented methods to measure HR and HRV by detecting the Blood Volume Pulse (BVP) from human skin. They use a single camera to capture a visible segment of skin, such as the face, hand, or foot, to monitor the BVP. We propose a remote HR measurement algorithm that uses multiple cameras to capture facial video recordings of still and moving subjects. Using Independent Component Analysis (ICA) as a Blind Source Separation (BSS) method, we isolate the physiological signals from noise in the RGB facial video recordings. With respect to the ECG ground truth, the proposed method decreases the RMSE by 18% compared to the state of the art in the subject-movement condition, achieving an RMSE of 1.43 bpm and 0.96 bpm in the stationary and movement conditions respectively.
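As a toy illustration of the signal-extraction step (the actual method applies ICA across the RGB traces from several cameras), here is a hedged Python sketch that averages the green channel over a skin region and estimates a pulse rate by counting zero crossings; everything here is a simplified stand-in, not the paper's pipeline:

```python
def channel_trace(frames):
    """Mean green-channel value per frame; each frame is a list of (r, g, b)
    skin-pixel tuples from the facial region of interest."""
    return [sum(px[1] for px in f) / len(f) for f in frames]

def heart_rate_bpm(trace, fps):
    """Crude rate estimate: count upward zero crossings of the mean-removed
    trace (a stand-in for the paper's ICA plus spectral analysis)."""
    mean = sum(trace) / len(trace)
    x = [v - mean for v in trace]
    crossings = sum(1 for a, b in zip(x, x[1:]) if a < 0 <= b)
    return 60.0 * crossings * fps / len(trace)
```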
An Efficient Anaglyph 3D Video Watermarking Approach Based on Hybrid Insertion
Dorra Dhaou, Saoussen Ben Jabra, Ezzeddine Zagrouba
Abstract: Digital watermarking techniques have been proposed as an efficient solution to protect different media from illegal manipulation. For 3D videos, however, this domain remains underdeveloped: only three watermarking methods exist for anaglyph 3D videos, and they suffer from insufficient robustness, especially against malicious attacks such as video compression. In this paper, a robust hybrid anaglyph 3D video watermarking algorithm is proposed, based on the spatial and frequency domains (Least Significant Bit (LSB), Discrete Wavelet Transform (DWT) and Discrete Cosine Transform (DCT)). First, the original anaglyph 3D video is divided into a set of frames, and each frame is split into cyan and red images; a first signature is inserted in the cyan and red images using DCT- and LSB-based techniques respectively. Second, the obtained anaglyph frame is processed by a first-level DWT, where the second signature is inserted in a low-frequency sub-band, and the inverse DWT is performed to obtain the final marked anaglyph frame. The proposed approach is evaluated on invisibility and robustness criteria and shows a high level of invisibility as well as robustness against different video compression standards such as MPEG-4 and H.264/AVC.
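The LSB half of the hybrid scheme admits a very small sketch. This operates on a flat list of 8-bit red-channel values and ignores the DCT and DWT stages entirely:

```python
def embed_lsb(values, bits):
    """Embed watermark bits into the least significant bits of pixel values."""
    out = list(values)
    for i, b in enumerate(bits):
        out[i] = (out[i] & ~1) | b
    return out

def extract_lsb(values, n):
    """Recover the first n embedded bits."""
    return [v & 1 for v in values[:n]]
```

Changing only the lowest bit alters each pixel value by at most 1, which is why LSB embedding is visually invisible but also why it is fragile against compression, motivating the frequency-domain (DCT/DWT) signatures.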
A Computer Vision Pipeline that Uses Thermal and RGB Images for the Recognition of Holstein Cattle
Amey Bhole, Owen Falzon, Michael Biehl, George Azzopardi
Abstract: The monitoring of farm animals is important as it allows farmers to keep track of performance indicators and any signs of health issues, which is useful to improve the production of milk, meat, eggs and other products. In Europe, bovine identification mostly depends on electronic ID/RFID ear tags, as opposed to branding and tattooing. The RFID-based ear-tagging approach has been called into question because of implementation and management costs, physical damage and animal welfare concerns. In this paper, we conduct a case study on the individual identification of Holstein cattle, characterized by black, brown and white patterns, in collaboration with the Dairy Campus in Leeuwarden. We use a FLIR E6 thermal camera to collect an infrared and an RGB image of the side view of each cow just after it leaves the milking station. We apply a fully automatic pipeline consisting of image processing, computer vision and machine learning techniques to a data set containing 1237 images and 136 classes (i.e. individual animals). In particular, we use the thermal images to segment the cattle from the background and to remove the horizontal and vertical pipes that occlude the cattle in the station, filling the blank areas with an inpainting algorithm. We then apply transfer learning to a pre-trained AlexNet convolutional neural network on the segmented images. With five-fold cross-validation we achieve an average accuracy of 0.9754 ± 0.0097. These results suggest that the proposed non-invasive approach is highly effective for the automatic recognition of Holstein cattle from the side view. In principle, the approach is applicable to any farm animals characterized by distinctive coat patterns.
DeepNautilus: A Deep Learning Based System for Nautical Engines’ Live Vibration Processing
Rosario Carbone, Raffaele Montella, Fabio Narducci, Alfredo Petrosino
Abstract: Recent advances in sensor technologies and data analysis techniques allow reliable and efficient systems for the early diagnosis of breakdowns in the production chain of the car industry and, more generally, of engines. The performance of these systems rests fundamentally on the quality of the extracted features and on the learning technique. In this paper, we show the preliminary but encouraging results of our research effort in using deep neural networks to recognize and eventually predict engine failures. We present the prototypal blueprint of DeepNautilus, a system devoted to detecting failures in marine engines using deep learning, with the ambitious goal of reducing marine pollution. Our vision comprises a distributed sensor data acquisition system based on the fog/edge/cloud computing paradigm, with a consistent part of the computation located on the edge side. While our architectural approach is described at the design level, in this work we present our experience with the deep neural network (DNN) computational core, using a literature dataset from an air compression engine. We demonstrate that our approach is not only comparable with the one in the literature but even better performing.
Binary Code for the Compact Palmprint Representation Using Texture Features
Agata Giełczyk, Gian Luca Marcialis, Michał Choraś
Abstract: In this paper, we present an effective approach to biometric user verification using palmprints. The main idea and key innovation of the method is a compact 32-bit vector that summarizes the palmprint texture. The method achieves user verification accuracy reaching 92% in experiments performed on the benchmark PolyU palmprint database. Moreover, the reported results show that the obtained accuracy is hardly dependent on the number of enrolled samples. The proposed representation may be extremely useful in real-life applications because of its compactness and effectiveness.
Handwriting Analysis to Support Alzheimer’s Disease Diagnosis: A Preliminary Study
Nicole Dalia Cilia, Claudio De Stefano, Francesco Fontanella, Mario Molinara, Alessandra Scotto Di Freca
Abstract: Alzheimer’s disease (AD) is the most common neurodegenerative dementia of old age and the leading chronic-disease contributor to disability and dependence among older people worldwide. Handwriting is among the motor activities compromised by AD, as it is the result of a complex network of cognitive, kinaesthetic and perceptive-motor skills. Indeed, researchers have shown that patients affected by this disease exhibit alterations in spatial organization and poor control of movement. In this paper, we present the preliminary results of a study in which an experimental protocol (including word, letter and sentence copying tasks) has been used to assess the kinematic properties of the movements involved in handwriting. The obtained results are very encouraging and seem to confirm the hypothesis that machine-learning-based analysis of handwriting can be profitably used to support AD diagnosis.
Geometrical and Statistical Properties of the Rational Order Pyramid Transform and Dilation Filtering
Kento Hosoya, Kouki Nozawa, Atsushi Imiya
Abstract: The pyramid transform of rational orders is described using a matrix transform. This matrix expression of the rational-order pyramid transform clarifies the eigenspace properties of the transform. Moreover, the matrix-based expression yields an orthogonal basis at each resolution; this orthogonality of the signal basis provides a unified computation of linear transformations of images at any rational resolution. Furthermore, the eigenspace property allows us to define rational pyramid transform families using the discrete cosine transform. Numerical evaluation clarifies that the rational-order pyramid transform preserves the normalised grey-scale distribution of images.
Personal Identity Verification by EEG-Based Network Representation on a Portable Device
Giulia Orrú, Marco Garau, Matteo Fraschini, Javier Acedo, Luca Didaci, David Ibáñez, Aureli Soria-Frish, Gian Luca Marcialis
Abstract: EEG-based personal verification has so far been investigated mainly using standard non-portable devices with a large number of electrodes (typically 64) in heavy headset configurations. Although this equipment has been shown to be useful for investigating EEG signal characteristics in depth from a biomedical point of view, it may be considered less appropriate for designing real-life EEG-based biometric systems. In this work, EEG signals are collected by a portable and user-friendly device explicitly conceived for biometric applications, featuring a set of 16 channels. The investigated feature extraction algorithms are based on modelling the EEG channels as a network of mutually interacting units, which was shown to be effective for personal verification when brain signals are acquired by standard EEG devices. This work shows that these approaches remain effective even with a reduced set of channels. The paper is intended to stimulate research on light and portable EEG headset configurations with network-based representations of EEG brain signals, since a light headset is a precondition for designing real-life EEG-based personal verification systems.
A System for Controlling How Carefully Surgeons Are Cleaning Their Hands
Luca Greco, Gennaro Percannella, Pierluigi Ritrovato, Alessia Saggese, Mario Vento
Abstract: In this paper, we propose a method for the automatic compliance evaluation of the hand-washing procedure performed by healthcare providers. The ideal cleaning procedure, as defined by the guidelines of the World Health Organization (WHO), is split into a sequence of ten distinct and specific hand gestures which have to be performed in the proper order. Thus, the conformance verification problem is formulated as the problem of recognizing which specific gesture the subject is carrying out at a given time instant. This recognition problem is addressed through a deep neural network inspired by AlexNet that classifies each image, providing as output the guessed gesture class that the subject is performing. Images are captured by a depth camera mounted in a top-view position. The performance of the proposed approach has been assessed on a brand new dataset of about 131,765 frames obtained from 74 continuous recordings of trained personnel. Preliminary evaluation confirms the feasibility of the approach, with a recognition rate at the frame level of about 77%, rising to about 98% when using a mobile window of 1 s. The developed system will be deployed for training medical students in the surgical hand-washing procedure.
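The jump from roughly 77% per-frame accuracy to roughly 98% with a 1 s mobile window implies some temporal aggregation of the per-frame network outputs. One plausible (assumed, not stated in the abstract) aggregation is a sliding majority vote over the recent frame labels:

```python
from collections import Counter

def smooth_gestures(frame_preds, window):
    """Majority vote over a sliding window of per-frame gesture labels;
    with a 30 fps camera, window=30 approximates a 1 s mobile window."""
    out = []
    for i in range(len(frame_preds)):
        lo = max(0, i - window + 1)
        out.append(Counter(frame_preds[lo:i + 1]).most_common(1)[0][0])
    return out
```

Isolated misclassifications are outvoted by their neighbours, which is exactly the effect that lifts windowed accuracy above per-frame accuracy.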
Class-Conditional Data Augmentation Applied to Image Classification
Eduardo Aguilar, Petia Radeva
Abstract: Image classification is widely researched in the literature, where models based on Convolutional Neural Networks (CNNs) have provided the best results. When data is scarce, CNN models tend to overfit. To deal with this, traditional data augmentation techniques are often applied, such as affine transformations or adjustments of the color balance. However, we argue that some data augmentation techniques may be more appropriate for some classes than for others. In order to select the techniques that work best for a particular class, we propose to explore the epistemic uncertainty of the samples within each class. Our experiments show that when data augmentation is applied class-conditionally, accuracy improves and the overall epistemic uncertainty is reduced. To summarize, in this paper we propose a class-conditional data augmentation procedure that yields better results and improves the robustness of classification in the face of model uncertainty.
Fabric Classification and Matching Using CNN and Siamese Network for E-commerce
Chandrakant Sonawane, Dipendra Pratap Singh, Raghav Sharma, Aditya Nigam, Arnav Bhavsar
Abstract: According to the Google-Kearney study (May 2016), the fashion industry in India has tremendous scope and could surpass the consumer electronics sector as early as 2020. However, the apparel sector faces a major limitation in the subjectivity of judging fabric quality. There is no doubt that the e-commerce industry can earn the highest rate of return from the apparel sector; still, its popularity is often limited. A person purchasing apparel always likes first to touch the fabric to get a ‘feel’ of it and its texture, comparing it with the mental/latent representation of other fabrics to assess quality or equivalence. Though the ‘feel’ of a fabric texture cannot be physically quantified, its latent representation can be extracted and compared using Autoencoders and Siamese networks respectively. In this paper, we have used an inexpensive, frugal cellular microscope (less than 5% of the cost of an expensive fabric texture scanner) for data collection. We have used Convolutional Neural Network based Autoencoders and a Siamese network for the classification, clustering, and matching of similar fabric textures. We show that even with frugal data collection methods, the proposed CNN classifiers using the latent feature representation of fabric texture achieve a high accuracy of 98.40% for fabric texture classification.
Real-Time Style Transfer with Strength Control
Victor Kitov
Abstract: Style transfer is the problem of rendering a content image in the style of another image. A natural and common practical task in applications of style transfer is to adjust the strength of stylization. The algorithm of Gatys et al. [4] provides this ability by changing the weighting factors of the content and style losses but is computationally inefficient. Real-time style transfer, introduced by Johnson et al. [9], enables fast stylization of any image by passing it through a pre-trained transformer network; although fast, this architecture cannot continuously adjust style strength. We propose an extension to real-time style transfer that allows direct control of style strength at inference time, still requiring only a single transformer network. We conduct qualitative and quantitative experiments demonstrating that the proposed method is capable of smooth stylization-strength control and removes certain stylization artifacts that appear in the original real-time style transfer method. Comparisons with alternative real-time style transfer algorithms capable of adjusting stylization strength show that our method reproduces style with more detail.
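As a rough intuition for what a continuous "style strength" parameter means (the paper controls it inside the transformer network, not in pixel space), even a naive pixel-space interpolation between the content image and a fully stylized output exposes such a parameter:

```python
def blend_stylization(content, stylized, alpha):
    """Naive pixel-space interpolation between content (alpha = 0) and full
    stylization (alpha = 1); a crude stand-in for the in-network strength
    control the paper proposes, shown on flat lists of pixel intensities."""
    return [(1 - alpha) * c + alpha * s for c, s in zip(content, stylized)]
```

Pixel-space blending tends to produce ghosting rather than a genuinely weaker stylization, which is one motivation for controlling strength inside the network instead.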
Faster Visual-Based Localization with Mobile-PoseNet
Claudio Cimarelli, Dario Cazzato, Miguel A. Olivares-Mendez, Holger Voos
Abstract: Precise and robust localization is of fundamental importance for robots required to carry out autonomous tasks. Above all, in the case of Unmanned Aerial Vehicles (UAVs), efficiency and reliability are critical aspects in developing localization solutions because of the limited computational capabilities, payload and power constraints. In this work, we leverage recent research in efficient deep neural architectures for the problem of 6 Degrees of Freedom (6-DoF) pose estimation from single RGB camera images. In particular, we introduce an efficient neural network that jointly regresses the position and orientation of the camera with respect to the navigation environment. Experimental results show that the proposed network retains accuracy similar to the most popular state-of-the-art methods while being smaller and having lower latency, which are fundamental aspects for real-time robotics applications.
Unsupervised Effectiveness Estimation Through Intersection of Ranking References
João Gabriel Camacho Presotto, Lucas Pascotti Valem, Daniel Carlos Guimarães Pedronette
Abstract: Estimating the effectiveness of retrieval systems in unsupervised scenarios is a task of crucial relevance. By exploiting estimations that do not require supervision, the retrieval results of many applications, such as rank aggregation and relevance feedback, can be improved. In this paper, a novel approach for unsupervised effectiveness estimation is proposed, based on the intersection of ranking references at the top-k positions of ranked lists. An experimental evaluation was conducted on public datasets with different image features. The linear correlation between the proposed measure and standard effectiveness evaluation measures was assessed, achieving high scores. In addition, the proposed measure was also evaluated jointly with rank aggregation methods, by assigning weights to ranked lists according to the effectiveness estimate of each feature.
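The intersection-of-references idea can be sketched generically: if a ranked list is effective, the top-k neighbors of a query tend to rank each other highly as well. The following Python sketch is a hedged illustration of that principle, not the paper's exact measure:

```python
def topk_intersection_score(ranked_lists, query, k):
    """Unsupervised effectiveness estimate in the spirit of intersecting
    ranking references: average overlap between the query's top-k list and
    the top-k lists of its own top-k neighbors. Returns a value in (0, 1]."""
    top = ranked_lists[query][:k]
    score = 0
    for item in top:
        score += len(set(ranked_lists[item][:k]) & set(top))
    return score / (k * k)
```

A score near 1 means the top-k neighborhood is mutually consistent (likely an effective ranked list); a low score signals that the references disagree, which can be used to down-weight that feature in rank aggregation.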
Joint Correlation Measurements for PRNU-Based Source Identification
Vittoria Bruni, Alessandra Salvi, Domenico Vitulano
Abstract: The camera fingerprint, namely the PRNU, is a multiplicative noise source contained in each image captured by a given sensor. Source camera identification derives from a similarity assessment between the camera fingerprint and a candidate image; it requires the extraction of the image PRNU and the estimation of the camera fingerprint. To this aim, a denoising procedure is commonly applied in both cases, and correlation is used for assessing similarity. However, the correlation measure strongly depends on the accuracy of camera fingerprint estimation and PRNU image extraction. This paper presents a method for making correlation-based source camera identification more robust. It uses more than one estimation of the camera fingerprint; identification then quantifies the amount of concurrence between the correlation measures. Higher correspondence between measures is expected whenever the candidate image has been captured by the given device (match case), while a lack of correspondence is expected whenever the image does not come from the considered device (no-match case). Preliminary experimental results show that the proposed joint correlation measurements improve the precision of correlation-based source camera identification methods, especially by reducing the number of false positives.
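The core similarity test underlying PRNU identification is a normalized correlation between a noise residual and a fingerprint estimate; the paper's contribution is to combine several such measures from several fingerprint estimates, but each individual measure is the standard one, shown here on flat 1-D lists for brevity:

```python
def ncc(a, b):
    """Normalized cross-correlation between a noise residual and a
    fingerprint estimate (both flattened to 1-D lists); ranges in [-1, 1]."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = sum((x - ma) ** 2 for x in a) ** 0.5
    db = sum((y - mb) ** 2 for y in b) ** 0.5
    return num / (da * db)
```

In the joint scheme, computing `ncc` against each of the fingerprint estimates and checking that the resulting values agree is what separates the match case from the no-match case.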
Detecting Sub-Image Replicas: Retrieval and Localization of Zoomed-In Images
Afraà Ahmad Alyosef, Andreas Nürnberger
Abstract: Zoomed-in image retrieval is a special field of near-duplicate image retrieval. It allows determining the sight, panorama or landscape image to which a zoomed-in image belongs. In addition, it can help to detect copyright violations involving images that have been cropped and rescaled from panorama images. So far, only little research has been done on this problem using supervised learning techniques. We present a method to retrieve and localize zoomed-in images with respect to the whole scene based on correlating groups of features. Feature grouping is used to filter out features that do not contribute to identifying relations between images; the remaining features are used to estimate the scale and location of the zoomed-in image with respect to the whole scene. We provide results of a benchmark data study using the proposed method to detect zoomed-in images and to localize them in the corresponding whole-scene images. We compare our method with the RANSAC model for zoomed-in retrieval and localization. The results indicate that our approach is more robust than the RANSAC model and can detect the relation and localize zoomed-in images even when most matched features are uncorrelated or only a few matches can be found.
Homogeneity Index as Stopping Criterion for Anisotropic Diffusion Filter
Fernando Pereira dos Santos, Moacir Antonelli Ponti
Abstract: The anisotropic diffusion filter is an image smoothing method often applied to improve segmentation and classification tasks. Because it is an adaptive and iterative method, one should define a stopping criterion in order to avoid unnecessary computational cost while producing the desired output. However, state-of-the-art methods in this regard rely on costly comparative functions computed at each iteration, or allow extra iterations before actually stopping. In this paper, we therefore propose a new stopping criterion that defines the number of iterations without additional comparisons during image processing. Our stopping criterion is based on the image homogeneity index and the constants included in the filter definition, and can be calculated before the first iteration. Using three different similarity measures on grayscale and color images from different domains with varying tonality, our results indicate that the proposed stopping criterion reduces the number of iterations while maintaining the quality of the diffused images. Consequently, our method can be applied to images from different sources, color compositions, and levels of noise.
Adaptive Image Binarization Based on Multi-layered Stack of Regions
Hubert Michalak, Krzysztof Okarma
Abstract: The main purpose of the conducted research is the development of a new image thresholding method which is faster than typical adaptive methods and more accurate than global binarization. Since natural images captured by cameras are usually unevenly illuminated, due to unknown and varying lighting conditions, appropriate binarization significantly influences the results of further image analysis. In this paper, the analysis of a multi-layered stack of regions, an enhancement of the single-layer version, is proposed to calculate local image properties. Since balancing global and local adaptive thresholding requires choosing an appropriate number of shifted layers and a block size, this choice has been verified using a database of test images. The proposed local threshold value is the mean local intensity corrected using two additional parameters subjected to optimization. The developed procedure allows for more accurate and faster binarization, which can be applied in many technical systems. It has been verified on the example of text recognition accuracy for non-uniformly illuminated document images, in comparison to alternative global and local methods of similar or lower computational complexity.
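The corrected-mean threshold described in this abstract can be sketched per block of pixels. The two correction parameters `a` and `b` here are placeholders for the parameters the paper subjects to optimization:

```python
def local_threshold(block, a=1.0, b=0.0):
    """Threshold as the block's mean intensity corrected by two parameters
    (a, b stand in for the optimized parameters from the paper)."""
    mean = sum(block) / len(block)
    return a * mean + b

def binarize_block(block, a=1.0, b=0.0):
    """Binarize one block of pixel intensities against its local threshold."""
    t = local_threshold(block, a, b)
    return [1 if v > t else 0 for v in block]
```

Stacking several shifted layers of such blocks, as the paper proposes, smooths out the blocking artifacts that a single block grid would introduce.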
Evaluating Impacts of Motion Correction on Deep Learning Approaches for Breast DCE-MRI Segmentation and Classification
Antonio Galli, Michela Gravina, Stefano Marrone, Gabriele Piantadosi, Mario Sansone, Carlo Sansone
Abstract: Dynamic Contrast Enhanced-Magnetic Resonance Imaging (DCE-MRI) is a diagnostic method suited to the early detection and diagnosis of cancer, involving the serial acquisition of images before and after the injection of a paramagnetic contrast agent. Because of the long acquisition times, DCE-MRI inevitably shows noise (artefacts) in the acquired images due to (often involuntary) patient movements. As a consequence, over the years, machine learning approaches have shown that some sort of motion correction technique (MCT) has to be applied in order to improve performance in tumour segmentation and classification. In recent times, however, classic machine learning approaches have been outperformed by deep learning based ones, thanks to their ability to autonomously learn the best set of features for the task under analysis. This paper proposes a first investigation into whether deep learning based approaches are more robust to the misalignment of images over time, making registration no longer needed in this context. To this aim, we evaluated the effectiveness of an MCT for both the classification and the segmentation of breast lesions in DCE-MRI by means of several literature proposals. Our results show that while MCTs still seem quite useful for the lesion segmentation task, they seem no longer strictly required for the lesion classification one.
A Two-Step System Based on Deep Transfer Learning for Writer Identification in Medieval Books
Nicole Dalia Cilia, Claudio De Stefano, Francesco Fontanella, Claudio Marrocco, Mario Molinara, Alessandra Scotto Di FrecaAbstractIn digital paleography, recent technology advancements are used to support paleographers in the study and analysis of ancient documents. One main goal of paleographers is to identify the different scribes (writers) who wrote a given manuscript. Deep learning has recently been applied to many domains; however, in order to overcome its requirement for large amounts of labeled data, transfer learning has been used. This approach typically uses previously trained large deep networks as starting points to solve specific classification problems. In this paper, we present a two-step deep transfer learning based tool to help paleographers identify the parts of a manuscript that were written by the same writer. The suggested approach has been tested on a set of digital images from a twelfth-century Bible. The achieved results confirmed the effectiveness of the proposed approach. -
Restoration of Colour Images Using Backward Stochastic Differential Equations with Reflection
Dariusz BorkowskiAbstractColour image denoising methods based on the chromaticity-brightness decomposition are well-known for their excellent results. We propose a novel approach for chromaticity denoising using advanced techniques of stochastic calculus. In order to solve this problem we use backward stochastic differential equations with reflection. Our experiments show that the new approach gives very good results and compares favourably with deterministic differential equation methods. -
Sound Transformation: Applying Image Neural Style Transfer Networks to Audio Spectrograms
Xuehao Liu, Sarah Jane Delany, Susan McKeeverAbstractImage style transfer networks are used to blend images, producing images that are a mix of source images. The process is based on controlled extraction of style and content aspects of images, using pre-trained Convolutional Neural Networks (CNNs). Our interest lies in adopting these image style transfer networks for the purpose of transforming sounds. Audio signals can be presented as grey-scale images of audio spectrograms. The purpose of our work is to investigate whether audio spectrogram inputs can be used with image neural transfer networks to produce new sounds. Using musical instrument sounds as source sounds, we apply and compare three existing image neural style transfer networks for the task of sound mixing. Our evaluation shows that all three networks are successful in producing consistent, new sounds based on the two source sounds. We use classification models to demonstrate that the new audio signals are consistent and distinguishable from the source instrument sounds. We further apply t-SNE cluster visualisation to visualise the feature maps of the new sounds and original source sounds, confirming that they form different sound groups from the source sounds. Our work paves the way to using CNNs for creative and targeted production of new sounds from source sounds, with specified source qualities, including pitch and timbre. -
Timber Tracing with Multimodal Encoder-Decoder Networks
Fedor Zolotarev, Tuomas Eerola, Lasse Lensu, Heikki Kälviäinen, Heikki Haario, Jere Heikkinen, Tomi KauppiAbstractTracking timber in the sawmill environment from the raw material (logs) to the end product (boards) provides various benefits, including efficient process control, the optimization of sawing, and the prediction of end-product quality. In practice, the tracking of timber through the sawmilling process requires a methodology for tracing the source of materials after each production step. The tracing is especially difficult through the actual sawing step, where a method is needed for identifying which log each board comes from. In this paper, we propose an automatic method for board identification (board-to-log matching) using the existing sensors in sawmills and multimodal encoder-decoder networks. The method utilizes point clouds from laser scans of log surfaces and grayscale images of boards. First, log surface heightmaps are generated from the point clouds. Then both the heightmaps and the board images are converted into “barcode” images using convolutional encoder-decoder networks. Finally, the “barcode” images are used to find the matching log for each board. In the experimental part of the work, different encoder-decoder architectures were evaluated and the effectiveness of the proposed method was demonstrated using challenging data collected from a real sawmill. -
A Challenging Voice Dataset for Robotic Applications in Noisy Environments
Antonio Roberto, Alessia Saggese, Mario VentoAbstractArtificial Intelligence plays a fundamental role in the speech-based interaction between humans and machines in cognitive robotic systems. This is particularly true when dealing with very crowded environments, such as museums or fairs, where cognitive systems could be profitably used. The existing datasets “in the wild” are not sufficiently representative for this purpose, so there is a growing need to make publicly available a more complex dataset for speaker recognition in extremely noisy conditions. In this paper, we propose the Speaker Recognition dataset in the Wild (SpReW), a novel and more challenging Italian audio database for speaker recognition tasks. Moreover, we report a quantitative evaluation on the proposed dataset of SincNet, a novel CNN architecture for speaker identification tasks. SincNet has been chosen as the baseline architecture since it has obtained impressive results on widely used controlled datasets. Experimental results demonstrate the difficulty of dealing with very noisy test sets and few clearly acquired samples for training. -
Binary Classification Using Pairs of Minimum Spanning Trees or N-Ary Trees
Riccardo La Grassa, Ignazio Gallo, Alessandro Calefati, Dimitri OgnibeneAbstractOne-class classifiers are trained only with target class samples. Intuitively, their conservative modeling of the class description may benefit classical classification tasks where classes are difficult to separate due to overlapping and data imbalance. In this work, three methods leveraging the combination of one-class classifiers based on non-parametric models, trees and Minimum Spanning Tree class descriptors (MST_CD), are proposed. These methods deal with inconsistencies arising from combining multiple classifiers, and with spurious connections that MST_CD creates in multi-modal class distributions. Experiments on several datasets show that the proposed approach obtains comparable and, in some cases, state-of-the-art results. -
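A minimal sketch of the Minimum Spanning Tree class-description idea: build an MST over the target-class samples and accept a test point when it lies close to the tree. The acceptance rule below (nearest training sample vs. longest MST edge, scaled by `slack`) is a deliberate simplification of the MST_CD criterion, which measures distance to MST edges:

```python
import math

def mst_edges(points):
    # Prim's algorithm over Euclidean distances; returns (length, i, j).
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j not in in_tree:
                    d = math.dist(points[i], points[j])
                    if best is None or d < best[0]:
                        best = (d, i, j)
        edges.append(best)
        in_tree.add(best[2])
    return edges

def mst_cd_accepts(points, x, slack=1.0):
    # Accept x if its distance to the closest training sample does not
    # exceed the longest MST edge times `slack` (illustrative rule).
    longest = max(d for d, _, _ in mst_edges(points))
    nearest = min(math.dist(p, x) for p in points)
    return nearest <= slack * longest
```

Points far from the span of the training samples are rejected, which is the one-class behaviour the abstract builds on.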
Estimating the Noise Level Function with the Tree of Shapes and Non-parametric Statistics
Baptiste Esteban, Guillaume Tochon, Thierry GéraudAbstractThe knowledge of the noise level within an image is valuable information for many image processing applications. Estimating the noise level function (NLF) requires the identification of homogeneous regions, over which the noise parameters are computed. Sutour et al. proposed a method to estimate this NLF based on the search for homogeneous regions of square shape. We generalize this method to the search for homogeneous regions of arbitrary shape thanks to the tree of shapes representation of the image under study, thus allowing a more robust and precise estimation of the noise level function. -
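The underlying principle (measure the noise spread inside regions assumed homogeneous, and collect one NLF sample per region) can be sketched as follows; the flat-list region input and the plain variance estimate are illustrative simplifications, not the tree-of-shapes pipeline:

```python
def estimate_nlf(regions):
    # regions: list of lists of pixel intensities, each assumed
    # homogeneous, so any spread within a region is attributed to
    # noise alone.  Returns (mean intensity, noise std) samples of
    # the noise level function, sorted by intensity.
    samples = []
    for r in regions:
        m = sum(r) / len(r)
        var = sum((v - m) ** 2 for v in r) / (len(r) - 1)
        samples.append((m, var ** 0.5))
    return sorted(samples)
```

A curve fitted through these samples would then give the noise standard deviation as a function of intensity.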
Deep Convolutional Neural Networks for Plant Species Characterization Based on Leaf Midrib
Leonardo F. S. Scabini, Rayner M. Condori, Isabella C. L. Munhoz, Odemir M. BrunoAbstractThe automatic characterization and classification of plant species is an important task for plant taxonomists. In this work, we propose the use of well-known pre-trained Deep Convolutional Neural Networks (DCNN) for the characterization of plants based on their leaf midrib. The samples studied are microscope images of leaf midrib cross-sections taken from different specimens under varying conditions. Results with traditional handcrafted image descriptors demonstrate the difficulty of effectively characterizing these samples. Our proposal is to use a DCNN as a feature extractor through Global Average Pooling (GAP) over the raw output of its last convolutional layers, without the application of summarizing functions such as ReLU and local poolings. Results indicate considerable performance improvements over previous approaches under different scenarios, varying the image color space (gray-level or RGB) and the classifier (KNN or LDA). The highest result is achieved by the deepest network analyzed, ResNet (101 layers deep), using the LDA classifier, with an accuracy of \(99.20\%\). However, shallower networks such as AlexNet also provide good classification results (\(97.36\%\)), which is still a significant improvement over the best previous result (\(83.67\%\), obtained with combined fractal descriptors). -
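The Global Average Pooling step named in this abstract reduces each channel of a convolutional layer's raw output to a single number by spatial averaging, giving one descriptor value per channel. A framework-free sketch (in practice the feature maps would come from a pre-trained DCNN such as ResNet):

```python
def global_average_pooling(feature_maps):
    # feature_maps: list of 2-D maps (one per channel), i.e. a C x H x W
    # stack from the last convolutional layer.  GAP averages each map
    # over its spatial dimensions, yielding a C-dimensional feature.
    return [sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
            for fm in feature_maps]
```

The resulting C-dimensional vector is what a classifier such as KNN or LDA would consume.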
Towards an Automatic Annotation of French Sign Language Videos: Detection of Lexical Signs
Hussein Chaaban, Michèle Gouiffès, Annelies BraffortAbstractThis paper presents an approach towards an automatic annotation system for French Sign Language (LSF). Such automation aims to reduce the processing time and the subjectivity of the manual annotations done by linguists to study sign language, and to simplify indexing for automatic sign recognition. The described system uses face and body keypoints collected from standard 2D RGB LSF videos. A naive Bayesian model was built to classify gestural units using the collected keypoints as features. We started from the observation that, for many signers, the production of lexical signs is very often accompanied by mouthing. Indeed, the results showed that the system is capable of detecting lexical signs with the highest success rate using only information about mouthing and head direction. -
Orthogonal Affine Invariants from Gaussian-Hermite Moments
Jan Flusser, Tomáš Suk, Bo YangAbstractWe propose a new kind of moment invariants with respect to an affine transformation. The new invariants are constructed in two steps. First, the affine transformation is decomposed into scaling, stretching and two rotations. The image is partially normalized up to the second rotation, and then rotation invariants from Gaussian-Hermite moments are applied. Compared to the existing approaches – traditional direct affine invariants and complete image normalization – the proposed method is more numerically stable. The stability is achieved thanks to the use of orthogonal Gaussian-Hermite moments and also to the partial normalization, which is more robust to small changes of the object than complete normalization. Both effects are documented in the paper by experiments. Better stability opens the possibility of calculating affine invariants of higher orders with better discrimination power. This may be useful, in particular, when different classes contain similar objects and cannot be separated by low-order invariants. -
A Web-Based System to Assess Texture Analysis Methods and Datasets
Alex J. F. Farfán, Leonardo F. S. Scabini, Odemir M. BrunoAbstractTexture analysis is an active area of research in computer vision and image processing, being one of the most studied topics for image characterization. When approaching texture analysis in a novel application, a researcher needs to evaluate different texture methods and classifiers to verify which are the most suitable for each type of image. This usually leads the researcher to spend time writing code to make comparisons and run tests. In this context, we propose a research and collaboration platform for the study, analysis, and comparison of texture descriptors and image datasets. This web-based application eases the creation of experiments in texture analysis, which consist of extracting texture features and performing classification over these features. It offers a collection of methods, datasets, and classification algorithms, while also allowing the user to upload the code of new descriptors and the files of new texture datasets, and to perform various tasks over them. Another interesting feature of this application is its interactive confusion matrix, in which the researcher can inspect correctly and incorrectly classified images. -
TRINet: Tracking and Re-identification Network for Multiple Targets in Egocentric Videos Using LSTMs
Jyoti Nigam, Renu M. RameshanAbstractWe present a novel recurrent network based framework for tracking and re-identifying multiple targets in first-person perspective. Even though LSTMs can act as sequence classifiers, most previous works in multi-target tracking use their output with some distance metric for data association. In this work, we employ an LSTM as a classifier and train it on the memory cell output vectors, corresponding to different targets, obtained from another LSTM. This classifier, based on appearance and motion features, discriminates the targets in two consecutive frames as well as re-identifies them over a time interval. We integrate this classifier as an additional block in a detection-free tracking architecture, which enhances the performance in terms of target re-identification and also indicates the absence of targets. We propose a dataset of twenty egocentric videos containing multiple targets to validate our approach. -
Digital Signature Based Control Integrity for JPEG HDR Images
Meha Hachani, Azza Ouled ZaidAbstract“A ship is always safe at the shore, but that is not what it is built for.” The same philosophy applies to picture files stored safely on storage devices. This is, however, not always the case: sharing and exchanging image data through computer networks affects their security. The issue that arises is how to protect these visual contents from malicious attacks. To alleviate this problem, an image verification feature can be exploited to automatically verify the integrity of these contents. In this way, users can detect any intentional corruption, which can occur at any time during transmission. In the present work, we propose a selective authentication system adapted to HDR images. The main contribution consists in verifying the integrity of several parts of JPEG-HDR files by using a content-based watermarking method. Specifically, a pair consisting of a local digital thumbprint and a digital signature is calculated based on the SHA-2 (256-bit) secure hash algorithm. Then, a verification process is performed using an RSA key pair (private and public keys). Our HDR image verification scheme is applied in the DCT transform domain and maintains backwards compatibility with the legacy JPEG standard. A performance analysis and comparison with related work demonstrate the effectiveness of the proposed approach. -
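A minimal sketch of the thumbprint side of such a scheme: hashing segments of the file payload with SHA-256, so that a verifier can localize which part was tampered with. The RSA signing of these digests is omitted here (it needs a cryptography library); the segment count and helper names are illustrative, not the paper's exact construction:

```python
import hashlib

def segment_thumbprints(data, n_segments):
    # Split the payload into n_segments chunks and hash each one.
    # A mismatch on verification pinpoints the corrupted segment.
    size = -(-len(data) // n_segments)  # ceiling division
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

def verify(data, thumbprints):
    # Recompute the per-segment digests and compare.
    return segment_thumbprints(data, len(thumbprints)) == thumbprints
```

In the full scheme, the stored thumbprints would themselves be signed with the private RSA key so that they cannot be forged along with the data.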
FS2Net: Fiber Structural Similarity Network (FS2Net) for Rotation Invariant Brain Tractography Segmentation Using Stacked LSTM Based Siamese Network
Ranjeet Ranjan Jha, Shreyas Patil, Aditya Nigam, Arnav BhavsarAbstractIn this paper, we propose a novel deep learning architecture combining stacked bi-directional LSTMs and LSTMs with the Siamese network architecture for segmenting brain fibers, obtained from tractography data, into anatomically meaningful clusters. The proposed network learns the structural difference between fibers of different classes, which enables it to classify fibers with high accuracy. Importantly, capturing such deep inter- and intra-class structural relationships also ensures that the segmentation is robust to relative rotation between test and training data, so it can be used with unregistered data. Our extensive experimentation over on the order of hundreds of thousands of fibers shows that the proposed model achieves state-of-the-art results, even in cases of large relative rotations between test and training data. -
Master and Rookie Networks for Person Re-identification
Danilo Avola, Marco Cascio, Luigi Cinque, Alessio Fagioli, Gian Luca Foresti, Cristiano MassaroniAbstractRecognizing the visual signatures of people across non-overlapping cameras is still an open problem of great interest for the computer vision community, especially due to its importance in automatic video surveillance of large-scale environments. A main aspect of this application field, known as person re-identification (re-id), is the feature extraction step used to define a robust appearance model of a person. In this paper, a novel two-branch Convolutional Neural Network (CNN) architecture for person re-id in video sequences is proposed. A pre-trained branch, called the Master, leads the learning phase of the other, untrained branch, called the Rookie. Using this strategy, the Rookie network is able to learn features complementary to those computed by the Master network, thus producing a more discriminative model. Extensive experiments on two popular and challenging re-id datasets have shown improved performance in terms of convergence speed as well as accuracy in comparison to standard models, thus providing an alternative and concrete contribution to the current re-id state of the art. -
Knee Osteoarthritis Detection Using Power Spectral Density: Data from the OsteoArthritis Initiative
Abdelbasset Brahim, Rabia Riad, Rachid JennaneAbstractIn this paper, a computer-aided diagnosis method for the OsteoArthritis (OA) disease using knee X-ray imaging and spectral analysis is presented. The proposed method is based on the Power Spectral Density (PSD) over different orientations of the image as a feature for the classification task. Then, independent component analysis (ICA) is used to select the relevant PSD coefficients for OA detection. Finally, a logistic regression classifier is used to classify 688 knee X-ray images obtained from the Osteoarthritis Initiative (OAI). The proposed diagnosis approach yields a classification accuracy of up to 78.92% (with a sensitivity of 79.65% and a specificity of 78.20%), thus outperforming several other recently developed OA diagnosis systems. -
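A toy sketch of directional PSD features: compute a 1-D power spectrum along rows and along columns (two orientations only, for brevity) and average them. The direct DFT below is O(n²) and stands in for an FFT; the ICA selection and the classifier from the abstract are not reproduced:

```python
import cmath

def psd_1d(signal):
    # Power spectral density of a 1-D signal via a direct DFT.
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) ** 2 / n
            for k in range(n)]

def directional_psd(img):
    # Average the PSD of every row (0 deg) and every column (90 deg);
    # a real feature extractor would sample more orientations.
    rows = [psd_1d(r) for r in img]
    cols = [psd_1d(list(c)) for c in zip(*img)]
    mean = lambda specs: [sum(s[k] for s in specs) / len(specs)
                          for k in range(len(specs[0]))]
    return mean(rows), mean(cols)
```

A perfectly flat image concentrates all spectral power in the DC bin, while texture spreads power into higher frequencies, which is what makes the PSD informative for bone texture.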
Object Instance Segmentation in Digital Terrain Models
Bashir Kazimi, Frank Thiemann, Monika SesterAbstractWe use an object instance segmentation approach based on deep learning to detect and outline objects in Digital Terrain Models (DTMs) derived from Airborne Laser Scanning (ALS) data. Object detection methods in computer vision have been extensively applied to RGB images and have achieved excellent results. In this work, we use Mask R-CNN, a well-known object detection model, to detect objects in archaeological sites by feeding the model with DTM data. Our experiments show the successful application of the Mask R-CNN model, originally developed for image data, to DTM data. -
Improvement of Image Denoising Algorithms by Preserving the Edges
Ram Krishna Pandey, Harpreet Singh, A. G. RamakrishnanAbstractImage restoration is one of the well-studied problems in low-level image processing. Recently, deep learning based image restoration techniques have shown promising results and outperform most state-of-the-art image denoising algorithms. Most deep learning based methods use the mean square error as the loss function to obtain the denoised output. This work focuses on further improving existing deep learning based image denoising techniques by preserving edges using a Canny-edge-based loss function, thereby improving the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) of the images while restoring their visual quality. -
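The idea of an edge-preserving loss can be sketched as an MSE term plus an edge-matching penalty. Plain horizontal finite differences stand in here for the paper's Canny edge maps, and the weight `lam` is an assumed value, not a tuned one:

```python
def grad(img):
    # Horizontal finite differences as a crude edge proxy.
    return [[row[i + 1] - row[i] for i in range(len(row) - 1)]
            for row in img]

def edge_aware_loss(pred, target, lam=0.1):
    # Pixel-wise MSE plus a gradient-matching term, so the network is
    # penalized both for wrong intensities and for smeared edges.
    mse = sum((p - t) ** 2
              for pr, tr in zip(pred, target)
              for p, t in zip(pr, tr)) / (len(pred) * len(pred[0]))
    gp, gt = grad(pred), grad(target)
    edge = sum((a - b) ** 2
               for ra, rb in zip(gp, gt)
               for a, b in zip(ra, rb)) / (len(gp) * len(gp[0]))
    return mse + lam * edge
```

Compared to a pure MSE loss, the extra term makes blurred-edge reconstructions strictly more costly even when their average intensity error is the same.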
Fast Landmark Recognition in Photo Albums
Pierre Stefani, Cristina OpreanAbstractThe performance of landmark recognition models has significantly improved following the introduction of dedicated CNN-based methods. Most existing algorithms rely on instance-level image retrieval techniques, which induces a scalability problem. Here, we tackle landmark recognition as a classification task to achieve real-time processing. Our focus is on building an industrial architecture which provides an optimal trade-off between execution time and performance. We first use a DELF model, which achieves state-of-the-art accuracy, to extract features. Second, we add a VLAD layer to create fixed-size features from the initial variable-length DELF features. Third, the VLAD features are fed to a classifier which provides the final results. Finally, we include a light but powerful distractor-handling post-processing step to avoid false positives; it is needed to cope with real datasets, which include both landmark and non-landmark images. As an alternative pipeline, we fine-tune different deep learning models to propose a simpler and faster model without the use of attention layers. The experiments show that our approaches have competitive performance while reducing the complexity of the recognition process. -
Fast and Robust Detection of Solar Modules in Electroluminescence Images
Mathis Hoffmann, Bernd Doll, Florian Talkenberg, Christoph J. Brabec, Andreas K. Maier, Vincent ChristleinAbstractFast, non-destructive and on-site quality control tools, mainly highly sensitive imaging techniques, are important to assess the reliability of photovoltaic plants. To minimize the risk of further damage and electrical yield losses, electroluminescence (EL) imaging is used to detect local defects at an early stage, which might otherwise cause future electric losses. Automated defect recognition on EL measurements requires a robust detection and rectification of modules, as well as an optional segmentation into cells. This paper introduces a method to detect solar modules and the crossing points between solar cells in EL images. We require only 1-D image statistics for the detection, resulting in a computationally efficient approach. In addition, the method is able to detect modules under perspective distortion and in scenarios where multiple modules are visible in the image. We compare our method to the state of the art and show that it is superior in the presence of perspective distortion, while its performance on images where the module is roughly coplanar to the detector is similar to that of the reference method. Finally, we show that we greatly improve on the reference method in terms of computational time. -
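A toy illustration of why 1-D image statistics are cheap and effective here: collapse the image into a column-wise intensity profile and flag dark columns as candidate cell boundaries. The thresholding rule (`frac` of the profile mean) is an assumption for demonstration, not the paper's detection criterion:

```python
def column_profile(img):
    # 1-D statistic: mean intensity of every image column.
    h = len(img)
    return [sum(img[y][x] for y in range(h)) / h
            for x in range(len(img[0]))]

def dark_separators(profile, frac=0.5):
    # Columns darker than frac * mean intensity are flagged as
    # candidate cell gaps / busbar positions.
    m = sum(profile) / len(profile)
    return [x for x, v in enumerate(profile) if v < frac * m]
```

Reducing a 2-D search to two 1-D profiles (rows and columns) is what keeps the computational cost low.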
Large Field/Close-Up Image Classification: From Simple to Very Complex Features
Quyet-Tien Le, Patricia Ladret, Huu-Tuan Nguyen, Alice CaplierAbstractIn this paper, the main contribution is to explore three different types of features, namely Exchangeable Image File (EXIF) features, handcrafted features and learned features, in order to address the problem of large-field/close-up image classification with a Support Vector Machine (SVM) classifier. The impact of each feature set on classification performance and computational complexity is investigated, and the feature sets are compared to each other. Results show that learned features are, as expected, very efficient, but at a computational cost that might be unreasonable. On the contrary, it appears worthwhile to consider EXIF features when available, because they represent a very good compromise between accuracy and computational cost. -
Enhancing Low Quality Face Image Matching by Neurovisually Inspired Deep Learning
Apurba Das, Pallavi SahaAbstractComputerized human face matching from low quality images is an active area of research in deformable pattern recognition, especially in non-cooperative security, surveillance, authentication and multi-camera tracking. In low-resolution and motion-blurred face images captured by surveillance cameras, it is challenging to obtain good face matches, or even to extract suitable feature vectors, in both classical signal/image processing based and deep learning based approaches. In the current work, we propose a novel low quality face image matching algorithm in the light of a neuro-visually inspired method of figure-ground segregation (NFGS). The framework is inspired by the non-linear interaction between the classical receptive field (CRF) and its non-classical extended surround, comprising non-linear mean-increasing and mean-decreasing sub-units. The current work not only demonstrates better detection of low quality face images in an NFGS-enabled deep learning framework, but also prescribes an efficient way of matching low quality face images, addressing low contrast, low resolution and motion blur, which are the prime factors responsible for low image quality. The experimental results show the effectiveness of the proposed algorithm not only quantitatively but also qualitatively, in terms of psycho-visual experiments and the statistical analysis of their outcomes. -
On the Computation of the Euler Characteristic of Binary Images in the Triangular Grid
Lidija Čomić, Andrija BlesićAbstractApart from the widely used square grid, other regular grids (hexagonal and triangular) are gaining prominence in the topological data analysis and image analysis/processing communities. One basic but important integer-valued topological descriptor of binary images in these grids is the Euler characteristic. We extend two algorithms for the computation of the Euler characteristic from the square to the triangular grid, taking into account specific properties of the triangular grid. The first algorithm is based on simple cell counting, and the second on a critical-point approach. Both algorithms iterate over the grid vertices. We also extend their improvements based on reusing information common to the previous and the next vertex in the scan order. Our experiments show that the critical-point based algorithms outperform the naive cell-counting ones, with the improved versions reducing the average runtime further. -
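The cell-counting algorithm has a direct analogue on the square grid, which may help fix intuition: count the distinct vertices, edges and faces contributed by the foreground pixels and form \(\chi = V - E + F\). The paper performs the corresponding count on the triangular grid; this sketch is the square-grid baseline only:

```python
def euler_characteristic(img):
    # Each foreground pixel contributes one face, four edges and four
    # corner vertices; cells shared between neighbouring pixels are
    # counted once via sets.  Returns chi = V - E + F.
    faces, edges, verts = 0, set(), set()
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            if v:
                faces += 1
                verts.update({(y, x), (y, x + 1),
                              (y + 1, x), (y + 1, x + 1)})
                edges.update({((y, x), (y, x + 1)),
                              ((y + 1, x), (y + 1, x + 1)),
                              ((y, x), (y + 1, x)),
                              ((y, x + 1), (y + 1, x + 1))})
    return len(verts) - len(edges) + faces
```

A single pixel gives χ = 1, two disjoint pixels give χ = 2 (one per connected component), and a ring of pixels gives χ = 0 (one component minus one hole).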
Feature GANs: A Model for Data Enhancement and Sample Balance of Foreign Object Detection in High Voltage Transmission Lines
Yimin Dou, Xiangru Yu, Jinping LiAbstractThe suspension of foreign objects on high-voltage transmission lines is extremely harmful to the safety of the line. If it is not handled in time, it will easily cause phase-to-phase short circuits of the transmission line and may even cause forest fires. Foreign object suspension is a low-probability event with few existing samples, so using a CNN for target classification and detection faces the problem of insufficient or imbalanced samples. Aiming at these problems, which often occur in engineering applications of CNNs, we propose a data enhancement algorithm based on GANs. The main idea of this algorithm is as follows. First, a pre-trained model is used to extract the feature map of each sample, and a GAN is used to learn the feature maps directly. Then, the feature maps generated by the GAN and the original data are used to train the classification layer of the pre-trained model, so as to enhance the data, balance the samples, and thereby strengthen the classification ability of the model. The experimental results show that the classification performance of several classical CNN models can be improved significantly by using this method in cases of insufficient or imbalanced samples. -
Embedded Prototype Subspace Classification: A Subspace Learning Framework
Anders Hast, Mats Lind, Ekta VatsAbstractHandwritten text recognition is a daunting task, due to the complex characteristics of handwritten letters. Deep learning based methods have achieved significant advances in recognizing challenging handwritten texts because of their ability to learn and accurately classify intricate patterns. However, deep learning has some limitations, such as the lack of a well-defined mathematical model and its black-box learning mechanism, which pose challenges. This paper aims at going beyond black-box learning and proposes a novel learning framework, called Embedded Prototype Subspace Classification, that is based on the well-known subspace method, to recognise handwritten letters in a fast and efficient manner. The effectiveness of the proposed framework is empirically evaluated on popular datasets using standard evaluation measures.
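The classical subspace method the framework builds on can be sketched as follows: form an orthonormal basis per class from its training samples, then assign a test sample to the class whose subspace reconstructs it with the smallest residual. Gram-Schmidt here stands in for the SVD/PCA a real implementation would use, and the toy classes are assumptions:

```python
def orthonormal_basis(vectors):
    # Gram-Schmidt orthonormalization of a class's training samples.
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - c * bi for wi, bi in zip(w, b)]
        n = sum(wi * wi for wi in w) ** 0.5
        if n > 1e-12:
            basis.append([wi / n for wi in w])
    return basis

def residual(x, basis):
    # Norm of the component of x lying outside the subspace.
    proj = [0.0] * len(x)
    for b in basis:
        c = sum(xi * bi for xi, bi in zip(x, b))
        proj = [p + c * bi for p, bi in zip(proj, b)]
    return sum((xi - pi) ** 2 for xi, pi in zip(x, proj)) ** 0.5

def classify(x, class_samples):
    # Assign x to the class whose subspace reconstructs it best.
    bases = {k: orthonormal_basis(v) for k, v in class_samples.items()}
    return min(bases, key=lambda k: residual(x, bases[k]))
```

Unlike a black-box network, each decision here is fully interpretable: the winning class is the one whose prototype subspace leaves the smallest reconstruction error.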
-
- Title
- Computer Analysis of Images and Patterns
- Editors
-
Mario Vento
Gennaro Percannella
- Copyright Year
- 2019
- Publisher
- Springer International Publishing
- Electronic ISBN
- 978-3-030-29891-3
- Print ISBN
- 978-3-030-29890-6
- DOI
- https://doi.org/10.1007/978-3-030-29891-3