
About this book

This book constitutes the thoroughly refereed proceedings of the 13th International Conference on Image Analysis and Recognition, ICIAR 2016, held in Póvoa de Varzim, Portugal, in July 2016.
The 79 revised full papers and 10 short papers presented were carefully reviewed and selected from 167 submissions. The papers are organized in the following topical sections: Advances in Data Analytics and Pattern Recognition with Applications, Image Enhancement and Restoration, Image Quality Assessment, Image Segmentation, Pattern Analysis and Recognition, Feature Extraction, Detection and Recognition, Matching, Motion and Tracking, 3D Computer Vision, RGB-D Camera Applications, Visual Perception in Robotics, Biometrics, Biomedical Imaging, Brain Imaging, Cardiovascular Image Analysis, Image Analysis in Ophthalmology, Document Analysis, Applications, and Obituaries.



Advances in Data Analytics and Pattern Recognition with Applications


Adaptation Approaches in Unsupervised Learning: A Survey of the State-of-the-Art and Future Directions

In real applications, data continuously evolve over time and change from one setting to another. This motivates the development of adaptive learning algorithms that can deal with these data dynamics. Adaptation mechanisms for unsupervised learning have received an increasing amount of attention from researchers, and this research activity has produced many results on some of the challenging, still open problems of the adaptation process. This paper is a brief review of adaptation mechanisms in unsupervised learning, focusing on approaches recently reported in the literature for adaptive clustering and novelty detection and discussing some future directions. Although these approaches have been able to cope with different levels of data non-stationarity, there is a crucial need to extend them to handle large amounts of data in distributed, resource-limited environments.

JunHong Wang, YunQian Miao, Alaa Khamis, Fakhri Karray, Jiye Liang

Semi-supervised Dictionary Learning Based on Hilbert-Schmidt Independence Criterion

In this paper, a novel semi-supervised dictionary learning and sparse representation (SS-DLSR) method is proposed. The proposed method benefits from the supervisory information by learning the dictionary in a space where the dependency between the data and class labels is maximized. This maximization is performed using the Hilbert-Schmidt independence criterion (HSIC). In addition, the global distribution of the underlying manifolds is learned from the unlabeled data by minimizing the distances between the unlabeled data and the corresponding nearest labeled data in the space of the learned dictionary. The proposed SS-DLSR algorithm has closed-form solutions for both the dictionary and the sparse coefficients, and therefore does not have to learn the two iteratively and alternately, as is common in the DLSR literature. This makes the solution for the proposed algorithm very fast. The experiments confirm the improvement in classification performance on benchmark datasets obtained by including the information from both labeled and unlabeled data, particularly when there are many unlabeled data.

Mehrdad J. Gangeh, Safaa M. A. Bedawi, Ali Ghodsi, Fakhri Karray

Transferring and Compressing Convolutional Neural Networks for Face Representations

In this work we have investigated face verification based on deep representations from Convolutional Neural Networks (CNNs) to find an accurate and compact face descriptor trained only on a restricted amount of face image data. Transfer learning by fine-tuning CNNs pre-trained on large-scale object recognition has been shown to be a suitable approach to counter a limited amount of target domain data. Using model compression we reduced the model complexity without significant loss in accuracy and made the feature extraction more feasible for real-time use and deployment on embedded systems and mobile devices. The compression resulted in a 9-fold reduction in number of parameters and a 5-fold speed-up in the average feature extraction time running on a desktop CPU. With continued training of the compressed model using a Siamese Network setup, it outperformed the larger model.

Jakob Grundström, Jiandan Chen, Martin Georg Ljungqvist, Kalle Åström

Efficient Melanoma Detection Using Texture-Based RSurf Features

Melanoma is the most dangerous form of skin cancer. It develops from the melanin-producing cells known as melanocytes. If melanoma is recognized and treated early, it is almost always curable. However, in early stages, melanomas are similar to benign lesions known as moles, which also originate from melanocytes. Therefore, much effort is put into the correct automated recognition of melanomas. Current computer-aided diagnosis relies on the use of various sets of colour and/or texture features. In this contribution, we present a fully automated melanoma recognition system, which employs a single set of texture-based RSurf features. The experimental evaluation demonstrates promising results and indicates strong discrimination power of these features for melanoma recognition tasks.

Tomáš Majtner, Sule Yildirim-Yayilgan, Jon Yngve Hardeberg

High-Frequency Spectral Energy Map Estimation Based Gait Analysis System Using a Depth Camera for Pathology Detection

This paper presents a new and simple gait analysis system, based on a depth camera placed in front of a subject walking on a treadmill, capable of distinguishing a healthy gait from an impaired one. Our system relies on the fact that a normal or healthy walk typically exhibits a smooth motion (depth) signal at each pixel, with less high-frequency spectral energy content than an impaired or abnormal walk. Thus, the estimation of a map showing the location and the amplitude of the high-frequency spectral energy (HFSE) for each subject allows clinicians to visually quantify and localize the different impaired body parts of the patient and to quickly detect a possible disease. Although the HFSE maps obtained are clearly intuitive for a rapid clinical diagnosis, the proposed system also performs an automatic classification between normal and abnormal gaits, with success rates ranging from 88.23 % to 92.15 %.

Didier Ndayikengurukiye, Max Mignotte
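
The idea of a per-pixel high-frequency energy map described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact HFSE definition, so everything here is an assumption made for illustration: the function names, the cutoff bin, and the choice to sum squared DFT magnitudes of each pixel's depth signal over time.

```python
import cmath
import math

def high_frequency_energy(signal, cutoff_bin):
    """Sum squared DFT magnitudes over bins cutoff_bin..N//2 of a real signal."""
    n = len(signal)
    energy = 0.0
    for k in range(cutoff_bin, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        energy += abs(coeff) ** 2
    return energy

def hfse_map(depth_frames, cutoff_bin):
    """depth_frames[t][r][c] is the depth at pixel (r, c) in frame t."""
    rows, cols = len(depth_frames[0]), len(depth_frames[0][0])
    return [[high_frequency_energy([frame[r][c] for frame in depth_frames],
                                   cutoff_bin)
             for c in range(cols)]
            for r in range(rows)]

# Toy data (hypothetical): one smooth pixel and one pixel with an added
# fast oscillation, observed over 16 frames of a 1 x 2 image.
T = 16
smooth = [math.sin(2 * math.pi * t / T) for t in range(T)]
jerky = [s + 0.5 * math.sin(2 * math.pi * 6 * t / T)
         for t, s in enumerate(smooth)]
frames = [[[smooth[t], jerky[t]]] for t in range(T)]
energy_map = hfse_map(frames, cutoff_bin=4)
```

In this toy case the smooth pixel has essentially no energy above the cutoff, while the pixel with the fast oscillation does, which is the contrast such a map is meant to make visible.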

Combining Low-Level Features of Offline Questionnaires for Handwriting Identification

When anonymous offline questionnaires are used for reviewing services or products, it is often not guaranteed that a reviewer does this only once, as intended. In this paper, a combination of different handwriting features and their fusion is presented to expose such manipulations. The presented approach covers the aspects of alignment normalization, segmentation, feature extraction, classification and fusion. Nine features from handwritten text, numbers and checkboxes are extracted and used to recognize duplicate writers. The proposed method has been tested on a novel database containing pages of handwritten text produced by 1,734 writers. Furthermore, we show that the unified biometric decision using a weighted sum combination rule can significantly improve writer identification performance even on low-level features.

Dirk Siegmund, Tina Ebert, Naser Damer
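
As a minimal sketch of the weighted sum combination rule mentioned in the abstract above: the function name, the example scores and the weights are hypothetical and are not taken from the paper, which does not state them here.

```python
def weighted_sum_fusion(scores, weights):
    """Fuse per-feature match scores into one score using normalized weights."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total

# Hypothetical match scores from three features, the first one trusted most.
fused = weighted_sum_fusion([0.9, 0.4, 0.7], [2.0, 1.0, 1.0])
```

A decision would then compare the fused score against a threshold; how the weights and threshold are chosen is outside this sketch.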

Person Profiling Using Image and Facial Attributes Analyses on Unconstrained Images Retrieved from Online Sources

With the existence and growth of Social Network Services (SNS), they have become a focus of data and image processing research, in particular concerning their potential to describe persons based on information available online. In this paper we propose a novel approach for person profiling based solely on images, for children and adolescents of age 10+. The application acquires pictures from search engines and SNS and performs image-based analysis focusing on facial attributes. Image analysis results using different image datasets are presented, showing that image analytics faces challenges when applied to unconstrained datasets, but has the potential to push SNS analytics to a new level of detail in people profiling. The application aims at improving the target users’ media literacy, raising their awareness of risks and consequences, and encouraging them to deal responsibly with pictures online.

Elisabeth Wetzinger, Michael Atanasov, Martin Kampel

Palm Print Identification and Verification Using a Genetic-Based Feature Extraction Technique

In this paper, we investigate the performance of two feature extraction techniques on palm print images. The first is the Local Binary Pattern (LBP) feature extraction technique. The second is the Genetic and Evolutionary Feature Extraction (GEFE) technique. A set of feature extractors is evolved by GEFE, and the average and best performance of the extractors are compared to the best scheme of LBP. The techniques are tested on left hand, right hand and combined hand datasets. The results show varying performance between the extraction techniques, but the GEFE approach is promising.

Joseph Shelton, John Jenkins, Kaushik Roy
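
For readers unfamiliar with the LBP baseline named in the abstract above, here is a minimal sketch of the basic 3x3 LBP code for a single pixel. The bit ordering (clockwise from the top-left neighbour) and the `>=` comparison are conventions assumed for illustration; the paper's exact LBP scheme is not given in the abstract.

```python
def lbp_code(patch):
    """Return the 8-bit LBP code of a 3x3 patch given as a list of 3 rows."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left corner.
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    # Set bit i when neighbour i is at least as bright as the centre pixel.
    return sum(1 << i for i, value in enumerate(neighbours) if value >= center)

flat_code = lbp_code([[5, 5, 5], [5, 5, 5], [5, 5, 5]])
corner_code = lbp_code([[9, 1, 1], [1, 5, 1], [1, 1, 1]])
```

A texture descriptor is then typically built as a histogram of these codes over an image region.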

PCA-Based Face Recognition: Similarity Measures and Number of Eigenvectors

This paper examines the performance of face recognition using Principal Component Analysis by (i) varying the number of eigenvectors; and (ii) using different similarity measures for classification. We tested 15 similarity measures. The ORL database, which consists of 400 face images, is used for the experiments. We observed that changing the similarity measure causes a significant change in performance. The system showed the best performance with the following distance measures: Cosine, Correlation and City block. Using the Cosine similarity measure, we needed to extract fewer images (30 %) in order to achieve cumulative recognition of 100 %. The performance of the system improved with an increasing number of eigenvectors (up to roughly 30 % of the eigenvectors), after which it almost stabilized. Some of the worst performers are Standardized Euclidean, Weighted Modified SSE and Weighted Modified Manhattan.

Sushma Niket Borade, Ratnadeep R. Deshmukh
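
The three best-performing measures named in the abstract above have standard definitions, sketched below on plain feature vectors. The PCA projection itself is omitted; the gallery, the probe vector and the function names are hypothetical and only illustrate nearest-neighbour matching under a chosen distance.

```python
import math

def city_block(u, v):
    """City block (L1) distance."""
    return sum(abs(a - b) for a, b in zip(u, v))

def cosine_distance(u, v):
    """One minus the cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def correlation_distance(u, v):
    """Cosine distance between the mean-centred vectors."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return cosine_distance([a - mean_u for a in u], [b - mean_v for b in v])

def nearest_label(probe, gallery, distance):
    """Return the gallery label whose feature vector is closest to the probe."""
    return min(gallery, key=lambda label: distance(probe, gallery[label]))

# Hypothetical 2-D feature vectors standing in for PCA coefficients.
gallery = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
match = nearest_label([0.9, 0.1], gallery, cosine_distance)
```

Swapping the `distance` argument is all it takes to compare measures on the same projected data.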

Image Enhancement and Restoration


Sinogram Restoration Using Confidence Maps to Reduce Metal Artifact in Computed Tomography

Metal artifact reduction (MAR) is a well-known problem, and many studies have been performed during the last decades. The common standard methods for MAR consist of synthesizing missing projection data by using an interpolation or in-painting process. However, no method has yet been proposed to solve the MAR problem when no sinogram is available. This paper proposes a novel MAR approach using confidence maps to restore an artifacted sinogram computed directly from the reconstructed image.

Louis Frédérique, Benoit Recur, Sylvain Genot, Jean-Philippe Domenger, Pascal Desbarats

Enhancement of a Turbulent Degraded Frame Using 2D-DTW Averaging

Atmospheric turbulence causes objects in video sequences to appear blurred and to waver slowly in a quasi-periodic fashion, resulting in a loss of detail. A DTW (Dynamic Time Warping) averaging algorithm is presented to extract a single, geometrically improved and sharper frame from a sequence of frames using 2D-DTW. The extracted frame is shown to be sharper than one obtained by simple temporal averaging, preserving edges and lines as well as being geometrically improved.

Rishaad Abdoola, Barend van Wyk
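
The abstract above builds on Dynamic Time Warping. The sketch below shows only the classic one-dimensional DTW distance between two sequences, as background; it is not the 2D-DTW averaging algorithm of the paper, and the function name and absolute-difference cost are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic 1-D DTW distance with absolute difference as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# A sequence and a time-stretched copy of it align at zero cost.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

The warping path that achieves this minimum is what allows samples to be averaged after alignment rather than position by position.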

Denoising Multi-view Images Using Non-local Means with Different Similarity Measures

We present a stereo image denoising algorithm. Our algorithm takes as an input a pair of noisy images of an object captured from two different directions (stereo images). We use either Maximum Difference or Singular Value Decomposition similarity metrics for identifying locations of similar searching windows in the input images. We adapt the Non-local Means algorithm for denoising collected patches from the searching windows. Experimental results show that our algorithm outperforms the original Non-local Means and our previous method Stereo images denoising using Non-local Means with Structural SIMilarity (S-SSIM), and it helps to estimate more accurate disparity maps at various noise levels.

Monagi H. Alkinani, Mahmoud R. El-Sakka

Image Denoising Using Euler-Lagrange Equations for Function-Valued Mappings

In this paper, we consider a new method for representing complex images, e.g., hyperspectral images and video sequences, in terms of function-valued mappings (FVMs), also known as Banach-valued functions. At each (pixel) location x, the FVM image u(x) is a function, as opposed to the traditional vector approach. We define the Fourier transform of an FVM as well as Euler-Lagrange conditions for functionals involving FVMs and then show how these results can be used to devise some FVM-based methods of denoising. We consider a very simple functional and present some numerical results.

Daniel Otero, Davide La Torre, Edward R. Vrscay

Runtime Performance Enhancement of a Superpixel Based Saliency Detection Model

Reducing the computational cost of image processing for various real-time computer and robotic vision tasks, e.g. object recognition and tracking, adaptive compression, content-aware image resizing, etc., remains a challenge. Saliency detection is often utilized as a pre-processing step for rapid, parallel, bottom-up processing of low-level image features to compute a saliency map. Subsequent higher-level, complex computer vision tasks can then conveniently focus on the identified salient locations for further image processing. Thus, saliency detection has successfully mitigated the computational complexity of image processing tasks, although processing speed enhancement still remains a desired goal. Recent fast and improved superpixel models provide a fresh incentive to employ them in saliency detection models to reduce computational complexity and enhance runtime speed. In this paper, we propose the use of the superpixel extraction via energy driven sampling (SEEDS) algorithm to achieve processing speed enhancement in an existing saliency detection model. Evaluation results show that our modified model achieves over 60 % processing speed enhancement while maintaining accuracy comparable to the original model.

Qazi Aitezaz Ahmed, Mahmood Akhtar

Total Variation Minimization for Measure-Valued Images with Diffusion Spectrum Imaging as Motivation

In this paper, we present a notion of total variation for measure-valued images. Our motivation is Diffusion Spectrum Imaging (DSI) in which the diffusion at each voxel is characterized by a probability density function. We introduce a total variation denoising problem for measure-valued images. In the one-dimensional case, this problem (which involves the Monge-Kantorovich metric for measures) can be solved using cumulative distribution functions. In higher dimensions, more computationally expensive methods must be employed.

Davide La Torre, Franklin Mendivil, Oleg Michailovich, Edward R. Vrscay
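
The remark in the abstract above about cumulative distribution functions rests on a standard fact: in one dimension, the Monge-Kantorovich (Wasserstein-1) distance between two probability measures equals the integral of the absolute difference of their CDFs. The sketch below computes that distance for two discrete distributions on a common uniform grid; it does not implement the paper's total variation denoising, and the function name and grid spacing are assumptions.

```python
from itertools import accumulate

def monge_kantorovich_1d(p, q, spacing=1.0):
    """W1 distance between two discrete distributions on the same uniform grid."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same grid")
    cdf_p = accumulate(p)  # running sums give the CDF at each grid point
    cdf_q = accumulate(q)
    return spacing * sum(abs(fp - fq) for fp, fq in zip(cdf_p, cdf_q))

# Moving all mass two grid steps to the right costs 2.
far = monge_kantorovich_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
# Shifting a two-bin distribution by one grid step costs 1.
near = monge_kantorovich_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5])
```

No such closed form is available in higher dimensions, which is consistent with the abstract's note that more expensive methods are needed there.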

Image Quality Assessment


Quality Assessment of Spectral Reproductions: The Camera’s Perspective

This study introduces a computationally efficient framework to measure the difference between two reflectance spectra in terms of how an arbitrary RGB camera can distinguish between them under an arbitrary light source. Given one set of selected illuminants and one of selected camera models (red, green and blue sensors’ spectral responses), results indicate that both sets can be reduced in order to alleviate the computational load of the task while losing little accuracy in measurements.

Steven Le Moan

An Image Database for Design and Evaluation of Visual Quality Metrics in Synthetic Scenarios

This paper presents a new image database which provides images for the evaluation and design of visual quality assessment metrics. It contains 1688 images: 8 reference images, 7 types of distortions per reference image and 30 distortions per type and reference. The distortion types address image errors arising in visual compositions of real and synthetic content, and thus provide a basis for visual quality assessment metrics targeting augmented and virtual reality content. In roughly 200 subjective experiments, over 17,000 evaluations have been gathered and Mean Opinion Scores for the database have been obtained. The evaluation of several existing and widely used quality metrics on the proposed database is included in this paper. The database is freely available, reproducible and extendable for further scientific research.

Christopher Haccius, Thorsten Herfet

Perceptual Comparison of Multi-exposure High Dynamic Range and Single-Shot Camera RAW Photographs

In this paper we evaluate the perceptual fidelity of single-shot low dynamic range photographs of high dynamic range scenes. We argue that contemporary DSLR (digital single-lens reflex) cameras equipped with high-end sensors are sufficient to capture the full luminance range of the majority of typical scenes. The RGB images computed directly from the camera sensor data, called RAW images, retain the entire dynamic range of the sensor; however, they suffer from visible noise in dark regions. In this work we evaluate the visibility of this noise in a perceptual experiment, in which people manually mark differences between a single-shot camera RAW image and a corresponding high quality image, namely the high dynamic range photograph created using the multi-exposure technique. We also show that the HDR-VDP-2 image quality metric can be efficiently applied to automatically detect noisy regions without the need for time-consuming experiments.

Tomasz Sergej, Radosław Mantiuk

Objective Image Quality Measures of Degradation in Compressed Natural Images and their Comparison with Subjective Assessments

This paper is concerned with the degradation produced in natural images by JPEG compression. Our study has been twofold: (i) To find relationships between the amount of compression-induced degradation in an image and its various statistical properties. The goal is to identify blocks that will exhibit lower/higher rates of degradation as the degree of compression increases. (ii) To compare the above objective characterizations with subjective assessments of observers. The conclusions of our study are rather significant in several aspects. First of all, “bad” blocks, i.e., blocks exhibiting greater degrees of degradation visually, have among the lowest RMSEs of all blocks and among the medium-to-highest structural similarity (SSIM)-based errors. Secondly, the standard deviations of “bad” blocks are among the lowest of all blocks, suggesting a kind of “Weber law for compression,” a consequence of contrast masking. Thirdly, “bad” blocks have medium-to-high high-frequency (HF) fractions as opposed to HF content.

Alison K. Cheeseman, Ilona A. Kowalik-Urbaniak, Edward R. Vrscay

Image Segmentation


Human Detection Based on Infrared Images in Forestry Environments

It is essential to have a reliable system that detects humans in close range of forestry machines, so that cutting or carrying operations can be stopped to prevent any harm to humans. Due to the lighting conditions and high occlusion from the vegetation, human detection using RGB cameras is difficult. This paper introduces two human detection methods for forestry environments using a thermal camera: one shape-dependent and one shape-independent approach. Our segmentation algorithm estimates the location of the human by extracting vertical and horizontal borders of regions of interest (ROIs). Based on the segmentation results, features such as the ratio of height to width and the location of the hottest spot are extracted for the shape-dependent method. For the shape-independent method, all extracted ROIs are resized to the same size, and the pixel values (temperatures) are then used as a set of features. The features from both methods are fed into different classifiers and the results are evaluated using side-accuracy and side-efficiency. The results show that by using shape-independent features, based on three consecutive frames, we reach a precision rate of 80 % and a recall of 76 %.

Ahmad Ostovar, Thomas Hellström, Ola Ringdahl

Cell Segmentation Using Level Set Methods with a New Variance Term

We present a new method for segmentation of phase-contrast microscopic images of cells. The algorithm is based on the variational formulation of the level set method, i.e. the minimization of a functional which describes the level set function. The functional is minimized by a gradient flow described by an evolutionary partial differential equation. The most significant new ideas are initialization using thresholding and the introduction of a new term based on local variance that speeds up convergence and achieves more accurate results. The proposed algorithm is applied to real data and compared with another algorithm. Our method yields an average gain in accuracy of 2 %.

Zuzana Bílková, Jindřich Soukup, Václav Kučera

Video Object Segmentation Based on Superpixel Trajectories

In this paper, a video object segmentation method utilizing the motion of superpixel centroids is proposed. Our method achieves the same advantages as methods based on clustering point trajectories; furthermore, obtaining dense clustering labels from sparse ones becomes very easy: for each superpixel, the label of its centroid is simply propagated to all of its pixels. In addition to the motion of superpixel centroids, the histogram of oriented optical flow (HOOF) extracted from superpixels is used as a second feature. After segmenting each object, we distinguish between foreground objects and the background utilizing the obtained clustering results.

Mohamed A. Abdelwahab, Moataz M. Abdelwahab, Hideaki Uchiyama, Atsushi Shimada, Rin-ichiro Taniguchi

Interactive 3D Segmentation of Lymphatic Valves in Confocal Microscopic Images

We present a novel method for the segmentation of lymph valve leaflets from confocal microscopy studies. By using a user-informed, layer-based segmentation framework, we segment the outer boundary of the lymph valve in 3D from a series of confocal images. This boundary is then used to compute the surface structure of the vessel by providing a boundary constraint to a dual graph based on minimum surface segmentation. This segmentation creates a point cloud of voxels on the surface of the valve structure; we then apply an RBF interpolation to reconstruct it as a continuous surface.

Jonathan-Lee Jones, Xianghua Xie

Automatic Nonlinear Filtering and Segmentation for Breast Ultrasound Images

Breast cancer is one of the leading causes of cancer death among women worldwide. The proposed approach comprises three steps as follows. Firstly, the image is preprocessed to remove speckle noise while preserving important features of the image. Three methods are investigated, i.e., Frost Filter, Detail Preserving Anisotropic Diffusion, and Probabilistic Patch-Based Filter. Secondly, Normalized Cut or Quick Shift is used to provide an initial segmentation map for breast lesions. Thirdly, a postprocessing step is proposed to select the correct region from a set of candidate regions. This approach is implemented on a dataset containing 20 B-mode ultrasound images, acquired from UDIAT Diagnostic Center of Sabadell, Spain. The overall system performance is determined against the ground truth images. The best system performance is achieved through the following combinations: Frost Filter with Quick Shift, Detail Preserving Anisotropic Diffusion with Normalized Cut and Probabilistic Patch-Based with Normalized Cut.

Mohamed Elawady, Ibrahim Sadek, Abd El Rahman Shabayek, Gerard Pons, Sergi Ganau

Pattern Analysis and Recognition


Phenotypic Integrated Framework for Classification of ADHD Using fMRI

Attention Deficit Hyperactivity Disorder (ADHD) is one of the most common disorders affecting young children, and its underlying mechanism is not completely understood. This paper proposes a phenotypic integrated machine learning framework to investigate functional connectivity alterations between ADHD subjects and control subjects not diagnosed with ADHD, employing fMRI data. Our aim is to apply computational techniques to (1) automatically classify a person’s fMRI signal as ADHD or control, (2) identify differences in functional connectivity between these two groups and (3) evaluate the importance of phenotypic information for classification. In the first stage of our framework, we determine the functional connectivity of brain regions by grouping brain activity using clustering algorithms. Next, we employ Elastic Net based feature selection to select the most discriminant features from the dense functional brain network and integrate phenotypic information. Finally, a support vector machine classifier is trained to classify ADHD subjects vs. control. The proposed framework was evaluated on the public ADHD-200 dataset, and our classification results outperform the state of the art on some subsets of the data.

Atif Riaz, Eduardo Alonso, Greg Slabaugh

Directional Local Binary Pattern for Texture Analysis

In this paper, a new feature extraction method, the Directional Local Binary Pattern (DLBP), is presented, with the objective of improving on the Local Directional Pattern (LDP) for texture analysis. The idea of DLBP is inspired by the stability of the Kirsch mask directional responses and the LBP neighboring concept. The results show that DLBP outperforms LDP and LBP.

Abuobayda M. Shabat, Jules-Raymond Tapamo

Kernel Likelihood Estimation for Superpixel Image Parsing

In superpixel-based image parsing, the image is first segmented into visually consistent small regions, i.e. superpixels; then the superpixels are parsed into different categories. The SuperParsing algorithm provides an elegant nonparametric solution to this problem without any need for classifier training. Superpixels are labeled based on the likelihood ratios that are computed from class-conditional density estimates of feature vectors. In this paper, local kernel density estimation is proposed to improve the estimation of likelihood ratios and hence the labeling accuracy. By optimizing kernel bandwidths for each feature vector, feature densities are better estimated, especially when the set of training samples is sparse. The proposed method is tested on the SIFT Flow dataset consisting of 2,688 images and 33 labels, and is shown to outperform SuperParsing and some of its extended versions in terms of classification accuracy.

Hasan F. Ates, Sercan Sunetci, Kenan E. Ak
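
The likelihood-ratio labeling described in the abstract above can be sketched with a one-dimensional Gaussian kernel density estimate. This is a simplified illustration under stated assumptions: a single scalar feature, one fixed bandwidth shared by all samples (the paper instead optimizes bandwidths per feature vector), and hypothetical function names and sample values.

```python
import math

def gaussian_kde(x, samples, bandwidth):
    """Gaussian kernel density estimate at x from 1-D training samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                      for s in samples)

def likelihood_ratio(x, class_samples, other_samples, bandwidth):
    """Ratio of the class density to the density of all other classes at x."""
    return (gaussian_kde(x, class_samples, bandwidth)
            / gaussian_kde(x, other_samples, bandwidth))

# Hypothetical training features for one class and for everything else.
sky = [0.0, 0.2, -0.1]
not_sky = [1.0, 1.2, 0.9]
ratio_near_sky = likelihood_ratio(0.1, sky, not_sky, bandwidth=0.5)
ratio_near_other = likelihood_ratio(1.0, sky, not_sky, bandwidth=0.5)
```

A ratio above 1 favours the class and a ratio below 1 favours the alternatives; with sparse training samples the result is sensitive to the bandwidth, which is the issue the abstract's local bandwidth optimization targets.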

Multinomial Sequence Based Estimation Using Contiguous Subsequences of Length Three

The Maximum Likelihood (ML) and Bayesian estimation paradigms work within the model that the data, from which the parameters are to be estimated, is treated as a set rather than as a sequence. The pioneering paper that dealt with the field of sequence-based estimation [2] involved utilizing both the information in the observations and in their sequence of appearance. The results of [2] introduced the concepts of Sequence Based Estimation (SBE) for the Binomial distribution, where the authors derived the corresponding MLE results when the samples are taken two-at-a-time, and then extended these for the cases when they are processed three-at-a-time, four-at-a-time etc. These results were generalized for the multinomial “two-at-a-time” scenario in [3]. This paper (This paper is dedicated to the memory of Dr. Mohamed Kamel, who was a close friend of the first author.) now further generalizes the results found in [3] for the multinomial case and for subsequences of length 3. The strategy used in [3] (and also here) involves a novel phenomenon called “Occlusion” that has not been reported in the field of estimation. The phenomenon can be described as follows: By occluding (hiding or concealing) certain observations, we map the estimation problem onto a lower-dimensional space, i.e., onto a binomial space. Once these occluded SBEs have been computed, the overall Multinomial SBE (MSBE) can be obtained by combining these lower-dimensional estimates. In each case, we formally prove and experimentally demonstrate the convergence of the corresponding estimates.

B. John Oommen, Sang-Woon Kim

Feature Extraction


Rotation Tolerant Hand Pose Recognition Using Aggregation of Gradient Orientations

The visual recognition of hand poses is one of the central problems in the development of applications controlled by visual gestures. In this paper, a generic orientation histogram based technique is described and applied to the pose recognition from intensity images. The technique addresses the need for rotation tolerant recognition using an orientation normalization technique, where the uncertainty related to the reference point of normalization is also taken into account by cyclic filtering. To complement the scheme, the circularly symmetric composition of histogram aggregation regions is introduced and the rotation tolerance can be controlled by range selection. In the experiments, we provide results on the choice between the parameter values and make comparisons to the existing techniques, which show the potential of the approach.

Pekka Sangi, Matti Matilainen, Olli Silvén

Extracting Lineage Information from Hand-Drawn Ancient Maps

In this paper, we present an efficient segmentation technique that extracts piecewise linear patterns from hand-drawn maps. The user is only required to place the starting and end points and the method is capable of extracting the route that connects the two, which closely colocates with the hand-drawn map. It provides an effective approach to interactively process and understand those historical maps. The proposed method employs supervised learning to evaluate at every pixel location the probability that such a lineage pattern exists, followed by shortest path segmentation to extract the border of interest.

Ehab Essa, Xianghua Xie, Richard Turner, Matthew Stevens, Daniel Power

Evaluation of Stochastic Gradient Descent Methods for Nonlinear Mapping of Hyperspectral Data

In this paper, we conducted a study of several gradient descent methods, namely gradient descent, stochastic gradient descent, the momentum method, and AdaGrad, for nonlinear mapping of hyperspectral satellite images. The studied methods are compared in terms of both data mapping error and operation time. Two possible applications of the studied methods are considered. The first application is the nonlinear dimensionality reduction of hyperspectral images for subsequent classification. The second application is the visualization of hyperspectral images in false colors. The study was carried out using well-known hyperspectral satellite images.

Evgeny Myasnikov
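
The update rules of three of the methods compared in the abstract above can be sketched on a toy objective. This is not the nonlinear mapping error used in the paper: the objective f(x) = (x - 3)^2, the learning rates, the momentum coefficient and the step counts are all assumptions chosen for illustration.

```python
import math

def grad(x):
    """Gradient of the toy objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)

def gradient_descent(x, lr=0.1, steps=200):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def momentum(x, lr=0.1, beta=0.9, steps=200):
    velocity = 0.0
    for _ in range(steps):
        # The velocity accumulates a decaying sum of past gradient steps.
        velocity = beta * velocity - lr * grad(x)
        x += velocity
    return x

def adagrad(x, lr=1.0, eps=1e-8, steps=200):
    squared_sum = 0.0
    for _ in range(steps):
        g = grad(x)
        squared_sum += g * g
        # The step shrinks as the accumulated squared gradient grows.
        x -= lr * g / (math.sqrt(squared_sum) + eps)
    return x

results = {
    "gradient descent": gradient_descent(0.0),
    "momentum": momentum(0.0),
    "adagrad": adagrad(0.0),
}
```

All three reach the minimizer x = 3 here; the stochastic variant would replace `grad` with a gradient estimated from a random subset of the data at each step.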

Automatic Selection of the Optimal Local Feature Detector

A large number of different local feature detectors have been proposed in the last few years. However, each feature detector has its own strengths and weaknesses that limit its use to a specific range of applications. This paper presents a tool capable of quickly analysing input images to determine which type and amount of transformation is applied to them, and then selecting the feature detector that is expected to perform best. The results show that the performance and the fast execution time render the proposed tool suitable for real-world vision applications.

Bruno Ferrarini, Shoaib Ehsan, Naveed Ur Rehman, Aleš Leonardis, Klaus D. McDonald-Maier

Multiple Object Scene Description for the Visually Impaired Using Pre-trained Convolutional Neural Networks

This paper introduces a new method for multiple object scene description as part of a system to guide the visually impaired in an indoor environment. Here we are interested in a coarse scene description, where only the presence of certain objects is indicated, regardless of their position in the scene. The proposed method is based on the extraction of powerful features using pre-trained convolutional neural networks (CNNs), followed by training a neural network regression to predict the content of any unknown scene based on its CNN feature. We have found the CNN feature to be highly descriptive, even though it is trained on auxiliary data from a completely different domain. The proposed methodology was assessed on four datasets representing different indoor environments. It achieves better results in terms of both accuracy and processing time when compared to the state of the art.

Haikel Alhichri, Bilel Bin Jdira, Yacoub bazi, Naif Alajlan

Detection and Recognition


Effective Comparison Features for Pedestrian Detection

For real applications of pedestrian detection, both detection speed and detection accuracy are important. In this paper we propose a detector based on effective comparison features (ECFs) for simultaneously improving detection accuracy and speed. ECFs are defined as the features helping to improve actual performance. Using only these ECFs as feature candidates for the split nodes of decision trees, our detector can achieve accurate results. As an additional benefit, detection speed is improved by earlier rejection of negative samples. Experiments are conducted using well-known benchmark datasets for pedestrian detection. The experimental results of our ECF detector show that our detection speed is 1–2 orders of magnitude faster than the speed of state-of-the-art algorithms, with comparable detection accuracy.

Kang-Kook Kong, Jong-Woo Lee, Ki-Sang Hong

Counting People in Crowded Scenes via Detection and Regression Fusion

It is particularly important for surveillance systems to track the number of people in crowded scenes. In this paper, we look into this problem of counting people in crowded scenes and propose a framework that fuses information coming from detection, tracking and region regression together. For counting by regression, we propose to use region covariance features in the form of Sigma Sets in conjunction with interest point features. Experimental results on two benchmark datasets demonstrate that using region covariance features for the purpose of people counting yields effective results. Moreover, our results indicate that fusing detection and regression is beneficial for more accurate people counting in crowded scenes.

Cemil Zalluhoglu, Nazli Ikizler-Cinbis

Multi-graph Based Salient Object Detection

We propose a multi-layer graph-based approach for salient object detection in natural images. Starting from a set of multi-scale image decompositions using superpixels, we propose an objective function optimized on a multi-layer graph structure to diffuse saliency from image borders to salient objects. After isolating the object kernel, we enhance the accuracy of our saliency maps through an objectness-like refinement approach. Besides its simplicity, our algorithm yields very accurate salient objects with clear boundaries. Experiments have shown that our approach outperforms several recent methods dealing with salient object detection.

Idir Filali, Mohand Said Allili, Nadjia Benblidia

Analysis of Temporal Coherence in Videos for Action Recognition

This paper proposes an approach to improve the performance of activity recognition methods by analyzing the coherence of the frames in the input videos and then modeling the evolution of the coherent frames, which constitute a sub-sequence, to learn a representation for the videos. The proposed method consists of three steps: coherence analysis, representation learning and classification. Using two state-of-the-art datasets (Hollywood2 and HMDB51), we demonstrate that learning the evolution of sub-sequences in lieu of frames improves the recognition results and makes action classification faster.

Adel Saleh, Mohamed Abdel-Nasser, Farhan Akram, Miguel Angel Garcia, Domenec Puig

Effectiveness of Camouflage Make-Up Patterns Against Face Detection Algorithms

The goal of this research was to evaluate which make-up patterns are effective in disrupting face detection algorithms. Three free or open source implementations of various face detection algorithms were selected. These were first tested on an unaltered dataset. The dataset was then augmented with different make-up patterns. The patterns were chosen arbitrarily with the goal of disrupting the detection algorithms. The results show that the selected patterns decrease the accuracy of the face detection algorithms by about 10 %.

Vojtěch Frič

A Comparative Study of Vision-Based Traffic Signs Recognition Methods

Traffic sign recognition is an important component of driver assistance systems, as it helps drivers comply with safety regulations. The aim of this work is to propose a vision-based traffic sign recognition method. In the recognition process, we detect potential traffic sign regions using monocular color-based segmentation. Afterwards, we identify the traffic sign class using its HoG features and an SVM classifier. As shown experimentally, our method is more efficient than leading methods from the literature under complex conditions.

Nadra Ben Romdhane, Hazar Mliki, Rabii El Beji, Mohamed Hammami

A Copy-Move Detection Algorithm Using Binary Gradient Contours

Nowadays, the copy-move attack is one of the most obvious forms of digital image forgery used to hide information contained in images. The copy-move process consists of copying a fragment from one place in an image, changing it and pasting it to another place in the same image. However, only a few existing studies have reached high detection accuracy, and only for a narrow range of transform parameters. In this paper, we propose a copy-move detection algorithm that uses features based on binary gradient contours, which are robust to contrast enhancement, additive noise and JPEG compression. The proposed solution showed high detection accuracy, and the results are supported by experiments conducted over wide ranges of transform parameters. A comparison of features based on binary gradient contours with features based on various forms of local binary patterns showed a significant 20–30 % difference in detection accuracy in favor of the proposed solution.

Andrey Kuznetsov, Vladislav Myasnikov

Object Detection and Localization Using Deep Convolutional Networks with Softmax Activation and Multi-class Log Loss

We introduce a deep neural network that can be used to localize and detect a region of interest (ROI) in an image. We show how this network helped us extract ROIs when working on two separate problems: a whale recognition problem and a heart volume estimation problem. In the former problem, we used this network to localize the head of the whale, while in the latter we used it to localize the left ventricle of the heart in MRI images. Most localization networks regress a bounding box around the region of interest. Unlike these architectures, we treat the problem as a multi-class classification problem where each pixel in the image is a separate class. The network is trained on images along with masks that indicate where the object is in the image. Accordingly, the last layer has a softmax activation, and during training the multi-class log loss is minimized just as in any classification task.

AbdulWahab Kabani, Mahmoud R. El-Sakka
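
To make the "each pixel is a separate class" idea in the abstract above concrete, the sketch below applies a softmax over the logits of all pixels of a flattened image and evaluates a multi-class log loss against a binary mask. Normalizing the mask so that it sums to one is an assumption of this example, as are the function names; the abstract does not say how masks covering several pixels are turned into targets.

```python
import math


def softmax(logits):
    # Numerically stable softmax over all pixel logits of one image.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pixel_log_loss(logits, mask):
    # Multi-class log loss where every pixel is one class.
    # Assumption: the binary mask is normalized to a target distribution.
    probs = softmax(logits)
    n = sum(mask)
    if n == 0:
        raise ValueError("mask must mark at least one pixel")
    return -sum((m / n) * math.log(max(p, 1e-12))
                for m, p in zip(mask, probs) if m)
```

With uniform logits over four pixels and a mask that marks a single pixel, the loss equals log 4, which is the value expected for an uninformed prediction over four classes.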

Clustering-Based Abnormal Event Detection: Experimental Comparison for Similarity Measures’ Efficiency

The detection of abnormal events is a major challenge in video surveillance systems. In most cases, it is based on the analysis of the trajectories of moving objects in a controlled scene. Existing works rely on two phases. In the first phase, they extract normal/abnormal clusters from saved trajectories through an unsupervised clustering algorithm. In the second phase, they consider a newly detected trajectory and classify it as either normal or abnormal. In both phases, they need to compute the similarity between trajectories. Measuring this similarity is therefore a critical step in trajectory analysis, since it affects the quality of subsequent tasks such as clustering and classification. Although the distance measures differ, authors typically assert the performance of whichever distance they adopt. In this paper, we present a comparative experimental study of the efficiency of four distances widely used as trajectory similarity measures. In particular, we examine the impact of these distances on the quality of trajectory clustering. The experimental results demonstrate that the Longest Common SubSequence (LCSS) distance is the most accurate and efficient for the clustering task, even in the case of different sampling rates and noise.

Najla Bouarada Ghrab, Emna Fendri, Mohamed Hammami

Matching


Improved DSP Matching with RPCA for Dense Correspondences

The Deformable Spatial Pyramid (DSP) matching method is popular for dense matching of images that show different scenes but share similar semantic content, and it achieves high matching accuracy. However, the warped image generated by DSP is not smooth, mainly because of the noisy flow field it produces. We observed that the flow field can be decomposed into a low-rank term and a sparse term. Robust Principal Component Analysis (RPCA) is capable of recovering the low-rank component from an observation corrupted by sparse noise. In this paper, we therefore propose to use RPCA to deal with the non-smoothness in DSP by recovering the low-rank term from the flow field. Experiments on the VGG and LMO datasets verify that our approach obtains smoother warped images and higher matching accuracy than DSP.

Fanhuai Shi, Yanli Zhang

An Approach to Improve Accuracy of Photo–to–Sketch Matching

The problem of automatically matching sketches to facial photos is discussed. The idea presented is based on generating a population of sketches that imitates the sketches produced in forensic practice from verbal descriptions provided by a virtual group of witnesses. Structures of benchmark photo–sketch databases are presented that are intended to model and implement face photo retrieval from a given sketch. A new component of these databases is a population of sketches that represents each separate class of original photos. In this case, the original sketch is transformed into such a population, and within this population we then find a sketch that is similar to the given sketch. We demonstrate results of experiments based on the proposed methods for photo-to-sketch matching on the CUFS and CUFSF databases.

Georgy Kukharev, Yuri Matveev, Paweł Forczmański

Motion and Tracking


Bio-inspired Boosting for Moving Objects Segmentation

Developing robust and universal methods for unsupervised segmentation of moving objects in video sequences has proved to be a hard and challenging task. State-of-the-art methods show good performance in a wide range of situations, but systematically fail when facing more challenging scenarios. Lately, a number of image processing modules inspired by biological models of the human visual system have been explored in different areas of application. This paper proposes a bio-inspired boosting method to address the problem of unsupervised segmentation of moving objects in video that shows the ability to overcome some of the limitations of widely used state-of-the-art methods. An exhaustive set of experiments was conducted, and a detailed analysis of the results, using different metrics, revealed that this boosting is more significant in challenging scenarios where state-of-the-art methods tend to fail.

Isabel Martins, Pedro Carvalho, Luís Corte-Real, José Luis Alba-Castro

A Lightweight Face Tracking System for Video Surveillance

This paper deals with the problem of multiple face tracking for video surveillance systems. Although a considerable number of object tracking approaches have been developed, the video surveillance scenario allows additional assumptions about the tracker’s operational environment. Based on these assumptions, a tracking system comprising a face detector and a tracking subsystem is presented. The tracking algorithm is based on computationally inexpensive Binary Robust Independent Elementary Features (BRIEF). The implemented tracking system was tested on two video sequences. The experiments showed a significant improvement in processing rate over a detector-based system along with reasonable tracking quality.

Andrei Oleinik

Single Droplet Tracking in Jet Flow

Fluid systems such as multiphase flow and jet flow usually involve droplets and/or bubbles whose morphological properties can provide important clues about the underlying phenomena. In this paper, we develop a new visual tracking method to track the evolution of single droplets in jet flow. Shape and motion features of the detected droplets are fused, and the Bhattacharyya distance is employed to find the closest droplet among possible candidates in consecutive frames. The shapes of the droplets are not assumed to be circles or ellipses during the segmentation process, which utilizes morphological operations and thresholding. The evolution of single droplets in the jet flow was monitored via the Particle Shadow Sizing (PSS) technique, where they were tracked with 86 % average accuracy and 15 fps real-time performance.

Gokhan Alcan, Morteza Ghorbani, Ali Kosar, Mustafa Unel
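
The matching step mentioned in the abstract above can be sketched as follows. The example uses the discrete form of the Bhattacharyya distance, D = -ln(sum of sqrt(p_i q_i)), between two non-negative feature histograms and picks the candidate with the smallest distance. How the shape and motion features are built and fused is not described in the abstract, so the histograms and the helper names here are purely hypothetical.

```python
import math


def bhattacharyya_distance(p, q):
    # Discrete Bhattacharyya distance between two non-negative histograms.
    # Both inputs are normalized to sum to one before comparison.
    sp, sq = sum(p), sum(q)
    if sp <= 0 or sq <= 0:
        raise ValueError("histograms must have positive mass")
    bc = sum(math.sqrt((a / sp) * (b / sq)) for a, b in zip(p, q))
    bc = min(bc, 1.0)  # guard against rounding slightly above 1
    if bc <= 0.0:
        return math.inf  # no overlap at all
    return -math.log(bc)


def closest_candidate(reference, candidates):
    # Index of the candidate histogram closest to the reference.
    return min(range(len(candidates)),
               key=lambda i: bhattacharyya_distance(reference, candidates[i]))
```

Some libraries use the related form sqrt(1 - BC) instead of -ln(BC); both are monotone in the Bhattacharyya coefficient BC, so they select the same nearest candidate.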

Video Based Group Tracking and Management

Tracking objects in video is a very challenging research topic, particularly when people in groups are tracked, with partial and full occlusions and group dynamics being common difficulties. Hence, it is necessary to deal with group tracking, formation and separation while ensuring the overall consistency of the individuals. This paper proposes enhancements to a group management and tracking algorithm that receives information about the persons in the scene, detects the existing groups and keeps track of the persons that belong to them. Since the input for group management algorithms is typically provided by a tracking algorithm and is affected by noise, mechanisms for handling such noisy tracking input were also successfully included. The experiments performed demonstrated that the described algorithm outperformed state-of-the-art approaches.

Américo Pereira, Alexandra Familiar, Bruno Moreira, Teresa Terroso, Pedro Carvalho, Luís Côrte-Real

3D Computer Vision


Calibration of Shared Flat Refractive Stereo Systems

The calibration of underwater camera systems differs significantly from calibration in air due to the refraction of light. In this paper, we present a calibration approach for a shared flat refractive stereo system that is based on virtual object points. We propose a sampling strategy in combination with an efficiently solvable set of equations for the calibration of the refractive parameters. Because it does not depend on calibration targets of known dimensions, the approach can be realized using stereo correspondences alone.

Tim Dolereit, Uwe Freiherr von Lukas

3D Structured Light Scanner on the Smartphone

In recent years, turning smartphones into 3D reconstruction devices has been widely investigated. Different 3D reconstruction concepts have been proposed, and one of the most popular is based on IR projection of a pseudorandom dot (speckle) pattern. We demonstrate how a pseudorandom dot pattern can be used, and we also present an active approach that applies structured light (SL) scanning on the smartphone. SL has a number of advantages over other 3D reconstruction concepts, and our smartphone implementation inherits the same advantages over other smartphone-based solutions. The qualitative and quantitative results shown demonstrate an outcome comparable with that of a standard SL scanner.

Tomislav Pribanić, Tomislav Petković, Matea Đonlić, Vincent Angladon, Simone Gasparini

Stereo and Active-Sensor Data Fusion for Improved Stereo Block Matching

This paper proposes an algorithm which uses the depth information acquired from an active sensor as guidance for a block matching stereo algorithm. In the proposed implementation, the disparity search interval used for the block matching is reduced around the depth values obtained from the active sensor, which leads to an improved matching quality and denser disparity maps and point clouds. The performance of the proposed method is evaluated by carrying out a series of experiments on 3 different data sets obtained from different robotic systems. We demonstrate with experimental results that the disparity estimation is improved and denser disparity maps are generated.

Stefan-Daniel Suvei, Leon Bodenhagen, Lilita Kiforenko, Peter Christiansen, Rasmus N. Jørgensen, Anders G. Buch, Norbert Krüger

Dense Lightfield Disparity Estimation Using Total Variation Regularization

Plenoptic cameras make a trade-off between spatial and angular resolution. Knowledge of the disparity map makes it possible to improve the resolution of these cameras using superresolution techniques. Nonetheless, the disparity map is often unknown and must be recovered from the captured lightfield. Hence, we focus on improving the disparity estimation from the structure tensor analysis of the epipolar plane images obtained from the lightfield. Using a hypercube representation, we formalize a data fusion problem with total variation regularization using the Alternating Direction Method of Multipliers. Assuming periodic boundary conditions allowed us to integrate the full 4D lightfield efficiently using the frequency domain. We applied this methodology to a synthetic dataset. The disparity estimations are more accurate than those of the structure tensor.

Nuno Barroso Monteiro, João Pedro Barreto, José Gaspar

Target Position and Speed Estimation Using LiDAR

In this paper, an efficient and reliable framework to estimate the position and speed of moving vehicles is proposed. The method fuses LiDAR data with the output of an image-based object detection algorithm. LiDAR sensors deliver 3D point clouds with a positioning accuracy of up to two centimeters. The 2D object data leads to a significant reduction of the search space. Outlier removal techniques are applied to the reduced 3D point cloud for a more reliable representation of the data. Furthermore, a multi-hypothesis Kalman filter is implemented to determine the target object’s speed. The accuracy of the position and velocity estimation is verified through real data and simulation. Additionally, the proposed framework is real-time capable and suitable for embedded-vision applications.

Enes Dayangac, Florian Baumann, Josep Aulinas, Matthias Zobel

RGB-D Camera Applications


Combining 3D Shape and Color for 3D Object Recognition

We present new results in object recognition based on color and 3D shape obtained from 3D cameras. Namely, we further exploit diffusion processes to represent shape, with color/texture used as a perturbation to the diffusion process. Diffusion processes are an effective tool to replace shortest path distances in the characterization of 3D shapes. They also provide an effective means for the seamless representation of color and shape, mainly because they provide information on both the colors and their distribution over surfaces. While there have been different approaches for incorporating color information in the diffusion process, this is the first work that explores different parameterizations of color and their impact on recognition tasks. We present results using very challenging datasets, where we propose to recognize different instances of the same object class assuming very limited a priori knowledge of each individual object.

Susana Brandão, João P. Costeira, Manuela Veloso

Privacy-Preserving Fall Detection in Healthcare Using Shape and Motion Features from Low-Resolution RGB-D Videos

This paper addresses the issue of fall detection in healthcare using RGB-D videos. Privacy is often a major concern in video-based detection and analysis methods. We propose a privacy-aware video-based fall detection scheme. First, a set of features is defined and extracted, including local shape and shape dynamic features from object contours in depth video frames, and global appearance and motion features from HOG and HOGOF in RGB video frames. A sequence of time-dependent features is then formed by sliding-window averaging of the features along the temporal direction and used as the input to an SVM classifier for fall detection. Separate tests were conducted on a large dataset to examine the fall detection performance with privacy-preserving awareness. These include testing the fall detection scheme using solely depth videos and solely RGB videos at different resolutions, as well as the influence of individual features and feature fusion on the detection performance. Our test results show that both the dynamic shape features from depth videos and the motion (HOGOF) features from low-resolution RGB videos can preserve privacy while yielding good performance (91.88 % and 97.5 % detection, with false alarm ≤ 1.25 %). Further, our results show that the proposed scheme is able to discriminate between highly confusable classes of activities (falling versus lying down) with excellent performance. Our study indicates that methods based on depth or low-resolution RGB videos can still provide effective technologies for healthcare without impacting personal privacy.

Irene Yu-Hua Gu, Durga Priya Kumar, Yixiao Yun

Visual Perception in Robotics


Proprioceptive Visual Tracking of a Humanoid Robot Head Motion

This paper addresses the problem of measuring a humanoid robot head motion by fusing inertial and visual data. In this work, a model of a humanoid robot head, including a camera and inertial sensors, is moved on the tip of an industrial robot which is used as ground truth for angular position and velocity. Visual features are extracted from the camera images and used to calculate angular displacement and velocity of the camera, which is fused with angular velocities from a gyroscope and fed into a Kalman Filter. The results are quite interesting for two different scenarios and with very distinct illumination conditions. Additionally, errors are introduced artificially into the data to emulate situations of noisy sensors, and the system still performs very well.

João Peixoto, Vitor Santos, Filipe Silva

A Hybrid Top-Down Bottom-Up Approach for the Detection of Cuboid Shaped Objects

While bottom-up approaches to object recognition are simple to design and implement, they do not yield the same performance as top-down approaches. On the other hand, it is not trivial to obtain a moderate number of plausible hypotheses that can be efficiently verified by top-down approaches. To address these shortcomings, we propose a hybrid top-down bottom-up approach to object recognition in which a bottom-up procedure that generates a set of hypotheses based on data is combined with a top-down process for evaluating those hypotheses. We use the recognition of rectangular cuboid shaped objects from 3D point cloud data as a benchmark problem for our research. Results obtained using this approach demonstrate promising recognition performance.

Rafael Arrais, Miguel Oliveira, César Toscano, Germano Veiga

The Impact of Convergence Cameras in a Stereoscopic System for AUVs

Underwater imaging is increasingly helpful for autonomous robots to reconstruct and map marine environments, which is fundamental when searching for pipelines or wreckage in deep waters. In this context, the accuracy of the information obtained from the environment is of extreme importance. This work presents a study of the accuracy of a reconfigurable stereo vision system when determining a dense disparity estimation for underwater imaging. The idea is to explore the advantage of this kind of system for autonomous underwater vehicles (AUVs), since varying parameters such as the baseline and the pose of the cameras makes it possible to extract accurate 3D information at different distances between the AUV and the scene. Therefore, the impact of these parameters is analyzed using a metric error of the point cloud acquired by a stereoscopic system. Furthermore, results obtained directly from an underwater environment showed that a reconfigurable stereo system can have some advantages for autonomous vehicles since, in some trials, the error was reduced by 0.05 m for distances between 1.125 and 2.675 m.

João Aguiar, Andry Maykol Pinto, Nuno A. Cruz, Anibal C. Matos

Biometrics


Gender Recognition from Face Images Using a Fusion of SVM Classifiers

The recognition of gender from face images is an important application, especially in the fields of security, marketing and intelligent user interfaces. We propose an approach to gender recognition from faces by fusing the decisions of SVM classifiers. Each classifier is trained with different types of features, namely HOG (shape), LBP (texture) and raw pixel values. For the latter features we use an SVM with a linear kernel and for the two former ones we use SVMs with histogram intersection kernels. We come to a decision by fusing the three classifiers with a majority vote. We demonstrate the effectiveness of our approach on a new dataset that we extract from FERET. We achieve an accuracy of 92.6 %, which outperforms the commercial products Face++ and Luxand.

George Azzopardi, Antonio Greco, Mario Vento
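
Two ingredients named in the abstract above are simple enough to sketch: the histogram intersection kernel used with the HOG and LBP features, and the majority vote that fuses the three classifier decisions. The function names and label strings below are illustrative assumptions; the SVM training itself is omitted.

```python
from collections import Counter


def histogram_intersection(h1, h2):
    # Histogram intersection kernel: sum of bin-wise minima.
    return sum(min(a, b) for a, b in zip(h1, h2))


def majority_vote(decisions):
    # Most frequent label among the individual classifier decisions.
    # With three binary classifiers a tie cannot occur.
    return Counter(decisions).most_common(1)[0][0]
```

For example, if the HOG and raw-pixel classifiers output "female" and the LBP classifier outputs "male", `majority_vote(["female", "male", "female"])` yields "female".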

Kinship Verification from Faces via Similarity Metric Based Convolutional Neural Network

The ability to automatically determine whether two persons are from the same family or not is referred to as Kinship (or family) verification. This is a recent and challenging research topic in computer vision. We propose in this paper a novel approach to kinship verification from facial images. Our solution uses similarity metric based convolutional neural networks. The system is trained using Siamese architecture specific constraints. Extensive experiments on the benchmark KinFaceW-I & II kinship face datasets showed promising results compared to many state-of-the-art methods.

Lei Li, Xiaoyi Feng, Xiaoting Wu, Zhaoqiang Xia, Abdenour Hadid

Combination of Topological and Local Shape Features for Writer’s Gender, Handedness and Age Classification

In this work, prediction of a writer’s gender, handedness and age range is addressed through automatic analysis of handwritten sentences. Three SVM-based predictors associated with different data features are developed. Then, a Fuzzy MIN-MAX combination rule is proposed to aggregate the predictions of the individual systems into a robust prediction. Experiments are carried out on two public Arabic and English datasets. Results in terms of prediction accuracy demonstrate the usefulness of the proposed algorithm, which provides a gain of between 1 % and 10 % over both individual systems and classical combination rules. Moreover, it is much more relevant than various state-of-the-art methods.

Nesrine Bouadjenek, Hassiba Nemmour, Youcef Chibani

Hybrid Off-Line Handwritten Signature Verification Based on Artificial Immune Systems and Support Vector Machines

This paper proposes a new handwritten signature verification method based on a combination of an artificial immune algorithm with an SVM. In a first step, the Artificial Immune Recognition System (AIRS) is trained to develop a set of representative data (memory cells) for both the genuine and forged signature classes. Usually, to classify a questioned signature, dissimilarities are calculated with respect to all memory cells and handled according to the k Nearest Neighbor rule. Here, we instead propose training a Support Vector Machine (SVM) classifier on these dissimilarities to obtain a more discriminating decision. The histogram of oriented gradients is used for feature generation. Experiments conducted on two standard datasets reveal that the proposed system provides a significant accuracy improvement compared to the conventional AIRS.

Yasmine Serdouk, Hassiba Nemmour, Youcef Chibani

Selection of User-Dependent Cohorts Using Bezier Curve for Person Identification

Traditional biometric systems can be strengthened further by exploiting the concept of cohort selection to meet the high demands of organizations for a robust automated person identification system. This motivates researchers to develop robust biometric systems using cohort selection. This paper proposes a novel user-dependent cohort selection method using a Bezier curve. It makes use of the invariant SIFT descriptor to generate matching point pairs between a pair of face images. Then, for each subject, all the imposter scores are taken as control points and a Bezier curve of degree n is plotted by applying the De Casteljau algorithm. Since the imposter scores serve as the control points of the curve, a cohort subset is formed from the points determined to be far from the Bezier curve. To obtain the normalized cohort scores, the T-norm cohort normalization technique is applied. The normalized scores are then used in recognition. The experiment is conducted on the FEI face database. The proposed cohort selection method achieves superior performance, which validates its efficiency.

Jogendra Garain, Ravi Kant Kumar, Dakshina Ranjan Kisku, Goutam Sanyal
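
The De Casteljau algorithm mentioned in the abstract above evaluates a Bezier curve by repeated linear interpolation between neighbouring control points. The sketch below shows that evaluation step only. Treating the control points as 2D tuples is an assumption made for the example; the abstract does not specify how imposter scores are laid out as points, nor how distance from the curve is measured.

```python
def de_casteljau(control_points, t):
    # Evaluate a Bezier curve at parameter t in [0, 1].
    # n + 1 control points define a curve of degree n.
    points = [tuple(p) for p in control_points]
    if not points:
        raise ValueError("at least one control point is required")
    while len(points) > 1:
        # Interpolate between each pair of neighbouring points.
        points = [
            tuple((1.0 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])
        ]
    return points[0]
```

Each pass reduces the number of points by one, so a curve of degree n needs n passes per evaluated parameter value.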

Biomedical Imaging


Bag of Visual Words Approach for Bleeding Detection in Wireless Capsule Endoscopy Images

Wireless Capsule Endoscopy (WCE) is a revolutionary technique for visualizing a patient’s entire digestive tract. However, the need to analyze the huge number of images produced during an examination hinders the application of WCE. To address this, we automated the process of bleeding detection in WCE images based on an improved Bag of Visual Words (BoVW) approach. Two feature integration schemes have been explored. Experimental results show that the best classification performance is obtained by integrating SIFT and uniform LBP features. The highest classification accuracy achieved is 95.06 % for a visual vocabulary of length 100. Results reveal that the proposed methodology is discriminating enough to classify bleeding images.

Indu Joshi, Sunil Kumar, Isabel N. Figueiredo

Central Medialness Adaptive Strategy for 3D Lung Nodule Segmentation in Thoracic CT Images

In this paper, a Hessian-based strategy, based on the central medialness adaptive principle, was adapted and proposed in a multiscale approach for the 3D segmentation of pulmonary nodules in chest CT scans. To demonstrate its accuracy, the proposal is compared with another well-established Hessian-based strategy for nodule extraction from the literature. Several scans from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) database were employed in the test and validation procedure. The scans include a large and heterogeneous set of 569 solid and mostly solid nodules with large variability in nodule characteristics and image conditions. The results demonstrated that the proposal offers correct results, similar to the performance of the radiologists, providing accurate nodule segmentations that offer a suitable basis for subsequent analysis and eventual lung cancer diagnosis.

Luis Gonçalves, Jorge Novo, Aurélio Campilho

A Self-learning Tumor Segmentation Method on DCE-MRI Images

Tumor segmentation is a challenging but essential task in diagnosis, treatment planning and monitoring. This paper presents a self-learning technique to segment lesions on clinical 3D MRI images. The method is self-learning and iterative: instead of creating a model from manually segmented tumors, it learns a given individual tumor iteratively without user interaction in the learning cycles. Starting from a manually defined region of interest, the presented approach first learns the tumor features from the initial region using a Random Forest classifier, and then in each subsequent cycle it updates the previously learned model automatically. The method was evaluated on liver DCE-MRI images using manually defined tumor segmentations as reference. The algorithm was tested on various types of liver tumors. The results showed good agreement with the reference, with an average absolute volume difference of 7.8 % and an average Dice similarity of 88 %.

Szabolcs Urbán, László Ruskó, Antal Nagy
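
The two evaluation measures named in the abstract above can be sketched for flattened binary masks as follows. The Dice coefficient is standard; the absolute volume difference is written here in its common relative form, |V_seg - V_ref| / V_ref in percent, which is an assumption because the abstract does not give the exact definition. The function names are illustrative.

```python
def dice_similarity(seg, ref):
    # Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks
    # given as flat sequences of 0/1 voxel labels.
    intersection = sum(1 for s, r in zip(seg, ref) if s and r)
    total = sum(1 for s in seg if s) + sum(1 for r in ref if r)
    return 1.0 if total == 0 else 2.0 * intersection / total


def absolute_volume_difference_percent(seg, ref):
    # Relative absolute volume difference in percent (assumed definition).
    v_seg = sum(1 for s in seg if s)
    v_ref = sum(1 for r in ref if r)
    if v_ref == 0:
        raise ValueError("reference mask is empty")
    return 100.0 * abs(v_seg - v_ref) / v_ref
```

The two measures are complementary: a segmentation that is shifted but of the correct size has zero volume difference yet a reduced Dice score.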

Open Access

Morphological Separation of Clustered Nuclei in Histological Images

Automated nuclear segmentation is essential in the analysis of most microscopy images. This paper presents a novel concavity-based method for the separation of clusters of nuclei in binary images. A heuristic rule, based on object size, is used to infer the existence of merged regions. Concavity extrema detected along the merged-cluster boundary are used to guide the separation of overlapping regions. Inner split contours of multiple concavities along the nuclear boundary are estimated via a series of morphological procedures. The algorithm was evaluated on images of H400 cells in monolayer cultures and compares favourably with the state-of-the-art watershed method commonly used to separate overlapping nuclei.

Shereen Fouad, Gabriel Landini, David Randell, Antony Galton

Fitting of Breast Data Using Free Form Deformation Technique

Breast cancer has become the most common cancer amongst females. Because the breast is regarded as a symbol of femininity, any deformation caused by surgical procedures can affect patients’ quality of life. A planning tool based on parametric modeling can not only help surgeons achieve better cosmetic outcomes but also improve the interaction between surgeons and patients when deciding on the necessary procedures. In the current research, a parametric modeling methodology called Free-Form Deformation (FFD) is studied. Finally, supported by a quantitative analysis, we propose two simplified versions of the FFD methodology that increase model similarity to the input data and decrease the required fitting time.

Hooshiar Zolfagharnasab, Jaime S. Cardoso, Hélder P. Oliveira

Domain Adaptive Classification for Compensating Variability in Histopathological Whole Slide Images

Histopathological whole slide images of the same organ stained with the same dye exhibit substantial inter-slide variation due to the manual preparation and staining process as well as inter-individual variability. In order to improve the generalization ability of a classification model on data from kidney pathology, we investigate a domain adaptation approach in which a classifier trained on data from the source domain is presented with a small number of user-labeled samples from the target domain. Domain adaptation resulted in improved classification performance, especially when combined with an interactive labeling procedure.

Michael Gadermayr, Martin Strauch, Barbara Mara Klinkhammer, Sonja Djudjaj, Peter Boor, Dorit Merhof

Comparison of Flow Cytometry and Image-Based Screening for Cell Cycle Analysis

Quantitative cell state measurements can provide a wealth of information about the mechanism of action of chemical compounds and gene functionality. Here we present a comparison of cell cycle disruption measurements from commonly used flow cytometry (generating one-dimensional signal data) and bioimaging (producing two-dimensional image data). Our results show high correlation between the two approaches, indicating that image-based screening can be used as an alternative to flow cytometry. Furthermore, we discuss the benefits of image informatics over conventional single-signal flow cytometry.

Damian J. Matuszewski, Ida-Maria Sintorn, Jordi Carreras Puigvert, Carolina Wählby

Brain Imaging


Improving QuickBundles to Extract Anatomically Coherent White Matter Fiber-Bundles

The construction of White Matter (WM) fiber-bundles has been extensively investigated in the literature. Both manual and automatic approaches for isolating and extracting WM fiber-bundles have been proposed in the past, and each family of approaches has its pros and cons. One of the best-known automatic approaches is QuickBundles (QB), whose main feature is its speed. However, because of the way it proceeds, QB can return anatomically incoherent fiber-bundles. In this paper, we propose an approach that integrates QB with a string-based fiber representation to overcome this problem. We also present the results of experiments designed to compare our approach with QB.

Francesco Cauteruccio, Claudio Stamile, Giorgio Terracina, Domenico Ursino, Dominique Sappey-Marinier

Automatic Rating of Perivascular Spaces in Brain MRI Using Bag of Visual Words

Perivascular spaces (PVS), if enlarged and visible in magnetic resonance imaging (MRI), relate to poor cognition, depression in older age, Parkinson’s disease, inflammation, hypertension and cerebral small vessel disease. In this paper we present a fully automatic method to rate the burden of PVS in the basal ganglia (BG) region using structural brain MRI. We used a Support Vector Machine classifier and described the BG following the bag of visual words (BoW) model. The latter was evaluated using a) Scale Invariant Feature Transform (SIFT) descriptors of points extracted from a dense sampling and b) textons, as local descriptors. BoW using SIFT yielded a global accuracy of 82.34 %, whereas using textons it yielded 79.61 %.

Víctor González-Castro, María del C. Valdés Hernández, Paul A. Armitage, Joanna M. Wardlaw
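
The bag of visual words description mentioned in the abstract above can be sketched as follows. This is a generic illustration, not the authors' pipeline: it assumes a codebook of visual words is already available (in practice it is often learned by clustering training descriptors), uses tiny 2D vectors in place of SIFT or texton descriptors, and all names and values are hypothetical.

```python
def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest to the descriptor (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, word in enumerate(codebook):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bow_histogram(descriptors, codebook):
    """Normalised histogram of codeword assignments for one image region."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Toy codebook with three visual words and four local descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.7, 5.2)]
histogram = bow_histogram(descriptors, codebook)
print(histogram)
```

The resulting fixed-length histogram is what would be passed to a classifier such as a Support Vector Machine, regardless of how many local descriptors the region produced.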

White Matter Fiber-Bundle Analysis Using Non-negative Tensor Factorization

With the development of advanced image acquisition and processing techniques providing better biomarkers for the characterization of brain diseases, the automatic analysis of biomedical imaging constitutes a critical point. In particular, the analysis of complex data structures is a challenge for better understanding complex brain pathologies like multiple sclerosis (MS). In this work, we describe a new fully automated method based on non-negative tensor factorization (NTF) to analyze white matter (WM) fiber-bundles. This method makes it possible to extract, from a WM fiber-bundle, the set of fibers affected by the pathology, discriminating fibers affected by the pathological process from healthy fibers. Our method was validated on simulated data and also applied to data from real MS patients. Results show the high precision of our method in extracting fibers affected by the pathological process.

Claudio Stamile, François Cotton, Frederik Maes, Dominique Sappey-Marinier, Sabine Van Huffel

Cardiovascular Image Analysis


A Flexible 2D-3D Parametric Image Registration Algorithm for Cardiac MRI

We propose a mathematical formulation aimed at parametric intensity-based registration of a deformed 3D volume to a 2D slice. The approach is flexible and can accommodate various regularization schemes, similarity measures, and optimizers. We evaluate the framework on 2D-3D registration experiments of in vivo cardiac magnetic resonance imaging (MRI) aimed at image-guided surgery applications that use real-time MRI as a visualization tool. An affine transformation is used to demonstrate this parametric model. Target registration error and the Jaccard and Dice indices are used to validate the algorithm and demonstrate the accuracy of the registration scheme on both simulated and clinical data.

L. W. Lorraine Ma, Mehran Ebrahimi

Sparse-View CT Reconstruction Using Curvelet and TV-Based Regularization

Reconstruction from sparse-view projections is one of the important problems in computed tomography, where the availability or feasibility of a large number of projections is limited. Total variation (TV) approaches have been introduced to improve the reconstruction quality by smoothing the variation between neighboring pixels. However, TV-based methods may generate artifacts and cause loss of detail in images with textures or complex shapes. Here, we propose a new regularization model for CT reconstruction that combines regularization methods based on TV and the curvelet transform. Combining TV with a curvelet regularizer, which is optimally sparse and has better directional sensitivity than wavelet transforms, gives a regularization model that improves the reconstruction quality. The split-Bregman (augmented Lagrangian) approach is used as the solver, which makes it easy to incorporate multiple regularization terms, including one based on a multiresolution transformation (in our case the curvelet transform), into the optimization framework. We compare our method with methods using only TV, wavelet, or curvelet regularization terms on test phantom images. The results show that there are benefits in using the proposed combined curvelet and TV regularizer in sparse-view CT reconstruction.

Ali Pour Yazdanpanah, Emma E. Regentova
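
To make the TV term discussed in the abstract above concrete, here is a minimal sketch of a discrete total variation computed with forward differences. The abstract does not specify the discretization or whether the isotropic or anisotropic variant is used, so both are shown as assumptions; the curvelet term and the split-Bregman solver are not reproduced here.

```python
import math


def total_variation(image, isotropic=True):
    """Discrete total variation of a 2D image given as a list of rows (forward differences)."""
    rows, cols = len(image), len(image[0])
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            # Differences are taken as zero at the right and bottom borders.
            dx = image[i][j + 1] - image[i][j] if j + 1 < cols else 0.0
            dy = image[i + 1][j] - image[i][j] if i + 1 < rows else 0.0
            tv += math.hypot(dx, dy) if isotropic else abs(dx) + abs(dy)
    return tv


flat = [[1.0, 1.0], [1.0, 1.0]]  # constant image: no variation
edge = [[0.0, 1.0], [0.0, 1.0]]  # one vertical edge of height 1 across two rows
print(total_variation(flat))
print(total_variation(edge))
```

A constant image has zero TV while every intensity jump adds to it, which is why minimizing TV smooths noise but can also flatten fine texture, the limitation the abstract refers to.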

Estimating Ejection Fraction and Left Ventricle Volume Using Deep Convolutional Networks

We present a fully automated method to estimate the ejection fraction and the end-systolic and end-diastolic volumes from cardiac MRI images. These values can be measured manually by a cardiologist, but the process is slow and time-consuming. The method first localizes the left ventricle in the image. Then, the slices are cleaned, re-ordered, and preprocessed using the DICOM meta fields. The end-systolic and end-diastolic images for each slice are identified. Finally, the end-systolic and end-diastolic images are passed to a neural network to estimate the volumes.

AbdulWahab Kabani, Mahmoud R. El-Sakka
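
Once the end-diastolic and end-systolic volumes have been estimated, the ejection fraction follows from the standard definition EF = (EDV - ESV) / EDV. The sketch below shows only this final arithmetic step; the function name and the example volumes are illustrative and are not taken from the paper.

```python
def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent from end-diastolic and end-systolic volumes (mL)."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    if not 0 <= esv_ml <= edv_ml:
        raise ValueError("end-systolic volume must lie between 0 and the end-diastolic volume")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


# Illustrative volumes only.
ef = ejection_fraction(edv_ml=120.0, esv_ml=50.0)
print(round(ef, 1))
```

Because the ejection fraction is a ratio of the two volumes, errors in either volume estimate propagate directly into it.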

A Hybrid Model for Extracting the Aortic Valve in 3D Computerized Tomography and Its Application to Calculate a New Calcium Score Index

In this paper a new scheme for automatic segmentation of the Aortic Valve in 3D computed tomography image sequences is presented. The algorithm is based on a new approach that uses a combination of Region Growing and Mathematical Morphology techniques in a hybrid framework. The output of the algorithm is used to assess the Aortic Valve Calcium Score in a new way that calculates the Agatston Score separately in the Sinuses and the Leaflets, deriving a new index based on their ratios. Identification of Aortic Valve borders and leaflets is still a challenging task and is commonly based on intensive user interaction, which limits its applicability. In this paper a fast and accurate model-free, automated method for segmenting and extracting morphological parameters with Calcium Score calculation is presented. Results of the proposed method are also provided, showing a high correlation with the expected values.

Laura Torío, César Veiga, María Fernández, Victor Jiménez, Emilio Paredes, Pablo Pazos, Francisco Calvo, Andrés Íñiguez

Image Analysis in Ophthalmology


Automatic Optic Disc and Fovea Detection in Retinal Images Using Super-Elliptical Convergence Index Filters

This paper presents an automatic optic disc (OD) and fovea detection technique using an innovative super-elliptical filter (SEF). This filter is suitable for the detection of semi-elliptical convex shapes and as such it performs well for OD localization. Furthermore, we introduce a setup for the simultaneous localization of the OD and fovea, in which the detection result of one landmark facilitates the detection of the other. The evaluation is performed on 1200 images of the MESSIDOR dataset containing both normal and pathological cases of diabetic retinopathy (DR) and macular edema (ME). The proposed approach achieves success rates of 99.75 % and 98.87 % for OD and fovea detection, respectively, and outperforms or equals all known similar methods.

Behdad Dashtbozorg, Jiong Zhang, Fan Huang, Bart M. ter Haar Romeny

Age-Related Macular Degeneration Detection and Stage Classification Using Choroidal OCT Images

Age-Related Macular Degeneration (AMD) is a progressive eye disease which damages the retina and causes visual impairment. Detecting those in the early stages at most risk of progression will allow more timely treatment and preserve sight. In this paper, we propose a machine learning based method to detect AMD and distinguish the different stages using choroidal images obtained from optical coherence tomography (OCT). We extract texture features using a Gabor filter bank and non-linear energy transformation. Then the histogram based feature descriptors are used to train the random forests, Support Vector Machine (SVM) and neural networks, which are tested on our choroid OCT image dataset with 21 participants. The experimental results show the feasibility of our method.

Jingjing Deng, Xianghua Xie, Louise Terry, Ashley Wood, Nick White, Tom H. Margrain, Rachel V. North

3D Retinal Vessel Tree Segmentation and Reconstruction with OCT Images

Detection and analysis of the arterio-venular tree of the retina is a relevant issue, providing useful information in procedures such as the diagnosis of different pathologies. Classical approaches for vessel extraction make use of 2D acquisition paradigms and, therefore, obtain a limited representation of the vascular structure. This paper proposes a new methodology for the automatic 3D segmentation and reconstruction of the retinal arterio-venular tree in Optical Coherence Tomography (OCT) images. The methodology takes advantage of different image analysis techniques to initially segment the vessel tree and estimate its caliber along its length. Then, the corresponding depth for the entire vessel tree is obtained. Finally, with all this information, the method performs the 3D reconstruction of the entire vessel tree. The test and validation procedure employed 196 OCT histological images with the corresponding near infrared reflectance retinographies. The methodology showed promising results, demonstrating its accuracy in a complex domain and providing a coherent 3D vessel tree reconstruction that can subsequently be analyzed in different medical diagnostic processes.

Joaquim de Moura, Jorge Novo, Marcos Ortega, Pablo Charlón

Segmentation of Retinal Blood Vessels Based on Ultimate Elongation Opening

This paper proposes a method for segmentation of retinal blood vessels based on ultimate attribute opening (UAO). The proposed approach analyzes the space of numerical residues generated by UAO in order to select the residues extracted from elongated regions by means of an elongation shape descriptor. Thus, the residues extracted are used to define the ultimate elongation opening. Experimental results, using the public datasets DRIVE and STARE show that the proposed approach is fast, simple and comparable to other methods found in the literature.

Wonder A. L. Alves, Charles F. Gobber, Sidnei A. Araújo, Ronaldo F. Hashimoto

Document Analysis


ISauvola: Improved Sauvola’s Algorithm for Document Image Binarization

Binarization of historical documents is difficult and is still an open area of research. In this paper, a new binarization technique for document images is presented. The proposed technique is based on the most commonly used binarization method, Sauvola’s, which performs relatively well on classical documents. However, three main defects remain: the window parameter of Sauvola’s formula does not adapt automatically to the image content, the method is not robust to low contrast, and it is not invariant with respect to contrast inversion. Thus, on documents such as magazines, the content may not be retrieved correctly. In this paper we use the image contrast, defined by the local image minimum and maximum, in combination with the computed Sauvola binarization step to guarantee good-quality binarization for both low-contrast and correctly contrasted objects inside a single document, without manually adjusting the user-defined parameters to the document content.

Zineb Hadjadj, Abdelkrim Meziane, Yazid Cherfa, Mohamed Cheriet, Insaf Setitra
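
For context, the classic Sauvola rule that the abstract above builds on sets a local threshold T = m * (1 + k * (s / R - 1)), where m and s are the mean and standard deviation of the grey levels in a window around the pixel. The sketch below implements only this baseline, not the improved ISauvola procedure; the values k = 0.5 and R = 128 are commonly cited defaults for 8-bit images and are assumptions here, as are the function names.

```python
import math


def sauvola_threshold(window, k=0.5, r=128.0):
    """Sauvola threshold T = m * (1 + k * (s / R - 1)) for one window of grey levels."""
    n = len(window)
    mean = sum(window) / n
    variance = sum((p - mean) ** 2 for p in window) / n
    std = math.sqrt(variance)
    return mean * (1.0 + k * (std / r - 1.0))


def binarize_pixel(pixel, window, k=0.5, r=128.0):
    """Return 255 (background) if the pixel is brighter than the local threshold, else 0 (ink)."""
    return 255 if pixel > sauvola_threshold(window, k, r) else 0


# A perfectly uniform window has zero standard deviation, so T = m * (1 - k).
uniform = [100] * 9
print(sauvola_threshold(uniform))
print(binarize_pixel(40, uniform), binarize_pixel(200, uniform))
```

In a low-contrast window the standard deviation is small, so the threshold drops towards m * (1 - k) and faint strokes can be classified as background, which is one way to see the low-contrast weakness the abstract mentions. The window contents also depend on a user-chosen window size, the parameter the abstract says does not adapt to the image content.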

Recognition of Handwritten Arabic Words with Dropout Applied in MDLSTM

Offline handwriting recognition is the ability to decode intelligible handwritten input from paper documents into a digitized format readable by machines. This field remains an ongoing research problem, especially for Arabic script, due to its cursive appearance, the variety of writers and the diversity of styles. In this paper we focus on an Intelligent Words Recognition system based on MDLSTM, to which a dropout technique is applied during the training stage. This technique protects our system against overfitting and improves the recognition rate. To evaluate our system we use the IFN/ENIT database.

Rania Maalej, Najiba Tagougui, Monji Kherallah

Direct Unsupervised Text Line Extraction from Colored Historical Manuscript Images Using DCT

Extracting lines of text from a manuscript is an important preprocessing step in many digital paleography applications. These extracted lines play a fundamental part in the identification of the author and/or age of the manuscript. In this paper we present an unsupervised approach to text line extraction in historical manuscripts that can be applied directly to a color manuscript image. Each of the red, green and blue channels is processed separately by applying the DCT to it. One of the key advantages of this approach is that it can be applied directly to the manuscript image without any preprocessing, training or tuning steps. Extensive testing on complex Arabic handwritten manuscripts shows the effectiveness of the proposed approach.

Asim Baig, Somaya Al-Maadeed, Ahmed Bouridane, Mohamed Cheriet

Applications

Time Series Analysis of Garment Distributions via Street Webcam

The discovery of patterns and events in the physical world by analysis of multiple streams of sensor data can provide benefit to society in more than just surveillance applications by focusing on automated means for social scientists, anthropologists and marketing experts to detect macroscopic trends and changes in the general population. This goal complements analogous efforts in documenting trends in the digital world, such as those in social media monitoring. In this paper we show how the contents of a street webcam, processed with state-of-the-art deep networks, can provide information about patterns in clothing and their relation to weather information. In particular, we analyze a large time series of street webcam images, using a deep network trained for garment detection, and demonstrate how the garment distribution over time significantly correlates to weather and temporal patterns. Finally, we additionally provide a new and improved labelled dataset of garments for training and benchmarking purposes, reporting 58.19 % overall accuracy on the ACS test set, the best performance yet obtained.

Sen Jia, Thomas Lansdall-Welfare, Nello Cristianini

Automatic System for Zebrafish Counting in Fish Facility Tanks

In this project we propose a computer vision method, based on background subtraction, to estimate the number of zebrafish inside a tank. We addressed questions related to the best choice of parameters to run the algorithm, namely the threshold blob area for fish detection and the reference area from which a blob area in a thresholded frame may be considered as one or multiple fish. Empirical results obtained after several tests show that the method can successfully estimate, within a margin of error, the number of zebrafish (fry or adults) inside fish tanks, showing that adaptive background subtraction is highly effective for blob isolation and fish counting.

Francisco J. Silvério, Ana C. Certal, Carlos Mão de Ferro, Joana F. Monteiro, José Almeida Cruz, Ricardo Ribeiro, João Nuno Silva
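
The role of the two parameters named in the abstract above (a threshold blob area and a reference area) can be illustrated with a small sketch. The abstract does not describe the actual counting rule, so the rule used here (discard blobs below the threshold, then divide each remaining blob area by the reference area and round) is an assumption, and the function name, parameter names and values are hypothetical.

```python
def estimate_fish_count(blob_areas, min_blob_area, reference_area):
    """Estimate the number of fish from the blob areas (in pixels) of one thresholded frame."""
    if min_blob_area <= 0 or reference_area <= 0:
        raise ValueError("areas must be positive")
    count = 0
    for area in blob_areas:
        if area < min_blob_area:
            continue  # too small to be a fish: treated as noise
        # Divide by the reference area and round; every retained blob counts as at least one fish.
        count += max(1, round(area / reference_area))
    return count


# Illustrative blob areas: one noise blob, one single fish, two groups of overlapping fish.
print(estimate_fish_count([12, 150, 310, 480], min_blob_area=50, reference_area=160))
```

Under this assumed rule, the threshold area controls how much noise is rejected, while the reference area controls how aggressively large blobs are split into several fish, so the two parameters trade off against each other when tuning.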

A Lightweight Mobile System for Crop Disease Diagnosis

This paper presents a low-complexity mobile application for automatically diagnosing crop diseases in the field. In an initial pre-processing stage, the system leverages the capability of a smartphone device and basic image processing algorithms to obtain consistent leaf orientation and to remove the background. A number of different features are then extracted from the leaf, including texture, colour and shape features. Nine lightweight sub-features are combined and implemented as a feature descriptor for this mobile environment. The system is applied to six wheat leaf types: non-disease, yellow rust, Septoria, brown rust, powdery mildew and tan spots, which are commonly occurring wheat diseases worldwide. The standalone application demonstrates the possibilities for disease diagnosis under realistic circumstances, with disease/non-disease detection accuracy of approximately 88 %, and can provide a possible disease type within a few seconds of image acquisition.

Punnarai Siricharoen, Bryan Scotney, Philip Morrow, Gerard Parr

Automatic Cattle Identification Using Graph Matching Based on Local Invariant Features

Cattle muzzle patterns can serve as a biometric identifier, which is important for animal traceability systems that ensure the integrity of the food chain. This paper presents a muzzle-based classification system that combines local invariant features with graph matching. The proposed approach consists of three phases: feature extraction, graph matching, and matching refinement. The experimental results showed that our approach is superior to existing work, as it correctly identifies all of the tested images. In addition, the results showed that the proposed method achieves this high accuracy even when the test images are rotated by various angles.

Fernando C. Monteiro

An Intelligent Vision-Based System Applied to Visual Quality Inspection of Beans

This work proposes an intelligent vision-based system for automatic classification of the beans most consumed in Brazil. The system classifies the grains contained in a sample according to their skin colors and is composed of three modules: image acquisition and pre-processing, segmentation of grains, and classification of grains. In the experiments, we used an apparatus controlled by a PC that includes a conveyor belt, an image acquisition chamber and a camera to simulate an industrial production line. The results indicate that the proposed system could be applied to visual quality inspection of beans produced in Brazil, since one of the steps in this process is measuring the mixture contained in a sample, taking into account the skin color of the grains, to determine the predominant class of the product and, consequently, its market price.

P. A. Belan, S. A. Araújo, W. A. L. Alves

