
2017 | Book

Image Analysis and Recognition

14th International Conference, ICIAR 2017, Montreal, QC, Canada, July 5–7, 2017, Proceedings

About this book

This book constitutes the thoroughly refereed proceedings of the 14th International Conference on Image Analysis and Recognition, ICIAR 2017, held in Montreal, QC, Canada, in July 2017.

The 73 revised full papers presented were carefully reviewed and selected from 133 submissions. The papers are organized in the following topical sections: machine learning in image recognition; machine learning for medical image computing; image enhancement and reconstruction; image segmentation; motion and tracking; 3D computer vision; feature extraction; detection and classification; biomedical image analysis; image analysis in ophthalmology; remote sensing; applications.

Table of Contents

Frontmatter

Machine Learning in Image Recognition

Frontmatter
A Weight-Selection Strategy on Training Deep Neural Networks for Imbalanced Classification

Deep Neural Networks (DNN) have recently received great attention due to their superior performance in many machine-learning problems. However, the use of DNN is still impeded if the input data is imbalanced. Imbalanced classification refers to the problem in which one class contains a much smaller number of samples than the others. It poses a great challenge to existing classifiers, including DNN, due to the difficulty of recognizing the minority class. So far, there are still limited studies on how to train DNN for imbalanced classification. In this study, we propose a new strategy based on weight selection to reduce over-fitting when training DNN for imbalanced classification. The original training set is split into two subsets: one used for training to update weights, the other for validation to select weights, so that the weights that yield the best performance on the validation set are selected. To our knowledge, this is the first systematic study to examine a weight-selection strategy for training DNN on imbalanced classification. In experiments on 10 imbalanced datasets derived from MNIST, the DNN trained by our new strategy outperformed both the DNN trained by a standard strategy and the DNN trained by cost-sensitive learning, with statistical significance (p = 0.00512). Notably, the DNN trained by our new strategy used 20% fewer training images (12,000 fewer) yet still performed best on all 10 imbalanced datasets. The source code is available at https://github.com/antoniosehk/WSDeepNN.

Antonio Sze-To, Andrew K. C. Wong
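As a rough illustration of the validation-based weight selection described above (not the authors' code; `train_epoch` and `validation_score` below are hypothetical stand-ins for a real DNN update step and a real imbalanced-classification metric), the selection loop might look like:

```python
import random

def train_epoch(weights, train_set, lr=0.1):
    """Toy stand-in for one epoch of DNN weight updates (hypothetical)."""
    return [w - lr * random.uniform(-1, 1) for w in weights]

def validation_score(weights, val_set):
    """Toy stand-in for a validation metric such as balanced accuracy."""
    return -sum(w * w for w in weights)  # higher is better in this mock

def train_with_weight_selection(train_set, val_set, n_epochs=20, seed=0):
    """Split-train-validate loop: keep the weights that score best on the
    held-out validation subset, as in the weight-selection strategy."""
    random.seed(seed)
    weights = [random.uniform(-1, 1) for _ in range(4)]
    best_weights = list(weights)
    best_score = validation_score(weights, val_set)
    for _ in range(n_epochs):
        weights = train_epoch(weights, train_set)
        score = validation_score(weights, val_set)
        if score > best_score:  # select weights by validation performance
            best_score, best_weights = score, list(weights)
    return best_weights, best_score
```

The same pattern is what checkpoint-on-best-validation callbacks implement in most deep learning frameworks.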
End-to-End Deep Learning for Driver Distraction Recognition

In this paper, an end-to-end deep learning solution for driver distraction recognition is presented. In the proposed framework, features are extracted from the pre-trained convolutional neural network VGG-19. Despite variation in illumination conditions, camera position, and drivers’ ethnicity and gender in our dataset, our best fine-tuned model, VGG-19, achieved the highest test accuracy of 95% and an average accuracy of 80% per class. The model is tested with the leave-one-driver-out cross-validation method to ensure generalization. The results show that our framework avoided the overfitting problem that typically occurs in low-variance datasets. A comparison of our framework with the state-of-the-art XGBoost shows that the proposed approach outperforms XGBoost in accuracy by approximately 7%.

Arief Koesdwiady, Safaa M. Bedawi, Chaojie Ou, Fakhri Karray
Deep CNN with Graph Laplacian Regularization for Multi-label Image Annotation

To compensate for incomplete or imprecise tags in training samples, this paper proposes a learning algorithm for the convolutional neural network (CNN) for multi-label image annotation that introduces the co-occurrence dependency between tags as a graph Laplacian regularization term. To exploit the co-occurrence dependency, we apply Hayashi’s quantification method-type III to the tags in the training samples and use the distances between the acquired representative vectors to define the weights for graph Laplacian regularization. By introducing this regularization term, the likelihood of co-occurrence between tags with high co-occurrence frequency can be increased. To confirm the effectiveness of the proposed algorithm, we conducted experiments on the Corel5k dataset for multi-label image annotation.

Jonathan Mojoo, Keiichi Kurosawa, Takio Kurita
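The graph Laplacian term described above can be illustrated with a small sketch (a simplification assumed here: scalar per-tag scores and a given symmetric co-occurrence weight matrix; the paper's derivation of the weights via Hayashi's quantification method is not reproduced):

```python
def laplacian_regularizer(scores, weights):
    """Graph Laplacian penalty 0.5 * sum_ij w_ij * (s_i - s_j)^2.

    scores:  predicted score per tag (list of floats)
    weights: symmetric co-occurrence weight matrix (list of lists)

    This equals s^T L s with L = D - W (D the diagonal degree matrix),
    so tags connected by large weights are pushed toward similar scores.
    """
    n = len(scores)
    return 0.5 * sum(weights[i][j] * (scores[i] - scores[j]) ** 2
                     for i in range(n) for j in range(n))
```

Adding this penalty to the CNN loss encourages frequently co-occurring tags to receive similar prediction scores.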
Transfer Learning Using Convolutional Neural Networks for Face Anti-spoofing

Face recognition systems are gaining momentum with current developments in computer vision. At the same time, tactics to mislead these systems are getting more complex, and counter-measure approaches are necessary. Following the current progress with convolutional neural networks (CNN) in classification tasks, we present an approach based on transfer learning from a pre-trained CNN model, using only static features, to recognize photo, video, or mask attacks. We tested our approach on the REPLAY-ATTACK and 3DMAD public databases. On the REPLAY-ATTACK database our accuracy was 99.04%, with a half total error rate (HTER) of 1.20%. On 3DMAD, our accuracy was 100.00%, with an HTER of 0.00%. Our results are comparable to the state of the art.

Oeslle Lucena, Amadeu Junior, Vitor Moia, Roberto Souza, Eduardo Valle, Roberto Lotufo
Depth from Defocus via Active Quasi-random Point Projections: A Deep Learning Approach

Depth estimation plays an important role in many computer vision and computer graphics applications. Existing depth measurement techniques are still complex and restrictive. In this paper, we present a novel technique for inferring depth measurements via depth from defocus using active quasi-random point projection patterns. A quasi-random point projection pattern is projected onto the scene of interest, and each projection point in the image captured by a cellphone camera is analyzed using a deep learning model to estimate the depth at that point. The proposed method has a relatively simple setup, consisting of a camera and a projector, and enables depth inference from a single capture. We evaluate the proposed method both quantitatively and qualitatively and demonstrate strong potential for simple and efficient depth sensing.

Avery Ma, Alexander Wong, David Clausi

Machine Learning for Medical Image Computing

Frontmatter
Discovery Radiomics via a Mixture of Deep ConvNet Sequencers for Multi-parametric MRI Prostate Cancer Classification

Prostate cancer is the most commonly diagnosed form of cancer in men, but the prognosis is relatively good given a sufficiently early diagnosis. Radiomics has been shown to be a powerful prognostic tool for cancer detection; however, these radiomics-driven methods currently rely on hand-crafted sets of quantitative imaging-based features, which can limit their ability to fully characterize unique prostate cancer tumour traits. We present a novel discovery radiomics framework via a mixture of deep convolutional neural network (ConvNet) sequencers for generating custom radiomic sequences tailored for prostate cancer detection. We evaluate the performance of the mixture of ConvNet sequencers against state-of-the-art hand-crafted radiomic sequencers for binary computer-aided prostate cancer classification using real clinical prostate multi-parametric MRI data. Results for the mixture of ConvNet sequencers demonstrate good performance in prostate cancer classification relative to the hand-crafted radiomic sequencers, and show potential for more efficient and reliable automatic prostate cancer classification.

Amir-Hossein Karimi, Audrey G. Chung, Mohammad Javad Shafiee, Farzad Khalvati, Masoom A. Haider, Ali Ghodsi, Alexander Wong
Discovery Radiomics for Pathologically-Proven Computed Tomography Lung Cancer Prediction

Lung cancer is the leading cause of cancer-related deaths. As such, there is an urgent need for a streamlined process that allows radiologists to provide diagnoses with greater efficiency and accuracy. A powerful tool for this is radiomics: a high-dimensional imaging feature set. In this study, we take the idea of radiomics one step further by introducing the concept of discovery radiomics for lung cancer prediction using CT imaging data, realizing the custom radiomic sequencers as deep convolutional sequencers using a deep convolutional neural network learning architecture. To illustrate the prognostic power and effectiveness of the radiomic sequences produced by the discovered sequencer, we perform cancer prediction between malignant and benign lesions from 97 patients using the pathologically-proven diagnostic data from the LIDC-IDRI dataset. Using the clinically provided pathologically-proven data as ground truth, the proposed framework achieved an average accuracy of 77.52% via 10-fold cross-validation, with a sensitivity of 79.06% and specificity of 76.11%, surpassing the state-of-the-art method.

Devinder Kumar, Audrey G. Chung, Mohammad J. Shaifee, Farzad Khalvati, Masoom A. Haider, Alexander Wong
Left Ventricle Wall Detection from Ultrasound Images Using Shape and Appearance Information

Clinical analysis of heart conditions takes into account parameters such as the thickness, perimeter, and area of the left ventricle wall. These measurements are normally obtained by manual segmentation of ultrasound images, which depends on operator experience. Supporting this process with automatic segmentation methods is very challenging due to the low resolution, missing information, noise, and blurring in these images. In this work, we propose a novel semi-automatic method for left ventricle detection in ultrasound images based on supervised learning. The method implicitly combines ventricle appearance and shape information through ring partitions that follow the ventricle shape pattern in axial views. The results show the method's ability to cope with noise and missing information.

Gerardo Tibamoso, Sylvie Ratté, Luc Duong
Probabilistic Segmentation of Brain White Matter Lesions Using Texture-Based Classification

Lesions in brain white matter can cause significant functional deficits, and are often associated with neurological disease. The quantitative analysis of these lesions is typically performed manually by physicians on magnetic resonance images and represents a non-trivial, time-consuming and subjective task. The proposed method automatically segments white matter lesions using a probabilistic texture-based classification approach. It requires no parameters to be set, assumes nothing about lesion location, shape or size, and demonstrates better results (Dice coefficient of 0.84) when compared with other, similar published methods.

Mariana Bento, Yan Sym, Richard Frayne, Roberto Lotufo, Letícia Rittner
A Machine Learning-Driven Approach to Computational Physiological Modeling of Skin Cancer

Melanoma is the most lethal form of skin cancer in the world. To improve the accuracy of diagnosis, quantitative imaging approaches have been investigated. While most quantitative methods focus on the surface of skin lesions via hand-crafted imaging features, in this work we take a machine-learning approach in which abstract quantitative imaging features are learned to model physiological traits. In doing so, we investigate skin cancer detection via computational modeling of two major physiological features of melanoma, namely eumelanin and hemoglobin concentrations, from dermal images. This is done by employing a non-linear random forest regression model that leverages the plethora of quantitative features available from dermal images. The proposed method was validated by a separability test applied to clinical images. The results showed that the proposed method outperforms state-of-the-art techniques in predicting the concentrations of these skin cancer physiological features (i.e., eumelanin and hemoglobin) in dermal images.

Daniel S. Cho, Farzad Khalvati, David A. Clausi, Alexander Wong
Ejection Fraction Estimation Using a Wide Convolutional Neural Network

We introduce a method for estimating the ejection fraction and volume of the left ventricle. The method relies on a deep and wide convolutional neural network to localize the left ventricle in MRI images. The systole and diastole images are then determined from the size of the localized left ventricle, and the network is used to segment the region of interest in the diastole and systole images. The end-systolic and end-diastolic volumes are computed and used to compute the ejection fraction. By using a localization network before segmentation, we achieve results on par with the state of the art while annotating only 25 training subjects (5% of the available training subjects).

AbdulWahab Kabani, Mahmoud R. El-Sakka
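The final arithmetic step above, computing the ejection fraction from the end-diastolic volume (EDV) and end-systolic volume (ESV), is the standard clinical formula EF = (EDV − ESV) / EDV × 100:

```python
def ejection_fraction(edv, esv):
    """Ejection fraction (%) from end-diastolic and end-systolic volumes.

    edv, esv: volumes in the same unit (e.g. mL); 0 <= ESV <= EDV, EDV > 0.
    """
    if edv <= 0 or esv < 0 or esv > edv:
        raise ValueError("volumes must satisfy 0 <= ESV <= EDV with EDV > 0")
    return 100.0 * (edv - esv) / edv
```

For example, EDV = 120 mL and ESV = 50 mL give an ejection fraction of about 58.3%.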
Fully Deep Convolutional Neural Networks for Segmentation of the Prostate Gland in Diffusion-Weighted MR Images

Prostate cancer is a leading cause of mortality among men. Diffusion-weighted magnetic resonance imaging (DW-MRI) has been shown to be successful at monitoring and detecting prostate tumors. The clinical guidelines for interpreting DW-MRI for prostate cancer require the segmentation of the prostate gland into different zones. Moreover, computer-aided detection tools designed to detect prostate cancer automatically usually require segmentation of the prostate gland as a preprocessing step. In this paper, we present a segmentation algorithm for delineating the prostate gland in DW-MRI via a fully convolutional neural network. The segmentation algorithm was applied to images of 30 (testing) and 104 (training) patients, and a median Dice Similarity Coefficient of 0.89 was achieved. This method is faster than, and returns similar results to, registration-based methods, and has the potential to produce improved results given a larger training set.

Tyler Clark, Alexander Wong, Masoom A. Haider, Farzad Khalvati

Image Enhancement and Reconstruction

Frontmatter
Compensated Row-Column Ultrasound Imaging System Using Three Dimensional Random Fields

The row-column method has received a lot of attention for 3-D ultrasound imaging. This simplification technique reduces the number of connections required to address a 2-D array and therefore reduces the amount of data to handle. However, row-column ultrasound imaging still has its limitations: the issues of data sparsity, speckle noise, and a spatially varying point spread function with edge artifacts must all be taken into account when building a reconstruction framework. In this work, we introduce a compensated row-column ultrasound imaging system, termed 3D-CRC, that leverages 3-D information within an extended 3-D random field model to compensate for the intrinsic limitations of the row-column method. Tests on 3D-CRC and previously published row-column ultrasound imaging systems show the potential of our proposed system as an effective tool for enhancing 3-D row-column imaging.

Ibrahim Ben Daya, Albert I. H. Chen, Mohammad Javad Shafiee, Alexander Wong, John T. W. Yeow
Curvelet-Based Bayesian Estimator for Speckle Suppression in Ultrasound Imaging

Ultrasound images are inherently affected by speckle noise, and thus reducing this noise is crucial for successful post-processing. One powerful approach for noise suppression in digital images is Bayesian estimation. In the Bayesian-based despeckling schemes, the choice of suitable statistical models and the development of a shrinkage function for estimation of the noise-free signal are the major concerns. In this paper, a novel curvelet-based Bayesian estimator for speckle removal in ultrasound images is developed. The curvelet coefficients of the degradation model of the noisy ultrasound image are decomposed into two components, namely noise-free signal and signal-dependent noise. The Cauchy and two-sided exponential distributions are assumed to be statistical models for the two components, respectively, and an efficient low-complexity realization of the Bayesian estimator is proposed. The experimental results demonstrate the superiority of the proposed despeckling scheme in achieving significant speckle suppression and preserving image details.

Rafat Damseh, M. Omair Ahmad
Object Boundary Based Denoising for Depth Images

Economical RGB-D cameras such as Kinect can produce both RGB and depth (RGB-D) images in real-time. The accuracy of various RGB-D related applications suffers from depth image noise. This paper proposes a solution to the problem by estimating depth edges that correspond to the object boundaries and using them as priors in the hole filling process. This method exhibits quantitative and qualitative improvements over the current state-of-the-art methods.

Mayoore S. Jaiswal, Yu-Ying Wang, Ming-Ting Sun
A Note on Boosting Algorithms for Image Denoising

In recent years, non-local methods have been among the most efficient tools for addressing the classical problem of image denoising. Recently, Romano et al. proposed a novel algorithm aimed at “boosting” a number of non-local denoising algorithms as a “black box.” In this manuscript, we consider this algorithm and derive an analytical expression corresponding to successive applications of the proposed “boosting” scheme. Mathematically, we prove that such successive application does not always enhance the input image and is equivalent to a re-parameterization of the original “boosting” algorithm. We perform a set of computational experiments on test images to support this claim. Finally, we conclude that blindly applying such boosting methods as a general remedy for all denoising schemes is questionable.

Cory Falconer, C. Sean Bohun, Mehran Ebrahimi
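For readers unfamiliar with the scheme being analyzed, a hedged toy sketch of a strengthen-operate-subtract style boosting iteration, x_{k+1} = f(y + ρ·x_k) − ρ·x_k, is given below; the assumed form and the simple moving-average filter standing in for a real non-local denoiser are illustrative choices, not the paper's setup:

```python
def box_denoise(signal, radius=1):
    """Toy denoiser f: a 1-D moving average (stand-in for a non-local method)."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def boost(noisy, denoiser, rho=1.0, n_iters=5):
    """Successive application of a strengthen-operate-subtract iteration:
       x_{k+1} = f(y + rho * x_k) - rho * x_k,  starting from x_0 = f(y)."""
    x = denoiser(noisy)
    for _ in range(n_iters):
        strengthened = [y + rho * xi for y, xi in zip(noisy, x)]
        fx = denoiser(strengthened)
        x = [f - rho * xi for f, xi in zip(fx, x)]
    return x
```

Note that a constant signal is a fixed point of this iteration, which is the kind of behavior the paper's re-parameterization argument examines more generally.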

Image Segmentation

Frontmatter
K-Autoregressive Clustering: Application on Terahertz Image Analysis

In this paper, we propose to segment Terahertz (THz) images and introduce a new family of clustering-based regression techniques suited to time series. In particular, we propose a novel approach called the K-Autoregressive (K-AR) model, in which we assume that the time series representing the pixels were generated by AR models. The K-AR model minimizes a new objective function to recover the K autoregressive models describing each cluster of time series. The corresponding pixels are then assigned to the cluster whose AR model fits best. The order of the K-AR model is automatically estimated using a model selection criterion. Our algorithm is tested on two real THz images. Comparison with existing clustering algorithms shows the efficiency of the proposed approach.

Mohamed Walid Ayech, Djemel Ziou
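A minimal sketch of the K-AR alternation described above (assumptions: the AR order is fixed at 1 with a plain least-squares fit, whereas the paper estimates the order automatically via a model selection criterion):

```python
def fit_ar1(series_list):
    """Pooled least-squares AR(1) coefficient for all series in a cluster."""
    num = den = 0.0
    for s in series_list:
        for t in range(1, len(s)):
            num += s[t] * s[t - 1]
            den += s[t - 1] ** 2
    return num / den if den else 0.0

def ar1_residual(series, a):
    """Sum of squared one-step AR(1) prediction errors for coefficient a."""
    return sum((series[t] - a * series[t - 1]) ** 2
               for t in range(1, len(series)))

def k_ar_cluster(series, k, n_iters=10):
    """k-means-style alternation: assign each series to the best-fitting
    AR(1) model, then refit each cluster's coefficient."""
    coeffs = [0.9 - 1.8 * i / max(1, k - 1) for i in range(k)]  # spread initial a
    labels = [0] * len(series)
    for _ in range(n_iters):
        labels = [min(range(k), key=lambda c: ar1_residual(s, coeffs[c]))
                  for s in series]
        for c in range(k):
            members = [s for s, lab in zip(series, labels) if lab == c]
            if members:
                coeffs[c] = fit_ar1(members)
    return labels, coeffs
```

Each pixel's time series ends up in the cluster whose AR model yields the smallest fitting residual, which is the assignment rule the abstract describes.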
Scale and Rotation Invariant Character Segmentation from Coins

This paper presents a robust method for character segmentation from coin images. While many papers have studied character segmentation and recognition from structured and unstructured documents, with methods that vary in terms of targeted documents, from complex (degraded) documents to different languages, this is the first paper to study and propose a solution for character segmentation from coins. Character segmentation plays a crucial role in coin recognition, grading, and authentication systems. Scaling and rotation are challenging for character segmentation due to the circular nature of coins. In this paper, we transform the coin from a circular into a rectangular shape, perform morphological operations to compute the horizontal and vertical projection profiles, and apply a dynamic adaptive mask to extract characters. Our method is evaluated on several coins from diverse countries with different image background complexity. The proposed method achieved precision and recall rates as high as 93.5% and 94.8%, respectively, demonstrating its effectiveness.

Ali K. Hmood, Tamarafinide V. Dittimi, Ching Y. Suen
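The projection-profile step can be sketched as follows (simplified: a plain vertical profile split at empty columns, without the paper's morphological operations or dynamic adaptive mask):

```python
def vertical_projection(binary_img):
    """Column sums of a binary image given as a list of rows of 0/1."""
    return [sum(row[c] for row in binary_img)
            for c in range(len(binary_img[0]))]

def segment_columns(binary_img):
    """Group consecutive non-empty columns into character spans [start, end).

    Characters separated by at least one fully empty column are split apart,
    which is the basic idea behind projection-profile character segmentation.
    """
    profile = vertical_projection(binary_img)
    spans, start = [], None
    for c, v in enumerate(profile):
        if v > 0 and start is None:
            start = c
        elif v == 0 and start is not None:
            spans.append((start, c))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans
```

On the unwrapped (rectangular) coin image, each returned span would correspond to one candidate character region.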
Image Segmentation Based on Solving the Flow in the Mesh with the Connections of Limited Capacities

This paper presents a novel seeded segmentation technique inspired by the flow of a liquid in a mesh of pipes. The method can be likened to the anisotropic diffusion algorithm, although it includes some substantial changes in how the diffusion works. The method is based on the spreading of liquid from the foreground seeds to the neighboring image points, which represent basins with an initial amount of liquid. The background seeds drain the liquid from the neighboring basins. If a basin becomes full or empty, the corresponding pixel becomes a new source or sink. The algorithm runs until all pixels become either sources or sinks. The properties of the method are illustrated on the segmentation of synthetic images, and a comparison with other segmentation techniques is presented on real-life images. The experiments show promising results for the new method.

Michael Holuša, Andrey Sukhanov, Eduard Sojka

Motion and Tracking

Frontmatter
Exploiting Semantic Segmentation for Robust Camera Motion Classification

The goal of camera motion classification is to identify how the camera moves during a shot (Zoom, Pan, Tilt, etc.). This dynamic information about a video is valuable for many applications such as video indexing and retrieval. For that purpose, we propose an optical flow-based SVM classification to identify 9 types of motion. Numerous methods fail when large moving foreground objects are present in the scene, and we address this problem by combining semantic segmentation with our feature extraction in order to select only relevant motion vectors. We conducted an evaluation that shows promising results as our method reaches 90% correct classifications on a large set of varied video samples.

François-Xavier Derue, Mohamed Dahmane, Marc Lalonde, Samuel Foucher
An Event-Based Optical Flow Algorithm for Dynamic Vision Sensors

We present an event-based optical flow algorithm for the Davis Dynamic Vision Sensor (DVS). The algorithm is based on the Reichardt motion detector inspired by the fly visual system, and has a very low computational requirement for each event received from the DVS.

Iffatur Ridwan, Howard Cheng
People’s Re-identification Across Multiple Non-overlapping Cameras by Local Discriminative Patch Matching

Tracking people in multi-camera systems is one of the most important components in the study of human behavior. In this work, we propose a re-identification method for associating people across non-overlapping cameras for tracking purposes. The method is based on the use of discriminative patches (salient regions), selected by a new framework that exploits both appearance and spatial information to find the most discriminative salient regions for each tracked individual. In this framework, each individual is represented by a set of values giving a rough description of several local patches extracted from that individual. This representation is then used to select the patches that best distinguish the individual of interest from other individuals. Finally, these patches are used to associate newly detected individuals with tracked ones.

Rabah Iguernaissi, Djamal Merad, Pierre Drap

3D Computer Vision

Frontmatter
Hybrid Multi-modal Fusion for Human Action Recognition

We introduce in this paper a hybrid fusion approach allowing the efficient combination of the Kinect modalities within the feature, representation and decision levels. Our contributions are three-fold: (i) We propose an efficient concatenation of complementary per-modality descriptors that rely on the joint modality as a high-level information. (ii) We apply a multi-resolution analysis that combines the local frame-wise decisions with the global BoVW ones. We rely in this context on the scalability of the Fisher vector representation in order to handle large-scale data and apply additional concatenation of its output. (iii) We also propose an efficient score merging scheme by generating multiple weighting-coefficients that combine the strength of different SVM classifiers with a given action label. By evaluating our approach on the Cornell activity dataset, state-of-the-art performances are obtained.

Bassem Seddik, Sami Gazzah, Najoua Essoukri Ben Amara
Change Detection in Urban Streets by a Real Time Lidar Scanner and MLS Reference Data

In this paper, we introduce a new technique for change detection in urban environments based on the comparison of 3D point clouds with significantly different density characteristics. Our proposed approach extracts moving objects and environmental changes from sparse and inhomogeneous instant 3D (i3D) measurements, using as a reference background model the dense and regular point clouds captured by mobile laser scanning (MLS) systems. The introduced workflow consists of consecutive steps of point cloud classification, cross-modal measurement registration, Markov Random Field based change extraction in the range image domain, and label back-projection to 3D. Experimental evaluation is conducted in four different urban scenes, and the advantage of the proposed change detection step is demonstrated against a reference voxel-based approach.

Bence Gálai, Csaba Benedek
Creating Immersive Virtual Reality Scenes Using a Single RGB-D Camera

We examine the problem of creating immersive virtual reality (VR) scenes using a single moving RGB-D camera. Our approach takes as input an RGB-D video containing one or more actors and constructs a complete 3D background within which the human actors are properly embedded. A user can then view the captured video from any viewpoint and interact with the scene. We also provide a manually labeled database of RGB-D video sequences and evaluation metrics.

Po Kong Lai, Robert Laganière
Sunshine Hours and Sunlight Direction Using Shadow Detection in a Video

Previous systems used location information, such as GPS coordinates and the Sun's position, to detect sunlight. However, how much sunshine an area gets also depends on its surrounding environment; for instance, there is seldom sunshine under a big tree or near a big building. We therefore propose estimating sunshine hours from a video alone using image processing, and we also calculate the direction of sunlight movement. A one-day outdoor video, such as of a backyard, park, or forest, is processed to measure the sunshine hours for every pixel and determine the location of the sunniest area. Shadow detection is based on an algorithm using the LAB color space, in which the lightness channel L of a pixel is compared with that of its neighbours to determine shadow. We improved this common algorithm by using an adaptive threshold based on the histogram of each video frame, overcoming the difficulty of detecting tree and leaf shadows in sunset scenes. We tested 8 videos, and the shadow detection rate improved to 93.04% from the 85.34% achieved by the previously published algorithm. The resulting images showing the amount of sunlight at each pixel are then used to obtain the sunshine hours, and the sun's direction is calculated from these images by tracking the shadow movement.

Palak Bansal, Chao Sun, Won-Sook Lee
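The per-pixel accumulation of sunshine hours from per-frame shadow masks can be sketched as follows (an illustrative simplification assuming binary lit/shadow masks have already been computed by the shadow detector):

```python
def sunshine_hours(lit_masks, frame_interval_hours):
    """Per-pixel sunshine hours from a sequence of binary masks.

    lit_masks: list of frames, each a list of rows of 0/1 (1 = sunlit pixel)
    frame_interval_hours: time between consecutive sampled frames, in hours
    """
    rows, cols = len(lit_masks[0]), len(lit_masks[0][0])
    hours = [[0.0] * cols for _ in range(rows)]
    for mask in lit_masks:
        for r in range(rows):
            for c in range(cols):
                hours[r][c] += mask[r][c] * frame_interval_hours
    return hours
```

The pixel(s) with the largest accumulated value then mark the sunniest area of the scene.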
People-Flow Counting Using Depth Images for Embedded Processing

This paper proposes a people-flow counting algorithm using top-view depth images for implementation on low-power, embedded processors. In the people detection stage the algorithm uses morphological connected filters to find head candidates, and in the tracking stage it uses Kalman filtering in order to obtain good predictions in frames where detection fails. A fast interpolation algorithm is also proposed, which estimates the values of pixels affected by noise and generates an image with a continuous domain. The experiments were done using a Kinect sensor and the processing was performed in real time on a Raspberry Pi 3. The dataset consisted of 4025 short video sequences of people entering and exiting indoor environments, obtained from three different installations. The algorithm proved to be adequate for an embedded application, reaching an accuracy of 98% for frame rates as low as 5.5 FPS.

Guilherme S. Soares, Rubens C. Machado, Roberto A. Lotufo
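The role of Kalman filtering here, predicting a head position in frames where detection fails, can be illustrated with a 1-D constant-velocity filter (a generic sketch under simple assumptions, not the paper's implementation):

```python
def kalman_cv(dt=1.0, q=1e-3, r=1e-2):
    """1-D constant-velocity Kalman filter; state is [position, velocity]."""
    x = [0.0, 0.0]                    # state estimate
    P = [[100.0, 0.0], [0.0, 100.0]]  # large initial uncertainty

    def predict():
        """Time update: x <- F x, P <- F P F^T + Q, F = [[1, dt], [0, 1]]."""
        x[0] = x[0] + dt * x[1]
        p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + q
        P[0][0], P[0][1], P[1][0], P[1][1] = p00, p01, p10, p11
        return x[0]                   # predicted position

    def update(z):
        """Measurement update with an observed position z (H = [1, 0])."""
        s = P[0][0] + r               # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s
        y = z - x[0]                  # innovation
        x[0] += k0 * y
        x[1] += k1 * y
        p00, p01 = (1 - k0) * P[0][0], (1 - k0) * P[0][1]
        p10, p11 = P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]
        P[0][0], P[0][1], P[1][0], P[1][1] = p00, p01, p10, p11

    return predict, update
```

When detection fails in a frame, only `predict()` is called, so the track coasts forward at its estimated velocity until a new detection arrives.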
Salient Object Detection in Images by Combining Objectness Clues in the RGBD Space

We propose a multi-stage approach for salient object detection in natural images which incorporates color and depth information. In the first stage, the color and depth channels are explored separately through objectness-based measures to detect potential regions containing salient objects. This procedure produces a list of bounding boxes which are further filtered and refined using statistical distributions. The final stage combines the retained candidates from the color and depth channels using a voting system that produces a final map narrowing down the location of the salient object. Experimental results on real-world images demonstrate the performance of the proposed method in comparison with the case where only color information is used.

François Audet, Mohand Said Allili, Ana-Maria Cretu

Feature Extraction

Frontmatter
Development of an Active Shape Model Using the Discrete Cosine Transform

In a feature-based face recognition system using a set of features extracted from each of the prominent facial components, automatic and accurate localization of facial features is an essential pre-processing step. The active shape model (ASM) is a flexible shape model that was originally proposed to automatically locate a set of landmarks representing the facial features. This paper is concerned with developing a low-complexity ASM by incorporating the energy compaction property of the discrete cosine transform (DCT). The proposed ASM employs a 2-D profile based on the DCT of the local grey-level gradient pattern around a landmark, and is utilized in a scheme of facial landmark annotation for locating facial features of the face in an input image. The proposed model provides two distinct advantages: (i) the use of a smaller number of DCT coefficients in building a compressed DCT profile significantly reduces the computational complexity, and (ii) the process of choosing the low-frequency DCT coefficients filters out the noise contained in the image. The experimental results show that the use of the proposed model in the application of facial landmark annotation significantly reduces the execution time without affecting the accuracy of the facial shape fitting.

Kotaro Yasuda, M. Omair Ahmad
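The energy-compaction idea above, keeping only the low-frequency DCT coefficients of a profile, can be sketched as follows (a plain type-II DCT on a 1-D profile; the paper builds a 2-D profile around each landmark):

```python
import math

def dct2(x):
    """Type-II DCT (unnormalized): X_k = sum_n x_n cos(pi k (2n+1) / (2N))."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n))
            for k in range(n)]

def compressed_profile(profile, m):
    """Keep only the m lowest-frequency DCT coefficients of a grey-level
    gradient profile; the dropped high-frequency terms carry mostly noise."""
    return dct2(profile)[:m]
```

Because the DCT concentrates the energy of smooth signals in its first few coefficients, matching on the truncated profile is both cheaper and less noise-sensitive than matching on the raw profile.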
Ground Plane Segmentation Using Artificial Neural Network for Pedestrian Detection

This paper presents a method of ground plane segmentation for urban outdoor scenes using a feedforward artificial neural network (ANN). The main motivation of this project is to obtain contextual information from the scene for use in pedestrian detection algorithms and thereby improve their accuracy. The ANN input is fed with features extracted from a patch window of the image scene, and the ANN output classifies the patch as belonging or not belonging to the ground plane. The classified patches are then joined into a full image with the ground plane area outlined. The images used for training, testing, and evaluation were obtained from the widely known Caltech-USA database. The accuracy of ground plane segmentation was above 96% in the experiments, which improved the precision of the pedestrian detector by 38.5%.

Jorge Candido, Mauricio Marengoni
An Improved Directional Convexity Measure for Binary Images

Balázs et al. (Fundamenta Informaticae 141:151–167, 2015) proposed a measure of directional convexity of binary images based on the geometric definition of shape convexity. The measure is useful for various applications in digital image processing and pattern recognition, especially in binary tomography. Here we provide an improvement of this measure, making it follow the intuitive concept of geometric convexity more closely and making it more suitable for distinguishing between thick and thin objects.

Péter Bodnár, Péter Balázs
Learning Salient Structures for the Analysis of Symmetric Patterns

Feature-based symmetry detection algorithms have become popular amongst researchers due to their dominant performance; nevertheless, these approaches are computationally demanding. They also rely on the presence of matched features and therefore benefit from an abundance of detected keypoints, which implies that a trade-off between performance and computation time must be found. In this paper both issues are addressed: the detection of large sets of keypoints and the computation time of feature-based symmetry detection algorithms. We present an innovative process to learn rotation-invariant salient structures by clustering self-similarities. Keypoints are detected as local maxima in feature maps computed using the learnt structures, and are described using BRISK. We consider an axis of symmetry to be a dense cloud of points in a parameter space, and a density-based clustering algorithm is used to find such clouds. Computing times are drastically shortened, taking an average of 0.619 s to process an image. Detection results for single and multiple, straight and curved, reflection and glide-reflection symmetries are similar to the current state of the art.

Jaime Lomeli-R., Mark S. Nixon
Triplet Networks Feature Masking for Sketch-Based Image Retrieval

Freehand sketches are an intuitive tool for communication and suitable for various applications. In this paper, we present an effective approach that combines triplet networks and an attention mechanism for sketch-based image retrieval (SBIR). The study conducted in this work is based on features extracted using deep convolutional neural networks (ConvNets). In order to overcome the SBIR cross-domain challenge (i.e. searching for photographs from sketch queries), we use triplet loss to train ConvNets to compute a shared embedding for both sketches and images. Our main novel contribution is to combine such triplet networks with an attention mechanism. Our approach outperforms the previous state-of-the-art on challenging SBIR benchmarks. We achieved a recall of 41.66% (at $$k=1$$) for the Sketchy database (more than 4% improvement), a Kendall score of 42.9$$\mathcal {T}_\mathrm {b}$$ on the TU-Berlin SBIR benchmark (close to 5.5$$\mathcal {T}_\mathrm {b}$$ improvement) and a mean average precision (MAP) of 31% on Flickr15k (a category level SBIR benchmark).

Omar Seddati, Stéphane Dupont, Saïd Mahmoudi
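The triplet loss used to learn the shared sketch/photo embedding has a standard hinge form; a minimal numpy sketch (illustrative only, not the paper's implementation) is:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on embedding vectors.

    anchor:   embedding of a sketch query
    positive: embedding of the matching photograph
    negative: embedding of a non-matching photograph
    The loss pulls the matching pair together and pushes the
    non-matching pair at least `margin` further apart.
    """
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)
```

When the negative is already more than `margin` farther from the anchor than the positive, the loss is zero and the triplet contributes no gradient.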
Are You Smiling as a Celebrity? Latent Smile and Gender Recognition

Person gender detection is an important feature in many vision-based research fields including surveillance, human-computer interaction, biometrics, stratified behavior understanding, and content-based indexing. Researchers still face big challenges in establishing automated systems to recognize gender from images, where the human face represents the most important source of information. In the present study, we elaborated and validated a methodology for gender perception by transfer learning. First, the face is located and the corresponding cropped image is fed to a pre-trained convolutional neural network; the generated deep “latent” features are used to train a linear-SVM classifier. The overall classification performance reached $$90.69\%$$ on the FotW validation set and $$91.52\%$$ on the private test set. In this paper, we also investigated whether these features can deliver a smile recognizer. A similarly trained architecture for the classification of smiling and non-smiling faces gave a rate of $$88.14\%$$ on the validation set and $$82.12\%$$ on the private test set.

M. Dahmane, S. Foucher, D. Byrns
An Empirical Analysis of Deep Feature Learning for RGB-D Object Recognition

Conventional deep feature learning methods use the same model parameters for both RGB and depth domains in RGB-D object recognition. Since the characteristics of RGB and depth data are different, the suitability of such approaches is questionable. In this paper, we empirically investigate the effects of different model parameters on RGB and depth domains using the Washington RGB-D Object Dataset. We have explored the effects of different filter learning approaches, rectifier functions, pooling methods, and classifiers for RGB and depth data separately. We have found that RGB and depth data are each best served by individually chosen model parameters.

Ali Caglayan, Ahmet Burak Can
Image Registration Based on a Minimized Cost Function and SURF Algorithm

Computer vision and image recognition have become active research areas. Image registration is widely used in fields such as computer vision, MRI imaging, and face recognition. It is the process of aligning multiple images of the same scene, taken from different angles or at different times, to a common coordinate system. Image registration transforms the target image to the source image based on affine transformations such as translation, scaling, reflection, rotation, and shearing. Finding enough matching points between the source and the target images is a challenging task. In the proposed method, we use Speeded-Up Robust Features (SURF) and Random Sample Consensus (RANSAC) to find the best matching points between the image pair, in addition to a minimized cost function which enhances the registration when only a few matching points are available. We concentrate on the affine transformations of translation, rotation, and scaling. We achieved high registration accuracy with as few as two matching points. Experimental results show the efficiency and effectiveness of the proposed method.

Mohannad Abuzneid, Ausif Mahmood
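Why can two matching points suffice? The restricted affine transform considered here (translation, rotation, scaling) is a similarity transform with four degrees of freedom, so two point correspondences determine it exactly. A sketch of the closed-form solution (illustrative, not the paper's method; the complex-number formulation is an assumption for brevity):

```python
import numpy as np

def similarity_from_two_matches(src, dst):
    """Recover scale, rotation and translation from two correspondences.

    src, dst: arrays of shape (2, 2) holding two matched points.
    Treating 2-D points as complex numbers, dst = a * src + b,
    where a = s * exp(i*theta) encodes scale and rotation and b
    is the translation. Returns (scale, angle_rad, translation).
    """
    s0, s1 = complex(*src[0]), complex(*src[1])
    d0, d1 = complex(*dst[0]), complex(*dst[1])
    a = (d1 - d0) / (s1 - s0)          # scale and rotation
    b = d0 - a * s0                    # translation
    return abs(a), np.angle(a), np.array([b.real, b.imag])
```

With more (possibly noisy) SURF matches, RANSAC repeatedly fits such a minimal model to random pairs and keeps the one with the largest inlier set.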
A Better Trajectory Shape Descriptor for Human Activity Recognition

Sparse representation is one of the most popular methods for human activity recognition. Sparse representation describes a video by a set of independent descriptors, each of which usually captures local information from the video. These features are then mapped to another space, using Fisher Vectors, and an SVM is used for classification. One of the sparse representation methods proposed in the literature uses trajectories as features. Trajectories have been shown to be discriminative in many previous works on human activity recognition. In this paper, a more formal definition is given to trajectories and a new, more effective trajectory shape descriptor is proposed. We tested the proposed method on our challenging dataset and demonstrated through experiments that our new trajectory descriptor outperforms the previously existing main shape descriptor by a good margin; for example, in one case the obtained results showed a 5.58% improvement over the existing trajectory shape descriptor. We ran our tests over sparse feature sets and were able to reach results comparable to a dense sampling method, with fewer computations.

Pejman Habashi, Boubakeur Boufama, Imran Shafiq Ahmad

Detection and Classification

Frontmatter
Gaussian Mixture Trees for One Class Classification in Automated Visual Inspection

We present Gaussian mixture trees for density estimation and one class classification. A Gaussian mixture tree is a tree, where each node is associated with a Gaussian component. Each level of the tree provides a refinement of the data description of the level above. We show how this approach is applied to one class classification and how the hierarchical structure is exploited to significantly reduce computation time to make the approach suitable for real time systems. Experiments with synthetic data and data from a visual inspection task show that our approach compares favorably to flat Gaussian mixture models as well as one class support vector machines regarding both predictive performance and computation time.

Matthias Richter, Thomas Längle, Jürgen Beyerer
Shadow Detection for Vehicle Classification in Urban Environments

Finding an accurate and computationally efficient vehicle detection and classification algorithm for urban environments is challenging due to large video datasets and the complexity of the task. Many algorithms have been proposed, but none handles all the real-time issues efficiently. This paper proposes an algorithm which addresses shadow detection (a cause of vehicle misdetection and misclassification) and incorporates solutions to other challenges such as camera vibration, blurred images, and changing illumination and weather conditions. For accurate vehicle detection and classification, a combination of a self-adaptive GMM and a multi-dimensional Gaussian density transform is used for modeling the distribution of color image data. A shadow detection method based on the RGB and HSV color spaces is proposed. Measurement-based features and an intensity-based pyramid histogram of oriented gradients are used for classification into four main vehicle categories. The proposed method achieved 96.39% accuracy when tested on the Chile (MTT) dataset, recorded at different times and in different weather conditions, and is hence suitable for urban traffic environments.

Muhammad Hanif, Fawad Hussain, Muhammad Haroon Yousaf, Sergio A. Velastin, Zezhi Chen
Input Fast-Forwarding for Better Deep Learning

This paper introduces a new architectural framework, known as input fast-forwarding, that can enhance the performance of deep networks. The main idea is to incorporate a parallel path that sends representations of input values forward to deeper network layers. This scheme is substantially different from “deep supervision”, in which the loss layer is re-introduced to earlier layers. The parallel path provided by fast-forwarding enhances the training process in two ways. First, it enables the individual layers to combine higher-level information (from the standard processing path) with lower-level information (from the fast-forward path). Second, this new architecture reduces the problem of vanishing gradients substantially because the fast-forwarding path provides a shorter route for gradient backpropagation. In order to evaluate the utility of the proposed technique, a Fast-Forward Network (FFNet), with 20 convolutional layers along with parallel fast-forward paths, has been created and tested. The paper presents empirical results that demonstrate improved learning capacity of FFNet due to fast-forwarding, as compared to GoogLeNet (with deep supervision) and CaffeNet, which are $$4{\times }$$ and $$18{\times }$$ larger in size, respectively. All of the source code and deep learning models described in this paper will be made available to the entire research community (https://github.com/aicentral/FFNet).

Ahmed Ibrahim, A. Lynn Abbott, Mohamed E. Hussein
Improving Convolutional Neural Network Design via Variable Neighborhood Search

An unsupervised method for convolutional neural network (CNN) architecture design is proposed. The method relies on a variable neighborhood search-based approach for finding CNN architectures and hyperparameter values that improve classification performance. For this purpose, t-Distributed Stochastic Neighbor Embedding (t-SNE) is applied to effectively represent the solution space in 2D. Then, k-Means clustering divides this representation space, taking into account the relative distance between neighbors. The algorithm is tested on the CIFAR-10 image dataset. The obtained solution improves the CNN validation loss by over $$15\%$$ and the respective accuracy by $$5\%$$. Moreover, the network shows higher predictive power and robustness, validating our method for the optimization of CNN design.

Teresa Araújo, Guilherme Aresta, Bernardo Almada-Lobo, Ana Maria Mendonça, Aurélio Campilho
Fast Spectral Clustering Using Autoencoders and Landmarks

In this paper, we introduce an algorithm for performing spectral clustering efficiently. Spectral clustering is a powerful clustering algorithm that suffers from high computational complexity, due to eigendecomposition. In this work, we first build the adjacency matrix of the corresponding graph of the dataset. To build this matrix, we only consider a limited number of points, called landmarks, and compute the similarity of all data points with the landmarks. Then, we present a definition of the Laplacian matrix of the graph that enables us to perform eigendecomposition efficiently, using a deep autoencoder. The overall complexity of the algorithm for eigendecomposition is O(np), where n is the number of data points and p is the number of landmarks. Finally, we evaluate the performance of the algorithm in different experiments.

Ershad Banijamali, Ali Ghodsi
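The landmark trick above can be sketched in a few lines (an illustrative numpy sketch, not the authors' code; random landmark selection and the Gaussian kernel bandwidth are assumptions): instead of the full O(n²) pairwise affinity matrix, only the n × p affinities to p landmarks are computed.

```python
import numpy as np

def landmark_affinity(X, n_landmarks=50, sigma=1.0, seed=0):
    """Build the n x p affinity between all points and p landmarks.

    Landmarks are chosen here by random sampling (k-means centres
    are a common alternative). Cost is O(n * p) instead of the
    O(n^2) of a full pairwise affinity matrix.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_landmarks, replace=False)
    landmarks = X[idx]
    # Squared Euclidean distances from every point to every landmark.
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))
```

The spectral embedding is then obtained from this thin matrix (in the paper, via a deep autoencoder) rather than from a full n × n Laplacian.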
Improved Face and Head Detection Based on Traditional Middle Eastern Clothing

This paper is concerned with the detection of individuals in images who wear traditional Middle Eastern clothing. Traditional headwear for men includes a scarf known as the shemagh that often occludes the face or causes significant shadows. State-of-the-art face-detection systems do not perform well for these cases. To address this problem, we have developed a novel approach that detects a distinctive part of traditional headwear known as the igal. This is a band or cord, typically black, that rests on the shemagh to hold it in place. Our approach starts by applying multiscale SVM classification with a HoG descriptor to perform tentative detection. The proposed detections are then refined using a bag of visual words categorization system. Experimental results have shown significantly better performance for our technique over several face-detection systems. Our technique yielded an F1 score of 80% with a low false-positive rate, showing an improvement of 15% over the best face detector.

Abdulaziz Alorf, A. Lynn Abbott
Unsupervised Group Activity Detection by Hierarchical Dirichlet Processes

Detecting groups plays an important role in group activity detection. In this paper, we propose an automatic group activity detection method that segments video sequences into dynamic clips. As the first step, groups are detected by adopting bottom-up hierarchical clustering, where the number of groups is not provided beforehand. Then, groups are tracked over time to generate consistent trajectories. Furthermore, Granger causality is used to compute the mutual effect between objects based on motion and appearance features. Finally, the Hierarchical Dirichlet Process is used to cluster the groups. Our approach not only detects the activity among the objects of a particular group (intra-group) but also extracts the activities among multiple groups (inter-group). The experiments on public datasets demonstrate the effectiveness of the proposed method. Although our approach is completely unsupervised, we achieved clustering accuracies of up to 79.35$$\%$$ and 81.94$$\%$$ on the Behave and NUS-HGA datasets, respectively.

Ali Al-Raziqi, Joachim Denzler
Classification Boosting by Data Decomposition Using Consensus-Based Combination of Classifiers

This paper is devoted to data decomposition analysis. We decompose data into functional groups depending on their complexity in terms of classification, using a consensus of classifiers as an effective algorithm for data decomposition. The present research considers data decomposition into two subsets of “easy” and “difficult” (or “ambiguous”) data. The easiest part of the data is classified during decomposition using the consensus of classifiers; for the other part, one has to apply other classifiers or classifier combinations. We show experimentally that the aforementioned data decomposition using an optimal consensus of classifiers leads to better performance and generalization ability of the entire classification algorithm.

Vitaliy Tayanov, Adam Krzyżak, Ching Suen
Classification Using Mixture of Discriminative Learners: The Case of Compositional Data

Compositional data arise in many fields and their analysis has to be done with care since these data are bounded and summing up to a constant. In this paper, we propose a mixture model which combines several discriminative models through a set of Dirichlet-based weights. It is worth noticing that the Dirichlet distribution is not used here as a prior to the mixing coefficients but instead to model the repartition of the tasks among the classifiers. By doing so, we do not need to transform the data while keeping interpretable results. Experiments on synthetic and real-world data sets show the efficiency of our model.

Elvis Togban, Djemel Ziou

Biomedical Image Analysis

Frontmatter
Mesh-Based Active Model Initialization for Multiple Organ Segmentation in MR Images

Active models are widely used for the segmentation of medical images. One of the key issues of active models is the initialization phase, which significantly affects segmentation performance. This paper presents a novel method for the automatic initialization of different types of active models by exploiting an adaptive mesh generation technique suitable for the automatic detection of multiple organs. The method has been applied to MR images, and the results show its ability to simultaneously extract initial approximate boundaries that are close to the exact boundaries of multiple organs. The effect of the proposed initialization algorithm on segmentation has been tested on a series of arm and thoracic MR images, and the results show an improvement in the convergence and speed of active model segmentation of multiple organs with respect to those obtained using manual initialization.

M. R. Mohebpour, F. Guibault, F. Cheriet
Sperm Flagellum Center-Line Tracing in Fluorescence 3D+t Low SNR Stacks Using an Iterative Minimal Path Method

Intracellular calcium ([Ca$$^{2+}$$]i) regulates sperm motility. Visualizing [Ca$$^{2+}$$]i in 3D is not a simple matter since it requires complex fluorescence microscopy techniques where the resulting images have very low intensity and consequently low SNR (Signal to Noise Ratio). In 3D+t sequences, this problem is magnified since the flagellum beats (for human sperm) at an average frequency of 15 Hz, making it harder to obtain the three-dimensional information. Moreover, 3D holographic techniques do not work for these fluorescence-based images. In this paper, an algorithm to extract the flagellum’s center-line in 3D+t stacks is presented. For this purpose, an iterative algorithm based on the fast-marching method is proposed. Quantitative and qualitative results are presented on a 3D+t stack to demonstrate the ability of the proposed algorithm to trace the flagellum’s center-line. Our method was qualitatively and quantitatively compared against state-of-the-art tubular structure center-line extraction algorithms, outperforming them and reaching a precision and recall of 0.96 with respect to a semi-manual method used as reference. The proposed methodology solves a major problem related to the analysis of the 3D motility of sperm cells in images with very low intensity.

Paul Hernandez-Herrera, Fernando Montoya, Juan M. Rendón, Alberto Darszon, Gabriel Corkidi
Curvelet-Based Classification of Brain MRI Images

Classification of brain MRI images is crucial in medical diagnosis. Automatic classification of these images helps in developing effective non-invasive procedures. In this paper, based on curvelet transform, a novel classification scheme of brain MRI images is proposed and a technique for extracting and selecting curvelet features is provided. To study the effectiveness of their use, the proposed features are employed into three different prediction algorithms, namely, K-nearest neighbours, support vector machine and decision tree. The method of K-fold stratified cross validation is used to assess the efficacy of the proposed classification solutions and the results are compared with those of various state-of-the-art classification schemes available in the literature. The experimental results demonstrate the superiority of the proposed decision tree classification scheme in terms of accuracy, generalization capability, and real-time reliability.

Rafat Damseh, M. Omair Ahmad
A Novel Automatic Method to Evaluate Scoliotic Trunk Shape Changes in Different Postures

We present a novel method to evaluate the external trunk shape of Adolescent Idiopathic Scoliosis (AIS) patients. A patient’s trunk surface is acquired in different postures (neutral standing, left and right lateral bending) at their preoperative visit and in standing posture at their postoperative visit following spinal deformity corrective surgery with an optical digitizing system. We use spectral shape decomposition to compute the eigenmodes of the trunk surface. This allows us to intrinsically define the principal shape directions robustly with respect to the patient’s posture. We then extract a set of contour levels that follow the trunk’s deformation from bottom to top, and characterize the trunk shape as a set of multilevel measurements taken at each level. Changes in trunk shape between postures/visits are calculated as differences between the measurement functionals. We performed a study on a small cohort of 14 patients with right thoracic spinal curves to assess the relationship between shape changes induced by the lateral bending positions and those resulting from surgical correction. The proposed method for scoliotic trunk shape evaluation represents a significant improvement over previous ones, as it is completely automatic and it adapts well to the lateral bending posture without the need to manually define control points/curves on the trunk surface.

Philippe Debanné, Ola Ahmad, Stefan Parent, Hubert Labelle, Farida Cheriet
Breast Density Classification Using Local Ternary Patterns in Mammograms

This paper presents a method for breast density classification. Local ternary pattern operators are employed to model the appearance of the fibroglandular disk region instead of the whole breast region as the majority of current studies have done. The Support Vector Machine classifier is used to perform the classification and a stratified ten-fold cross-validation scheme is employed to evaluate the performance of the method. The proposed method achieved 82.33% accuracy which is comparable with some of the best methods in the literature based on the same dataset and evaluation scheme.

Andrik Rampun, Philip Morrow, Bryan Scotney, John Winder
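The local ternary pattern operator used above thresholds each neighbour against the centre pixel with a tolerance t; a minimal sketch for a single 3×3 patch (illustrative only; the neighbour ordering and the upper/lower split follow the usual LTP convention, not necessarily the paper's exact configuration):

```python
import numpy as np

def ltp_codes(patch, t=5):
    """Local ternary pattern of the centre pixel of a 3x3 patch.

    Neighbours brighter than centre + t map to +1, darker than
    centre - t map to -1, anything in between to 0. The ternary
    code is split into an 'upper' and a 'lower' binary pattern.
    """
    c = patch[1, 1]
    # 8 neighbours in clockwise order starting top-left.
    n = np.array([patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]],
                 dtype=float)
    upper = (n > c + t).astype(int)
    lower = (n < c - t).astype(int)
    weights = 2 ** np.arange(8)
    return int(upper @ weights), int(lower @ weights)
```

Histograms of the upper and lower codes over a region (here, the fibroglandular disk) then form the texture feature vector fed to the SVM.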
Segmentation of Prostate in Diffusion MR Images via Clustering

Automatic segmentation of the prostate gland in magnetic resonance (MR) images is a challenging task due to large variations of prostate shapes and indistinct boundaries with adjacent tissues. In this paper, we propose an automatic pipeline to segment the prostate gland in diffusion magnetic resonance images (dMRI). The most common approach for segmenting the prostate in MR images is based on image registration, which is computationally expensive and relies solely on pre-segmented images (also known as atlases). In contrast, the proposed method uses a clustering method applied to the dMRI to separate the prostate gland from the surrounding tissues, followed by a postprocessing stage via active contours. The proposed pipeline was validated on prostate MR images of 25 patients and the segmentation results were compared to manually delineated prostate contours. The proposed method achieves an overall accuracy with mean Dice Similarity Coefficient (DSC) of 0.84$$\ \pm \ $$0.04, while being most effective in the middle prostate gland, producing a mean DSC of 0.91$$\ \pm \ $$0.03. The proposed method has the potential to be integrated into clinical decision support systems that aid radiologists in monitoring prostate cancer.

Junjie Zhang, Sameer Baig, Alexander Wong, Masoom A. Haider, Farzad Khalvati
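The Dice Similarity Coefficient reported above is a standard overlap measure, DSC = 2|A ∩ B| / (|A| + |B|); a minimal sketch (illustrative, with the empty-mask convention chosen here as an assumption):

```python
import numpy as np

def dice(seg, gt):
    """Dice Similarity Coefficient between two binary masks:
    DSC = 2|A ∩ B| / (|A| + |B|), equal to 1.0 for perfect overlap."""
    seg, gt = np.asarray(seg, bool), np.asarray(gt, bool)
    denom = seg.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return 2.0 * np.logical_and(seg, gt).sum() / denom
```

A DSC of 0.84 thus means the automatic and manual prostate masks share 84% of their combined area in this harmonic sense.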
Facial Skin Classification Using Convolutional Neural Networks

Facial skin assessment is crucial for a number of fields including the make-up industry, dermatology and plastic surgery. This paper addresses skin classification techniques which use conventional machine learning and state-of-the-art Convolutional Neural Networks to classify three types of facial skin patches, namely normal, spots and wrinkles. This study aims to lay the groundwork, on the basis of these three classes, for a collective facial skin quality score. In this work, we collected high-quality face images of people from different ethnicities to create a derma dataset. Then, we outlined skin patches of 100 $$\times $$ 100 resolution in the three pre-decided classes. With extensive parameter tuning, we ran a number of computer vision experiments using both traditional machine learning and deep learning techniques for this 3-class classification. Despite the limited dataset, GoogLeNet outperforms the Support Vector Machine approach with an accuracy of 0.899, an F-measure of 0.852 and a Matthews Correlation Coefficient of 0.779. The results show the potential of deep learning for non-clinical skin image classification, which will be even more promising with a larger dataset.

Jhan S. Alarifi, Manu Goyal, Adrian K. Davison, Darren Dancey, Rabia Khan, Moi Hoon Yap
Automatic Detection of Globules, Streaks and Pigment Network Based on Texture and Color Analysis in Dermoscopic Images

Melanoma diagnosis in early stages is a difficult task, which requires highly qualified and trained staff. Therefore, a computer-aided diagnosis tool to assist non-specialized physicians in the assessment of pigmented lesions would be desirable. In this paper a method to detect streaks, globules and pigment network, which are very important features for evaluating the malignancy of a lesion, is presented. The algorithm calculates texton histograms of color and texture features extracted from a filter bank, which feed a Support Vector Machine. The method has been tested on 176 images, attaining an accuracy of 80% and outperforming the benchmark techniques used for comparison.

Amaya Jiménez, Carmen Serrano, Begoña Acha

Image Analysis in Ophthalmology

Frontmatter
Learning to Deblur Adaptive Optics Retinal Images

In this paper we propose a blind deconvolution approach for reconstruction of Adaptive Optics (AO) high-resolution retinal images. The framework employs Random Forest to learn the mapping of retinal images onto the space of blur kernels expressed in terms of Zernike coefficients. A specially designed feature extraction technique allows inference of blur kernels for retinal images of various quality, taken at different locations of the retina. This model is validated on synthetically generated images as well as real AO high-resolution retinal images. The obtained results on the synthetic data showed an average root-mean-square error of 0.0051 for the predicted blur kernels and 0.0464 for the reconstructed images, compared to the ground truth (GT). The assessment of the reconstructed AO retinal images demonstrated that the contrast, sharpness and visual quality of the images have been significantly improved.

Anfisa Lazareva, Muhammad Asad, Greg Slabaugh
A Deep Neural Network for Vessel Segmentation of Scanning Laser Ophthalmoscopy Images

Retinal vessel segmentation is a fundamental and well-studied problem in the retinal image analysis field. The standard images in this context are color photographs acquired with standard fundus cameras. Several vessel segmentation techniques have been proposed in the literature that perform successfully on this class of images. However, for other retinal imaging modalities, blood vessel extraction has not been thoroughly explored. In this paper, we propose a vessel segmentation technique for Scanning Laser Ophthalmoscopy (SLO) retinal images. Our method adapts a Deep Neural Network (DNN) architecture initially devised for segmentation of biological images (U-Net) to perform the task of vessel segmentation. The model was trained on a recent public dataset of SLO images. Results show that our approach efficiently segments the vessel network, achieving a performance that outperforms the current state-of-the-art on this particular class of images.

Maria Ines Meyer, Pedro Costa, Adrian Galdran, Ana Maria Mendonça, Aurélio Campilho
Adversarial Synthesis of Retinal Images from Vessel Trees

Synthesizing images of the eye fundus is a challenging task that has been previously approached by formulating complex models of the anatomy of the eye. New images can then be generated by sampling a suitable parameter space. Here we propose a method that learns to synthesize eye fundus images directly from data. For that, we pair true eye fundus images with their respective vessel trees, by means of a vessel segmentation technique. These pairs are then used to learn a mapping from a binary vessel tree to a new retinal image. For this purpose, we use a recent image-to-image translation technique, based on the idea of adversarial learning. Experimental results show that the original and the generated images are visually different in terms of their global appearance, in spite of sharing the same vessel tree. Additionally, a quantitative quality analysis of the synthetic retinal images confirms that the produced images retain a high proportion of the true image set quality.

Pedro Costa, Adrian Galdran, Maria Ines Meyer, Ana Maria Mendonça, Aurélio Campilho
Automated Analysis of Directional Optical Coherence Tomography Images

Directional optical coherence tomography (D-OCT) reveals reflectance properties of retinal structures by changing the incidence angle of the light beam. As no commercially available OCT device has been designed for such use, image processing is required to homogenize the grey levels between off-axis images before differential analysis. We describe here a method for automated analysis of D-OCT images and propose a color representation to highlight angle-dependent structures. Clinical results show that the proposed approach is robust and helpful for clinical interpretation.

Florence Rossant, Kate Grieve, Michel Paques
Contrast Enhancement by Top-Hat and Bottom-Hat Transform with Optimal Structuring Element: Application to Retinal Vessel Segmentation

Automatic detection of retinal blood vessels can be used in biometric identification, computer-assisted laser surgery, and the diagnosis of many eye-related diseases. Early detection through retinal vessel analysis helps patients receive proper treatment for diseases such as diabetic retinopathy and hypertension, which can significantly reduce possible vision loss. This paper presents an efficient and simple contrast enhancement technique in which morphological operations such as top-hat and bottom-hat are applied to enhance the image. An edge-content-based contrast metric is measured for selecting the optimal structuring element size, and simple, straightforward steps are applied to completely extract the vessels from the enhanced retinal image. The proposed method achieves average accuracy rates of 0.9379 and 0.9504 on the two publicly available DRIVE and STARE benchmark datasets, respectively.

Rafsanjany Kushol, Md. Hasanul Kabir, Md Sirajus Salekin, A. B. M. Ashikur Rahman
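The top-hat/bottom-hat enhancement step can be sketched as follows (an illustrative numpy-only sketch with naive morphology, not the authors' implementation; a flat square structuring element and the combination img + top-hat − bottom-hat are common conventions assumed here):

```python
import numpy as np

def _morph(img, size, op):
    # Naive greyscale erosion (op=np.min) or dilation (op=np.max)
    # with a flat square structuring element of side `size`.
    pad = size // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.empty_like(img, dtype=float)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = op(padded[y:y + size, x:x + size])
    return out

def enhance(img, size=3):
    """Contrast enhancement via morphological top-hat and bottom-hat:
    enhanced = img + top_hat - bottom_hat, which brightens small
    bright details and darkens small dark details such as vessels."""
    img = np.asarray(img, dtype=float)
    opened = _morph(_morph(img, size, np.min), size, np.max)   # opening
    closed = _morph(_morph(img, size, np.max), size, np.min)   # closing
    top_hat = img - opened
    bottom_hat = closed - img
    return img + top_hat - bottom_hat
```

Dark, thin structures narrower than the structuring element (vessels in the green channel) are caught by the bottom-hat and pushed further toward black, making the subsequent extraction steps easier.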
Retinal Biomarkers of Alzheimer’s Disease: Insights from Transgenic Mouse Models

In this paper, we use the retina as a window into the central nervous system and in particular to assess changes in the retinal tissue associated with the Alzheimer’s disease. We imaged the retina of wild-type (WT) and transgenic mouse model (TMM) of Alzheimer’s disease with optical coherence tomography and classify retinas into the WT and TMM groups using support vector machines with the radial basis function kernel. Predictions reached an accuracy over 80% at the age of 4 months and over 90% at the age of 8 months. Texture analysis of computed fundus reference images suggests a more heterogeneous organization of the retina in transgenic mice at the age of 8 months in comparison to controls.

Rui Bernardes, Gilberto Silva, Samuel Chiquita, Pedro Serranho, António Francisco Ambrósio
Particle Swarm Optimization Approach for the Segmentation of Retinal Vessels from Fundus Images

In this paper, we propose to use the Particle Swarm Optimization (PSO) algorithm to improve the Multi-Scale Line Detection (MSLD) method for the retinal blood vessel segmentation problem. The PSO algorithm is applied to find the best arrangement of scales in the basic line detector. Segmentation performance was validated on a public high-resolution fundus image database of healthy subjects. The optimized MSLD method demonstrates fast convergence to the optimal solution, reducing execution time by approximately 35%. For the same level of specificity, the proposed approach improves the sensitivity rate by 3.1% compared to the original MSLD method. The proposed method will make it possible to reduce the number of missed vessel segments that might otherwise lead to false positives in red lesion detection in CAD systems used for diabetic retinopathy diagnosis.

Bilal Khomri, Argyrios Christodoulidis, Leila Djerou, Mohamed Chaouki Babahenini, Farida Cheriet
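A minimal sketch of the basic line detector that MSLD builds on (not the PSO scale search itself): at each pixel, the response is the maximum mean intensity along a short line segment over a few orientations, minus the local window mean. Vessels, made bright by inverting the green channel, yield a high response. All names here are illustrative:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def line_response(img, length):
    """Basic line detector response at a single scale `length`."""
    h, w = img.shape
    r = length // 2
    pad = np.pad(img, r, mode='reflect')
    resp = np.full((h, w), -np.inf)
    # four orientations: 0, 45, 90 and 135 degrees
    for dy, dx in [(0, 1), (1, 1), (1, 0), (1, -1)]:
        mean = np.zeros((h, w))
        for k in range(-r, r + 1):
            mean += pad[r + k*dy : r + k*dy + h, r + k*dx : r + k*dx + w]
        resp = np.maximum(resp, mean / (2*r + 1))
    # a pixel on a vessel stands out against its local neighborhood
    return resp - uniform_filter(img, size=2*r + 1, mode='reflect')

# toy example: a bright horizontal line (an inverted vessel)
img = np.zeros((11, 11))
img[5, :] = 1.0
resp = line_response(img, length=5)  # strongest on the line itself
```

MSLD combines such responses over a set of lengths; the PSO step searches for the arrangement of those scales that maximizes segmentation performance.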
Retinal Vessel Segmentation from Hyperspectral Camera Images

In this paper, a vessel segmentation method for hyperspectral retinal images, based on the Multi-Scale Line Detection algorithm, is proposed. The method combines segmentation information from several consecutive images obtained at specific wavelengths around the green channel to produce an accurate segmentation of the retinal vessel network. Images obtained from six subjects were used to evaluate the performance of the proposed method. Preliminary results suggest a potential advantage of combining multispectral information, instead of using only the green channel, for segmenting retinal blood vessels.

Rana Farah, Samuel Belanger, Reza Jafari, Claudia Chevrefils, Jean-Philippe Sylvestre, Frédéric Lesage, Farida Cheriet

Remote Sensing

Frontmatter
The Potential of Deep Features for Small Object Class Identification in Very High Resolution Remote Sensing Imagery

Various generative and discriminative methods have been transferred from the computer vision field to remote sensing applications, using descriptors at different low and high semantic levels. However, classical approaches have shown their limits in representation learning and are not designed to cope with the great variability of the data. With the emergence of large-scale annotated datasets in vision, deep convolutional approaches have become the most successful solutions, accommodating this variability by integrating spatial context across different levels of semantic abstraction. Given the lack of annotated remote sensing data, in this paper we compare the performance of deep features produced by six different CNNs, trained on well-established computer vision datasets, for the detection of small objects (cars) in very high resolution Pleiades imagery. Our findings show good generalization performance and are very encouraging for future applications.

M. Dahmane, S. Foucher, M. Beaulieu, Y. Bouroubi, M. Benoit
Segmentation of LiDAR Intensity Using CNN Feature Based on Weighted Voting

We propose an image labeling method for LiDAR intensity images obtained by a Mobile Mapping System (MMS). A conventional segmentation method using a CNN and KNN gives high accuracy overall, but the accuracies for objects covering small areas are much lower than those for classes covering large areas. We address this issue with voting costs. The first cost is determined from a local region; the second from the regions surrounding it. Each cost becomes large when the labeling result matches the class label of the region. In our experiments, we use 36 LiDAR intensity images with ground-truth labels, divided into training (28 images) and test (8 images) sets, and use class average accuracy as the evaluation measure. The proposed method achieves 84.75% class average accuracy, 9.22% higher than our conventional method, demonstrating that the proposed costs are effective in improving accuracy.

Masaki Umemura, Kazuhiro Hotta, Hideki Nonaka, Kazuo Oda
A Lattice-Theoretic Approach for Segmentation of Truss-Like Porous Objects in Outdoor Aerial Scenes

Remote video surveillance of vast outdoor systems for structural health monitoring, using e.g. drones, is rapidly gaining popularity. Many such systems are designed as truss structures for well-known mechanical reasons. A truss structure has inherently porous interstices, and hence no closed region or contour really represents useful properties or features of just the foreground or just the background. In this paper, we present a novel approach to segment and detect porous truss-like objects in videos. Our approach is primarily based on modeling such objects as composite shapes organized in a structure called a geometric lattice. We define a novel feature called shape density to classify and segment the truss region. The segmented region is then analyzed for various surveillance goals such as bending. The algorithm was tested against video data captured for many transmission towers along two different power grid corridors. We believe that our algorithm will be very useful for the analysis of truss-like structures in many outdoor vision applications.

Hrishikesh Sharma, Tom Sebastian, Balamuralidhar Purushothaman
Non-dictionary Aided Sparse Unmixing of Hyperspectral Images via Weighted Nonnegative Matrix Factorization

In this paper, we propose a method of blind (non-dictionary aided) sparse hyperspectral unmixing for the linear mixing model (LMM). In this method, both the spectral signatures of materials (endmembers) (SSoM) and their fractional abundances (FAs) are assumed to be unknown, and the goal is to find the matrices that represent the SSoM and FAs. The proposed method employs a weighted version of non-negative matrix factorization (WNMF) in order to mitigate the impact of pixels that suffer from a certain level of noise (i.e., low signal-to-noise-ratio (SNR) values). We formulate the WNMF problem through regularized sparsity terms on the FAs and use multiplicative update rules to solve the resulting optimization problem. The effectiveness of the proposed method is shown through simulations over a real hyperspectral data set and compared with several competitive unmixing methods.

Yaser Esmaeili Salehani, Mohamed Cheriet
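The weighted factorization can be sketched with standard multiplicative updates. This is a generic WNMF minimizing ||W * (X - A S)||_F^2, without the paper's sparsity regularizer on the abundances; all names and sizes are illustrative:

```python
import numpy as np

def wnmf(X, W, k, n_iter=1000, eps=1e-9, seed=0):
    """Weighted NMF: minimize ||W * (X - A @ S)||_F^2 over A, S >= 0
    with multiplicative updates. The weight matrix W down-weights
    low-SNR pixels. (The sparsity term on S used in the paper is
    omitted in this sketch.)"""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    A = rng.random((m, k)) + 0.1   # endmember signatures (bands x k)
    S = rng.random((k, n)) + 0.1   # fractional abundances (k x pixels)
    WX = W * X
    for _ in range(n_iter):
        A *= (WX @ S.T) / ((W * (A @ S)) @ S.T + eps)
        S *= (A.T @ WX) / (A.T @ (W * (A @ S)) + eps)
    return A, S

# sanity check: exact rank-2 data with uniform weights
rng = np.random.default_rng(1)
X = rng.random((10, 2)) @ rng.random((2, 8))
A, S = wnmf(X, np.ones_like(X), k=2)
```

The multiplicative form keeps both factors non-negative throughout, which is why it is the usual solver for (W)NMF-style objectives.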
Stroke Width Transform for Linear Structure Detection: Application to River and Road Extraction from High-Resolution Satellite Images

The evaluation of the status of lines of communication, in normal times or during crises, is a very important task for many applications, such as disaster management and road network maintenance. However, due to their large geographic extent, inspecting the surfaces of these structures using traditional techniques such as laser scanning is very challenging. In this context, satellite images are pertinent because of their ability to cover a large part of the surface of communication lines while offering a high level of detail, which makes it possible to discriminate the objects forming these linear structures. In this paper, a novel approach for extracting linear structures from high-resolution optical and radar satellite images is presented. The proposed technique is based on the Stroke Width Transform (SWT), which extracts parallel edges from the input image and has been successfully applied in the literature to extract characters from complex scenes. An adaptation of this transform to the problems of river extraction from Synthetic Aperture Radar (SAR) images and road identification from optical images is described, and the results obtained show the efficiency of our approach.

Moslem Ouled Sghaier, Imen Hammami, Samuel Foucher, Richard Lepage

Applications

Frontmatter
Real Time Fault Detection in Photovoltaic Cells by Cameras on Drones

Hot spots are among the defects of photovoltaic panels with the most destructive effects. In this paper we propose a method able to automatically detect hot spots in photovoltaic panels by analyzing the sequence of thermal images acquired by a camera mounted on board a drone flying over the plant. The main novelty of the proposed approach lies in the fact that color-based information, typically adopted in the literature, is combined with model-based information, so as to strongly reduce the number of false positives. The experimentation, both in terms of accuracy and processing time, confirms the effectiveness and the efficiency of the proposed approach.

Alessandro Arenella, Antonio Greco, Alessia Saggese, Mario Vento
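The color/intensity cue alone can be sketched as a simple statistical threshold on a thermal frame; the paper's contribution is to combine such a rule with a model-based check of the panel geometry to suppress false positives. The `k` parameter is an illustrative choice, not the authors':

```python
import numpy as np

def hot_spot_mask(thermal, k=3.0):
    """Flag pixels much hotter than the frame average: a simple
    intensity rule of the kind combined with a model-based panel
    check to reject false positives."""
    mu, sigma = thermal.mean(), thermal.std()
    return thermal > mu + k * sigma

# toy thermal frame: a panel at 30 degC with one 80 degC hot spot
thermal = np.full((10, 10), 30.0)
thermal[5, 5] = 80.0
mask = hot_spot_mask(thermal)  # exactly the hot-spot pixel is flagged
```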
Cow Behavior Recognition Using Motion History Image Feature

In this paper, a cow behavior recognition algorithm is proposed to detect the optimal time for insemination, using a support vector machine (SVM) classifier with motion history image (MHI) features. In the proposed algorithm, area information indicating the amount of movement is extracted from the MHI, instead of the motion direction widely used for human action recognition. The experimental results confirm that the proposed method detects cow mounting behavior with a detection rate of 72%.

Sung-Jin Ahn, Dong-Min Ko, Kang-Sun Choi
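The MHI update and the area feature it feeds can be sketched in a few lines of NumPy; function names are illustrative, not the paper's:

```python
import numpy as np

def update_mhi(mhi, motion_mask, timestamp, duration):
    """Motion History Image update: pixels moving now are stamped
    with the current timestamp; pixels that last moved more than
    `duration` ago fade to zero."""
    mhi = np.where(motion_mask, float(timestamp), mhi)
    mhi[mhi < timestamp - duration] = 0.0
    return mhi

def motion_area(mhi):
    """Area feature: fraction of pixels with recent motion,
    used here instead of motion direction."""
    return np.count_nonzero(mhi) / mhi.size

# toy sequence: one row of pixels moves, then the scene goes still
mhi = np.zeros((4, 4))
mask = np.zeros((4, 4), dtype=bool)
mask[0, :] = True
mhi = update_mhi(mhi, mask, timestamp=1.0, duration=3.0)
area = motion_area(mhi)  # 4 of 16 pixels moved recently
```

A sustained large `area` over consecutive frames is the kind of cue that can separate mounting behavior from ordinary movement.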
Footnote-Based Document Image Classification

Analyzing historical document images is considered a challenging task due to the complex and unusual structures of these images. It is even more challenging to automatically find the footnotes in them. In fact, detecting footnotes is one of the essential elements for scholars to analyze and answer key questions about historical documents. In this work, we present a new framework for footnote detection in historical documents. To this aim, we use the most salient feature of footnotes: their smaller font size compared to the rest of the page content. We propose three types of features to track font size changes and feed them to two classifiers, SVM and AdaBoost. The framework shows promising results, with accuracy over 80% for both classifiers on our dataset.

Sara Zhalehpour, Andrew Piper, Chad Wellmon, Mohamed Cheriet
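One hypothetical feature of the font-size-tracking kind described above: from a binarized page, measure text-line heights via the horizontal projection profile and compare the last line to the page median; footnotes, set in smaller type, pull the ratio below 1. This illustrates the idea only and is not the paper's exact feature set:

```python
import numpy as np

def line_heights(binary_page):
    """Heights of text lines on a binarized page (1 = ink),
    read off the horizontal projection profile."""
    inked = binary_page.sum(axis=1) > 0
    heights, run = [], 0
    for row_has_ink in inked:
        if row_has_ink:
            run += 1
        elif run:
            heights.append(run)
            run = 0
    if run:
        heights.append(run)
    return heights

def footnote_feature(binary_page):
    """Height of the last text line relative to the median line
    height; values well below 1 suggest a footnote."""
    h = line_heights(binary_page)
    return h[-1] / float(np.median(h)) if h else 1.0

# synthetic page: three tall body lines and one short last line
page = np.zeros((22, 20), dtype=int)
for top in (0, 6, 12):
    page[top:top + 4, :] = 1   # body lines, height 4
page[18:20, :] = 1             # "footnote" line, height 2
ratio = footnote_feature(page)
```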
Feature Learning for Footnote-Based Document Image Classification

Classifying document images is a challenging problem confronted by many obstacles, specifically the pivotal need for hand-designed features and the scarcity of labeled data. In this paper, a new approach for classifying document images based on the availability of footnotes in them is presented. Our proposed approach depends mainly on a Deep Belief Network (DBN) that consists of two phases: unsupervised pre-training and supervised fine-tuning. The main advantage of this approach is its capability to automatically engineer the best features to extract from a raw document image in order to generate an efficient representation of it. This feature learning approach takes advantage of the vast amount of available unlabeled data and combines it with the limited number of labeled examples. The obtained results show that the proposed approach provides an effective document image classification framework with highly reliable performance.

Sherif Abuelwafa, Mohamed Mhiri, Rachid Hedjam, Sara Zhalehpour, Andrew Piper, Chad Wellmon, Mohamed Cheriet
Analysis of Sloshing in Tanks Using Image Processing

Sloshing refers to the violent movement of liquid in a partially filled tank that undergoes dynamic motion. There are several examples of such motion. In ships, sloshing occurs in oil tankers, liquefied natural gas carriers and large fuel oil tanks. In rockets, it happens in the external fuel tanks containing liquid hydrogen (LH2) and liquid oxygen (LOX). The sloshing motion is mainly due to the large dimensions of tanks with smooth plane surfaces in contact with the liquid; the tank layout fails to damp the motion. Sloshing becomes more violent when the parent vehicle’s motion contains energy in the vicinity of the natural frequencies of the liquid motion inside the tank. Determining these frequencies is critical to characterize the fluid motion inside the tank and thereby predict the impact load on the structure holding the liquid. The determination of the hydrodynamic pressure on the tank walls due to sloshing finds application in the design and construction of liquid tanks in rockets and ships. In some cases, determining the liquid motion inside the tank is also critical, as it can couple with the parent vehicle’s motion dynamics. This paper deals with the extraction of data from video recordings of liquid sloshing motion inside a rectangular tank, using image processing techniques. The important fluid dynamics properties that can be determined by image processing are discussed. The analysis presented is mainly for 2D motions.

Rahul Kamilla, Vishwanath Nagarajan
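A minimal sketch of one such extraction step, assuming the liquid appears brighter than the air above it in a grayscale frame: scan each pixel column top-down for the first bright pixel to recover the free-surface profile. Tracking this profile across frames gives wave heights and, via an FFT over time, the slosh frequencies discussed above. The threshold and names are illustrative:

```python
import numpy as np

def free_surface_profile(frame, threshold=0.5):
    """Topmost row index per column whose intensity exceeds
    `threshold` (the liquid free surface); -1 where the column
    contains no liquid."""
    above = frame > threshold
    surface = above.argmax(axis=0)        # first True per column
    surface[~above.any(axis=0)] = -1      # flag empty columns
    return surface

# toy frame: a surface tilted by sloshing; rightmost column is empty
frame = np.zeros((6, 4))
frame[3:, 0] = 1.0
frame[3:, 1] = 1.0
frame[2:, 2] = 1.0
profile = free_surface_profile(frame)
```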
Light Field Estimation in the Ultraviolet Spectrum

Ultraviolet cameras are becoming widely used, with applications in botany, dermatology, and recently photography. In this paper, we develop a novel method of light field estimation from an ultraviolet image. Our UV light field imaging model is obtained by exploiting the radiometry, the optics, and the acquisition geometry. First, we develop an optical simulation model for a UV camera with a thin lens. That model makes it possible to reconstruct a scene image from the corresponding light field. Then, we define a variational formalism that integrates an image of a scene acquired by a UV camera, the depth map of that scene, and the optical simulation model in order to estimate the light field. The experimental results show that it is possible to estimate the light field in the UV spectrum with accuracy.

Julien Couillaud, Djemel Ziou, Wafa Benzaoui
Backmatter
Metadata
Title
Image Analysis and Recognition
Editors
Fakhri Karray
Aurélio Campilho
Farida Cheriet
Copyright Year
2017
Electronic ISBN
978-3-319-59876-5
Print ISBN
978-3-319-59875-8
DOI
https://doi.org/10.1007/978-3-319-59876-5
