
## About this book

The two-volume set LNCS 10424 and 10425 constitutes the refereed proceedings of the 17th International Conference on Computer Analysis of Images and Patterns, CAIP 2017, held in Ystad, Sweden, in August 2017.
The 72 papers presented were carefully reviewed and selected from 144 submissions. The papers are organized in the following topical sections: Vision for Robotics; Motion and Tracking; Segmentation; Image/Video Indexing and Retrieval; Shape Representation and Analysis; Biomedical Image Analysis; Biometrics; Machine Learning; Image Restoration; and Poster Sessions.

## Table of contents

### A New Scoring Method for Directional Dominance in Images

We aim to develop a scoring method for expressing directional dominance in images. This score is expected to indicate how much improvement in system performance can be achieved by using directional total variation (DTV)-based regularization instead of total variation (TV). For this purpose, a dataset consisting of 85 images taken from noise reduction datasets is used. The DTV values are calculated using different sensitivities in the direction of the directional dominance of these images. The slope of these values is taken as the directional dominance score of the image. To verify this score, noise reduction performance is examined using direction-invariant TV and DTV regularizers. As a result, we observe that the directional dominance score and the improvement rate in noise reduction performance are correlated. Therefore, the resulting score can be used to estimate the performance of the DTV method.

Bilge Suheyla Akkoca-Gazioglu, Mustafa Kamasak

### A Multilayer Backpropagation Saliency Detection Algorithm Based on Depth Mining

Saliency detection is an active topic in the multimedia field. Several algorithms have been proposed in this field. Most previous works on saliency detection focus on 2D images. However, in complex situations that contain multiple objects or complex backgrounds, they are not robust and their performance is unsatisfactory. Recently, 3D visual information has supplied a powerful cue for saliency detection. In this paper, we propose a multilayer backpropagation saliency detection algorithm based on depth mining, by which we exploit depth cues from four different layers of images. The evaluation of the proposed algorithm on two challenging datasets shows that our algorithm outperforms state-of-the-art methods.

Chunbiao Zhu, Ge Li, Xiaoqiang Guo, Wenmin Wang, Ronggang Wang

### Laplacian Deformation with Symmetry Constraints for Reconstruction of Defective Skulls

Skull reconstruction is an important and challenging task in craniofacial surgery planning, forensic investigation and anthropological studies. Our previous method, called FAIS (Flip-Avoiding Interpolating Surface) [17], is reported to produce more accurate reconstructions of skulls than several existing methods. FAIS iteratively applies Laplacian deformation to non-rigidly register a reference to fit the target. Both FAIS and Laplacian deformation have one major drawback: they can produce distorted results when applied to skulls with large defective parts. This paper introduces symmetric constraints to the original Laplacian deformation and FAIS. Comprehensive test results show that Laplacian deformation and FAIS with symmetric constraints are more robust and accurate than their original counterparts in reconstructing skulls with large defects.

Shudong Xie, Wee Kheng Leow, Thiam Chye Lim

### A New Image Contrast Enhancement Algorithm Using Exposure Fusion Framework

Low-light images are not conducive to human observation and computer vision algorithms due to their low visibility. Although many image enhancement techniques have been proposed to solve this problem, existing methods inevitably introduce contrast under- and over-enhancement. In this paper, we propose an image contrast enhancement algorithm to provide accurate contrast enhancement. Specifically, we first design the weight matrix for image fusion using illumination estimation techniques. Then we introduce our camera response model to synthesize multi-exposure images. Next, we find the best exposure ratio so that the synthetic image is well-exposed in the regions where the original image is under-exposed. Finally, the input image and the synthetic image are fused according to the weight matrix to obtain the enhancement result. Experiments show that our method can obtain results with less contrast and lightness distortion compared to those of several state-of-the-art methods.

Zhenqiang Ying, Ge Li, Yurui Ren, Ronggang Wang, Wenmin Wang
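
The final fusion step described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the camera response model or the weight design, so everything specific here is an assumption made for illustration: `synthesize_exposure` is a hypothetical stand-in that applies a plain gain with clipping, and the weight matrix is simply the image brightness itself.

```python
import numpy as np

def synthesize_exposure(image, ratio):
    # Hypothetical stand-in for a camera response model:
    # a plain gain followed by clipping to [0, 1] (assumption, not the paper's model).
    return np.clip(image * ratio, 0.0, 1.0)

def fuse(image, synthetic, weight):
    # Per-pixel blend: well-exposed pixels (weight near 1) keep the input,
    # dark pixels (weight near 0) take the synthetic, brighter exposure.
    w = weight[..., None] if image.ndim == 3 else weight
    return w * image + (1.0 - w) * synthetic

# Tiny grayscale example with intensities in [0, 1].
image = np.array([[0.1, 0.8],
                  [0.2, 0.9]])
weight = image.copy()  # assumption: brightness used as a crude illumination estimate
synthetic = synthesize_exposure(image, ratio=3.0)
fused = fuse(image, synthetic, weight)
print(fused)
```

In this toy case the dark pixels are lifted substantially (0.1 becomes 0.28) while the bright pixels change little (0.9 becomes 0.91), which is the behaviour a weight-guided fusion is meant to produce.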

### Quaternionic Flower Pollination Algorithm

Metaheuristic-based optimization techniques offer an elegant and easy-to-follow framework to optimize different types of problems, ranging from aerodynamics to machine learning. Though such techniques are suitable for global optimization, they can still get trapped in local optima under certain conditions, thus leading to reduced performance. In this work, we propose a quaternion-based Flower Pollination Algorithm (FPA), which extends standard FPA to possibly smoother search spaces based on hypercomplex representations. We show the proposed approach is more accurate than five other metaheuristic techniques on four benchmark functions. We also present a parallel version of the proposed approach that runs much faster.

Gustavo H. Rosa, Luis C. S. Afonso, Alexandro Baldassin, João P. Papa, Xin-She Yang

### Automated Cell Nuclei Segmentation in Pleural Effusion Cytology Using Active Appearance Model

Pleural effusion is common in clinical practice and is a frequently encountered specimen type in cytopathological assessment. In addition to being time-consuming and subjective, this assessment suffers from inter-observer and intra-observer variability, and therefore an automated system is needed. In visual examination of cytopathological images, cell nuclei present significant diagnostic value for early cancer detection and prevention. Efficient and accurate segmentation of cell nuclei is therefore one of the prerequisite steps for automated analysis of cytopathological images. Nuclei segmentation also enables subsequent automated microscopy applications, such as cell counting and classification. In this paper, we present an automated technique based on the active appearance model (AAM) for cell nuclei segmentation in pleural effusion cytology images. The AAM utilizes both the shape and texture features of the nuclei. Experimental results indicate that the proposed method separates the nuclei from the background effectively. In addition, comparisons with thresholding-based, clustering-based and graph-based segmentation methods show that the results obtained with the AAM method are closer to the ground truth.

Elif Baykal, Hulya Dogan, Murat Ekinci, Mustafa Emre Ercin, Safak Ersoz

### Parkinson’s Disease Identification Using Restricted Boltzmann Machines

Currently, Parkinson’s Disease (PD) has no cure or accurate diagnosis, with approximately 60,000 new cases yearly worldwide, most often in the elderly population. Its main symptoms cannot easily be distinguished from those of other illnesses, which makes it even more difficult to identify at the early stages. As such, computer-aided tools have recently been used to assist in this task, but the challenge of automatically identifying Parkinson’s Disease still persists. To cope with this problem, we propose to employ Restricted Boltzmann Machines (RBMs) to learn features in an unsupervised fashion by analyzing images from handwriting exams, which aim at assessing the writing skills of potential individuals. Impaired writing is one of the main symptoms of PD-prone people, since this ability ends up being severely affected. We show that RBMs can learn proper features that help supervised classifiers in the task of automatic identification of PD patients, and that one can obtain a more compact representation of the exam, reducing storage and computational load.

Clayton R. Pereira, Leandro A. Passos, Ricardo R. Lopes, Silke A. T. Weber, Christian Hook, João Paulo Papa

### Attention-Based Two-Phase Model for Video Action Detection

This paper considers the task of action detection in long untrimmed video. Existing methods tend to process every single frame or fragment throughout the whole video to make detection decisions, which is not only time-consuming but also burdens the computational models. Instead, we present an attention-based model that performs action detection by watching only a few fragments; its cost is independent of the video length, so it can be applied to real-world videos. Our approach is motivated by the observation that humans usually focus their attention sequentially on different frames of a video to quickly narrow down the extent where an action occurs. Our model is a two-phase architecture, where a temporal proposal network is designed to predict temporal proposals for multi-category actions in the first phase. The temporal proposal network observes a fixed number of locations in a video to predict action bounds and learn a location transfer policy. In the second phase, a well-trained classifier extracts visual information from the proposals to classify the action and decide whether to adopt the proposals. We evaluate our model on the ActivityNet dataset and show that it significantly outperforms the baseline.

Xiongtao Chen, Wenmin Wang, Weimian Li, Jinzhuo Wang

### Learning Discriminative Representation for Skeletal Action Recognition Using LSTM Networks

Human action recognition based on 3D skeleton data is a rapidly growing research area in computer vision due to its robustness to variations of viewpoint, human body scale and motion speed. Recent studies suggest that recurrent neural networks (RNNs) or convolutional neural networks (CNNs) are very effective at learning discriminative features of temporal sequences for classification. However, in prior models, the RNN-based method has a complicated multi-layer hierarchical architecture, and the CNN-based methods learn the contextual feature on fixed temporal scales. In this paper, we propose a framework for skeleton-based action recognition that is simple and able to select temporal scales automatically with a single-layer LSTM. Experimental results on three benchmark datasets show that our approach achieves state-of-the-art performance compared to recent models.

Lizhang Hu, Jinhua Xu

### Progressive Probabilistic Graph Matching with Local Consistency Regularization

Graph matching has attracted extensive attention in computer vision due to its powerful representation and robustness. However, its combinatorial nature and computational complexity limit the size of input graphs. Most graph matching methods initially reconstruct the graphs, but this preprocessing often results in poor performance. In this paper, a novel progressive probabilistic model is proposed in order to handle outliers and boost performance. This model takes advantage of the cooperation between the processes of correspondence enrichment and graph matching. Candidate matches are propagated with local consistency regularization in a probabilistic manner, and unreliable ones are rejected by graph matching. Experiments on two challenging datasets demonstrate that the proposed model outperforms the state-of-the-art progressive method in challenging real-world matching tasks.

Min Tang, Wenmin Wang

### Automatic Detection of Utility Poles Using the Bag of Visual Words Method for Different Feature Extractors

One of the major problems in power distribution networks is abnormal heating associated with high resistance or excessive current flow; the affected components include three-phase transformers, switches, connectors, fuses, etc. Utility pole detection aids in the classification of these affected components, hence the importance of studying it. In this work, we propose a method to detect utility poles using a database of images obtained from Google Maps for the region of Campinas/SP. The Bag of Visual Words (BoVW) method was used to classify sub-images into two classes (utility pole and non-utility pole), that is, to determine whether a given sub-image belongs to the utility pole class.

Frank C. Cabello, Yuzo Iano, Rangel Arthur, Abel Dueñas, Julio León, Diogo G. Caetano

### Deep Objective Image Quality Assessment

We present a generic blind image quality assessment method that is able to detect common operations that affect image quality as well as estimate parameters of these operations (e.g. JPEG compression quality). For this purpose, we propose a CNN architecture for multi-label classification and integrate patch predictions to obtain continuous parameter estimates. We train this architecture using softmax layers that support multi-label classification and simultaneous training on multiple datasets with heterogeneous labels. Experimental results show that the resulting multi-label CNNs perform similarly to multiple individually trained CNNs while being several times more efficient, and that common image operations and their parameters can be estimated with high accuracy. Furthermore, we demonstrate that the learned features are discriminative for subjective image quality assessment, achieving state-of-the-art results on the LIVE2 dataset via transfer learning. The proposed CNN architecture supports any multi-label classification problem.

Christopher Pramerdorfer, Martin Kampel

### Enhancing Textbook Study Experiences with Pictorial Bar-Codes and Augmented Reality

Augmented Reality (AR) can overlay computer-generated graphics onto students’ textbooks to make them more attractive and hence motivate students to learn. However, most existing AR applications use either template (picture) markers or bar-code markers to conceal the information that they want to display. The former, being in a pictorial form, can be recognized easily, but they are computationally expensive to generate and cannot be easily decoded. The latter encode only numeric data and are therefore cheap to produce and straightforward to decode. However, they look uninteresting and uninformative. In this paper, we present a method that combines the advantages of both template and bar-code markers for use in education, e.g. in textbook figures. Our method adds regions on top of an original pictorial textbook figure (e.g. historical photos, images, graphs, charts, maps, or drawings) to form a single-image stereogram that conceals a bar-code. This novel type of figure not only displays a realistic-looking picture but also carries encoded numeric information in students’ textbooks. Students can turn the pages of the book, look at the figures, and understand them without any additional technology. However, if students observe the pages through a hand-held Augmented Reality device, they see 3D virtual models appearing out of the pages. In this article, we also demonstrate that this pictorial bar-code is relatively robust under various conditions and scaling. Thus, it provides a promising AR approach for school textbooks of all grades, to enhance study experiences.

Huy Le, Minh Nguyen

### Stacked Progressive Auto-Encoders for Clothing-Invariant Gait Recognition

Gait recognition has been considered a unique and useful biometric for person identification at a distance. However, variations in covariate factors such as view angles, clothing, and carrying condition can alter an individual’s gait pattern. These variations make the task of gait analysis much more complicated. Recognizing different subjects under clothing variations remains one of the most challenging tasks in gait recognition. In this paper, we propose a Stacked Progressive Auto-encoders (SPAE) model for clothing-invariant gait recognition. A key contribution of this work is to directly learn clothing-invariant gait features for gait recognition in a progressive way by stacked multi-layer auto-encoders. In each progressive auto-encoder, our SPAE is designed to transform the Gait Energy Images (GEI) with complicated clothing types to ones of normal clothing, while keeping the GEI with normal clothing type unchanged. As a result, it gradually reduces the effect of appearance changes due to variations of clothes. The proposed method is evaluated on the challenging clothing-invariant gait recognition OU-ISIR Treadmill dataset B. The experimental results demonstrate that the proposed method can achieve far better performance than existing works.

TzeWei Yeoh, Hernán E. Aguirre, Kiyoshi Tanaka

### An Improved Scheme of Local Directional Pattern for Texture Analysis with an Application to Facial Expressions

In this paper, several extensions and modifications of Local Directional Pattern (LDP) are proposed with the objective of increasing its robustness and discriminative power. Typically, Local Directional Pattern generates a code based on the edge response values for the eight directions around a particular pixel. This method ignores the center value, which can include important information. LDP uses the absolute value and ignores the sign of the response, which carries information about the image gradient and may contain more discriminative information. The sign of the original value carries information about the different trends (positive or negative) of the gradient and may contain some more data. Centered Local Directional Pattern (CLDP), Signed Local Directional Pattern (SLDP) and Centered-SLDP (CSLDP) are proposed for different conditions. Experimental results on 20 texture types using 5 different classifiers in different conditions show that CLDP in both upper and lower traversal and CSLDP substantially outperform the original LDP. All the proposed methods were applied to a facial expression emotion application. Experimental results show that SLDP and CLDP outperform the original LDP in facial expression analysis.

Abuobayda M. Shabat, Jules-Raymond Tapamo
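
To make the baseline in the abstract above concrete, here is a minimal sketch of a classic LDP code for a single 3x3 patch. The abstract does not name the edge operator or the number of retained directions; the Kirsch compass masks and `k=3` used here follow the way LDP is commonly described and should be read as assumptions, and the function names are hypothetical.

```python
import numpy as np

# Neighbour positions of a 3x3 patch, starting east and going counter-clockwise.
RING = [(1, 2), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]

def kirsch_masks():
    # Each mask has three adjacent 5s centred on one direction, -3 elsewhere, 0 in the centre.
    masks = []
    for d in range(8):
        m = np.full((3, 3), -3)
        m[1, 1] = 0
        for offset in (-1, 0, 1):
            r, c = RING[(d + offset) % 8]
            m[r, c] = 5
        masks.append(m)
    return masks

def ldp_code(patch, k=3):
    # Edge responses in the eight directions; the centre pixel has weight 0.
    patch = np.asarray(patch)
    responses = np.array([int(np.sum(m * patch)) for m in kirsch_masks()])
    # Keep the k responses with the largest absolute value (sign is discarded).
    order = np.argsort(-np.abs(responses), kind="stable")
    code = 0
    for i in order[:k]:
        code |= 1 << int(i)
    return code, responses

patch = [[10, 20, 90],
         [10, 50, 80],
         [10, 20, 30]]
code, responses = ldp_code(patch)
print(code, responses.tolist())
# → 19 [790, 710, 150, -490, -570, -490, -330, 230]
```

Bits 0, 1 and 4 are set: the west response (-570) is selected purely on magnitude, and changing the centre value 50 would not alter the code. These are the two properties of LDP that the abstract identifies as limitations.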

### A Missing Singular Point Resistant Fingerprint Classification Technique, Based on Directional Patterns

Biometric fingerprint scanners integrated into numerous electronic devices are compact. Individuals commonly place their fingers on these compact scanners incorrectly, causing loss of Singular Points (SPs). This has a severe impact on Exclusive Fingerprint Classification due to the small inter-class variability amongst fingerprint classes. Directional Patterns (DPs) have recently shown potential in classifying fingerprints with missing SPs. However, the recent technique is designed to classify frequently occurring cases of missing SPs. In this paper, rules are proposed for complex cases, in which most of the key information has not been captured and classification tends to be extremely difficult, in order to develop a complete classification algorithm using DPs. The proposed algorithm is tested on FVC 2002 DB1 and FVC 2004 DB1 and achieves an overall accuracy of 92.48%.

Kribashnee Dorasamy, Leandra Webb-Ray, Jules-Raymond Tapamo

### Indexing of Single and Multi-instance Iris Data Based on LSH-Forest and Rotation Invariant Representation

Indexing of iris data is required to facilitate fast search in large-scale biometric systems. Previous works addressing this issue were challenged by the tradeoffs between accuracy, computational efficiency, storage costs, and maintainability. This work presents an iris indexing approach based on rotation-invariant iris representation and LSH-Forest to produce an accurate and easily maintainable indexing structure. The complexity of insertion or deletion in the proposed method is limited to the same logarithmic complexity as a query, and the required storage grows linearly with the database size. The proposed approach was extended into a multi-instance iris indexing scheme, resulting in a clear performance improvement. Single iris indexing scored a hit rate of 99.7% at a 0.1% penetration rate, while multi-instance indexing scored a 99.98% hit rate at the same penetration rate. The evaluation of the proposed approach was conducted on a large database of 50k references and 50k probes of the left and the right irises. The advantage of the proposed solution was put into perspective by comparing the achieved performance to the results reported in previous works.

Naser Damer, Philipp Terhörst, Andreas Braun, Arjan Kuijper
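
The two figures of merit quoted in the abstract above can be computed as in the following sketch. It assumes the definitions commonly used for biometric indexing (hit rate as the fraction of probes whose true identity appears in the retrieved candidate list, penetration rate as the average fraction of the database retrieved per probe); the paper's exact protocol is not stated in the abstract, and `indexing_metrics` is a hypothetical helper.

```python
def indexing_metrics(candidate_lists, true_ids, database_size):
    # Hit rate: share of probes whose true identity is among the retrieved candidates.
    hits = sum(true_id in candidates
               for candidates, true_id in zip(candidate_lists, true_ids))
    hit_rate = hits / len(true_ids)
    # Penetration rate: average share of the database that had to be retrieved.
    retrieved = sum(len(candidates) for candidates in candidate_lists)
    penetration_rate = retrieved / (len(candidate_lists) * database_size)
    return hit_rate, penetration_rate

# Four probes against a toy database of 10 references.
candidate_lists = [[3, 7], [1], [5, 9, 2], [4]]
true_ids = [7, 1, 8, 4]
hit_rate, penetration_rate = indexing_metrics(candidate_lists, true_ids, database_size=10)
print(hit_rate, penetration_rate)
```

Here three of the four probes are hits (0.75) and 7 of the 40 possible comparisons are made (0.175). A good index pushes the hit rate up while keeping the penetration rate low, which is why the abstract reports one at a fixed value of the other.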

### Clustering-Based, Fully Automated Mixed-Bag Jigsaw Puzzle Solving

The jig swap puzzle is a variant of the traditional jigsaw puzzle, wherein all pieces are equal-sized squares that must be placed adjacent to one another to reconstruct an original, unknown image. This paper proposes an agglomerative hierarchical clustering-based solver that can simultaneously reconstruct multiple, mixed jig swap puzzles. Our solver requires no additional information beyond an unordered input bag of puzzle pieces, and it significantly outperforms the current state of the art in terms of both the reconstructed output quality and the number of input puzzles it supports. In addition, we define the first quality metrics specifically tailored for multi-puzzle solvers: the Enhanced Direct Accuracy Score (EDAS), the Shiftable Enhanced Direct Accuracy Score (SEDAS), and the Enhanced Neighbor Accuracy Score (ENAS).

Zayd Hammoudeh, Chris Pollett

### A Violence Detection Approach Based on Spatio-temporal Hypergraph Transition

In the field of activity recognition, violence detection is one of the most challenging tasks due to the variety of action patterns and the lack of training data. In the last decade, performance has been improved by applying local spatio-temporal features. However, the geometric relationships and transition processes of these features have not been fully utilized. In this paper, we propose a novel framework based on spatio-temporal hypergraph transition. First, we utilize hypergraphs to represent the geometric relationships among spatio-temporal features in a single frame. Then, we apply a new descriptor called Histogram of Velocity Change (HVC), which characterizes the intensity of motion change, to model hypergraph transitions among consecutive frames. Finally, we adopt Hidden Markov Models (HMMs) with the hypergraphs and the descriptors to detect and localize violence in video frames. Experimental results on the BEHAVE dataset and the UT-Interaction dataset show that the proposed framework outperforms existing methods.

Jingjia Huang, Ge Li, Nannan Li, Ronggang Wang, Wenmin Wang

### Blur Parameter Identification Through Optimum-Path Forest

Image acquisition processes usually add some level of noise and degradation, thus causing common problems in image restoration. The restoration process depends on knowledge of the degradation parameters, which is critical for the image deblurring step. To deal with this issue, several approaches have been used in the literature, including techniques based on machine learning. In this paper, we present an approach to identify blur parameters in images using the Optimum-Path Forest (OPF) classifier. Experiments demonstrate the efficiency and effectiveness of OPF for blur parameter identification when compared against some state-of-the-art pattern recognition techniques, such as Support Vector Machines, the Bayesian classifier and k-nearest neighbors.

Rafael G. Pires, Silas E. N. Fernandes, João Paulo Papa

### Real-Time Human Pose Estimation via Cascaded Neural Networks Embedded with Multi-task Learning

Deep convolutional neural networks (DCNNs) have recently been applied to human pose estimation (HPE). However, most conventional methods have involved multiple models, and these models have been independently designed and optimized, which has led to sub-optimal performance. In addition, these methods based on multiple DCNNs have been computationally expensive and unsuitable for real-time applications. This paper proposes a novel end-to-end framework implemented with cascaded neural networks. Our proposed framework includes three tasks: (1) detecting regions which include parts of the human body, (2) predicting the coordinates of human body joints in the regions, and (3) finding optimum points as coordinates of human body joints. These three tasks are jointly optimized. Our experimental results demonstrate that our framework improves accuracy and runs 2.57 times faster than conventional methods.

Satoshi Tanabe, Ryosuke Yamanaka, Mitsuru Tomono, Makiko Ito, Teruo Ishihara

### Multi-view Separation of Background and Reflection by Coupled Low-Rank Decomposition

Images captured by a camera through glass often have reflection superimposed on the transmitted background. Among existing methods for reflection separation, multi-view methods are the most convenient to apply because they require the user to just take multiple images of a scene at varying viewing angles. Some of these methods are restricted to the simple case where the background scene and reflection scene are planar. The methods that handle non-planar scenes employ image feature flow to capture correspondence for image alignment, but they can overfit resulting in degraded performance. This paper proposes a multiple-view method for separating background and reflection based on robust principal component analysis. It models the background and reflection as rank-1 matrices, which are decomposed according to different transformations for aligning the background and reflection images. It can handle non-planar scenes and global reflection. Comprehensive test results show that our method is more accurate and robust than recent related methods.

Jian Lai, Wee Kheng Leow, Terence Sim, Guodong Li

### Fast and Easy Blind Deblurring Using an Inverse Filter and PROBE

PROBE (Progressive Removal of Blur Residual) is a recursive framework for blind deblurring. PROBE is neither a functional minimization approach, nor an open-loop sequential method where blur kernel estimation is followed by non-blind deblurring. PROBE is a feedback scheme, deriving its unique strength from the closed-loop architecture. Thus, with the rudimentary modified inverse filter at its core, PROBE’s performance meets or exceeds the state of the art, both visually and quantitatively. Remarkably, PROBE lends itself to analysis that reveals its convergence properties.

Naftali Zon, Rana Hanocka, Nahum Kiryati

### Feature Selection on Affine Moment Invariants in Relation to Known Dependencies

Moment invariants are one of the feature extraction techniques frequently used in pattern recognition algorithms. A moment is a projection of a function onto a polynomial basis, and an invariant is a function returning the same value for an input with and without a particular class of degradation. Several techniques for creating moment invariants exist, often generating an over-complete set of invariants. Dependencies in these sets commonly take the form of complicated polynomials; furthermore, they can contain dependencies of higher orders. These theoretical dependencies are valid in the continuous domain, but it is well known that in discrete cases they are often invalidated by discretization. Therefore, it would be feasible to begin classification with such an over-complete set and adaptively find a pseudo-independent set of invariants by means of feature selection techniques. This study focuses on testing the influence of theoretical invariant dependencies in discrete pattern recognition applications.

Aleš Zita, Jan Flusser, Tomáš Suk, Jan Kotera

### GMM Supervectors for Limited Training Data in Hyperspectral Remote Sensing Image Classification

Severely limited training data is one of the major and most common challenges in the field of hyperspectral remote sensing image classification. Supervised learning on limited training data requires either (a) designing a highly capable classifier that can handle such information scarcity, or (b) designing a highly informative and easily separable feature set. In this paper, we adapt GMM supervectors to hyperspectral remote sensing image features. We evaluate the proposed method on two datasets. In our experiments, inclusion of GMM supervectors leads to a mean classification improvement of about 4.6%.

AmirAbbas Davari, Vincent Christlein, Sulaiman Vesal, Andreas Maier, Christian Riess

### A New Shadow Removal Method Using Color-Lines

In this paper, we present a novel method for single-image shadow removal. From the observation of images with shadow, we find that pixels from objects of the same material form a line in the RGB color space as illumination changes. We also find that these lines do not pass through the origin, due to the effect of ambient light. Thus, we establish an offset correction relationship to remove the effect of ambient light. Then we derive a linear shadow image model to perform color-line identification. With the linear model, our shadow removal method proceeds as follows. First, perform color-line clustering and illumination estimation. Second, use an on-the-fly learning method to detect umbra and penumbra. Third, estimate the shadow scale from the statistics of shadow-free regions. Finally, refine the shadow scale by illumination optimization. Our method is simple and effective for producing high-quality shadow-free images and is able to process scenes with rich texture types and non-uniform shadows.

Xiaoming Yu, Ge Li, Zhenqiang Ying, Xiaoqiang Guo

### Improving Semantic Segmentation with Generalized Models of Local Context

Semantic segmentation (i.e. image parsing) aims to annotate each image pixel with its corresponding semantic class label. Spatially consistent labeling of the image requires an accurate description and modeling of the local contextual information. Superpixel image parsing methods provide this consistency by carrying out labeling at the superpixel-level based on superpixel features and neighborhood information. In this paper, we develop generalized and flexible contextual models for superpixel neighborhoods in order to improve parsing accuracy. Instead of using a fixed segmentation and neighborhood definition, we explore various contextual models to combine complementary information available in alternative superpixel segmentations of the same image. Simulation results on two datasets demonstrate significant improvement in parsing accuracy over the baseline approach.

Hasan F. Ates, Sercan Sunetci

### An Image-Matching Method Using Template Updating Based on Statistical Prediction of Visual Noise

An image-matching method that can continuously recognize images precisely over a long period of time is proposed. On a production line, although a multitude of the same kind of components can be recognized, the appearance of a target object changes over time. Usually, to accommodate that change in appearance, the template used for image recognition is periodically updated by using past recognition results. During such an update, information other than that concerning the target object might be included in the template and cause false recognition. In this research, we define the pixels that cause such false recognition as “noisy pixels”. With the proposed method, noisy pixels in past recognition results are extracted and excluded from the processing that updates the template. Accordingly, the template can be updated in a stable manner. To evaluate the performance of the proposed method, 5000 images in which the appearance of the target object changes (due to variation of lighting and adhesion of dirt) were used. According to the results of the evaluation, the proposed method achieves a recognition rate of 99.5%, which is higher than that of a conventional update-type template-matching method.

### Robust Accurate Extrinsic Calibration of Static Non-overlapping Cameras

An increasing number of robots and autonomous vehicles are equipped with multiple cameras to achieve surround-view sensing. The estimation of their relative poses, also known as extrinsic parameter calibration, is a challenging problem, particularly in the non-overlapping case. We present a simple and novel extrinsic calibration method based on standard components that performs favorably compared to existing approaches. We further propose a framework for predicting the performance of different calibration configurations and intuitive error metrics. This makes selecting a good camera configuration straightforward. We evaluate on rendered synthetic images and show good results as measured by angular and absolute pose differences, as well as the reprojection error distributions.

Andreas Robinson, Mikael Persson, Michael Felsberg

### An Integrated Multi-scale Model for Breast Cancer Histopathological Image Classification with Joint Colour-Texture Features

Breast cancer is one of the most commonly diagnosed cancers in women worldwide, and is commonly diagnosed via histopathological microscopy imaging. Image analysis techniques aid physicians by automating some tasks involved in the diagnostic workflow. In this paper, we propose an integrated model that considers images at different magnifications, for classification of breast cancer histopathological images. Unlike some existing methods which employ a small set of features and classifiers, the present work explores various joint colour-texture features and classifiers to compute scores for the input data. The scores at different magnifications are then integrated. The approach thus highlights suitable features and classifiers for each magnification. Furthermore, the overall performance is also evaluated using the area under the ROC curve (AUC) that can determine the system quality based on patient-level scores. We demonstrate that suitable feature-classifier combinations can largely outperform the state-of-the-art methods, and the integrated model achieves a more reliable performance in terms of AUC over those at individual magnifications.

Vibha Gupta, Arnav Bhavsar

### Multi-label Poster Classification into Genres Using Different Problem Transformation Methods

Classification of movies into genres from the accompanying promotional materials such as posters is a typical multi-label classification problem. Posters usually highlight a movie scene or characters, and at the same time should inform about the genre or the plot of the movie to attract the potential audience, so our assumption was that the relevant information can be captured in visual features. We have used three typical methods for transforming the multi-label problem into a number of single-label problems that can be solved with standard classifiers. We have used binary relevance, random k-labelsets (RAKEL), and classifier chains, with the Naïve Bayes classifier as the base classifier. We wanted to compare the classification performance obtained using a structural feature descriptor extracted from poster images with the performance obtained using the Classeme feature descriptors that are trained on general image datasets. The classification performance of the transformation methods used is evaluated on a poster dataset containing 6000 posters classified into 18 and 11 genres.

Miran Pobar, Marina Ivasic-Kos
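
Of the three problem transformation methods named in this abstract, binary relevance is the simplest to illustrate. The sketch below shows only the label transformation, not the authors' pipeline; the function name and the toy genre labels are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def binary_relevance_targets(label_sets: Sequence[Set[str]]) -> Dict[str, List[int]]:
    """Turn multi-label annotations into one binary target vector per label.

    Each returned vector can be used to train an independent single-label
    classifier (for example Naive Bayes) that answers "does this sample
    carry this label?".
    """
    all_labels = sorted(set().union(*label_sets))
    return {
        label: [1 if label in labels else 0 for labels in label_sets]
        for label in all_labels
    }


# Toy example: three posters with hypothetical genre annotations.
posters = [{"comedy", "romance"}, {"action"}, {"action", "comedy"}]
targets = binary_relevance_targets(posters)
# targets["comedy"] is [1, 0, 1]
```

Binary relevance ignores correlations between labels. Classifier chains and RAKEL, the other two methods in the abstract, are designed to take such correlations into account.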

### Hybrid Cascade Model for Face Detection in the Wild Based on Normalized Pixel Difference and a Deep Convolutional Neural Network

The main precondition for applications such as face recognition and face de-identification for privacy protection is efficient face detection in real scenes. In this paper, we propose a hybrid cascade model for face detection in the wild. The cascaded two-stage model is based on the fast normalized pixel difference (NPD) detector at the first stage, and a deep convolutional neural network (CNN) at the second stage. The outputs of the NPD detector are characterized by a very small number of false negative (FN) detections and a much higher number of false positive (FP) face detections. The number of FP detections is typically an order of magnitude higher than the number of FN detections. This very high number of FPs has a negative impact on recognition and/or de-identification processing time and on the naturalness of the de-identified images. To reduce the large number of FP face detections, a CNN is used at the second stage. The CNN is applied only to vague face region candidates obtained by the NPD detector that have an NPD score in the interval between two experimentally determined thresholds. The experimental results on the Annotated Faces in the Wild (AFW) test set and the Face Detection Dataset and Benchmark (FDDB) show that the hybrid cascade model significantly reduces the number of FP detections while the number of FN detections is only slightly increased.

Darijan Marčetić, Martin Soldić, Slobodan Ribarić
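
The two-threshold gating described in this abstract can be sketched as follows. The abstract states only that the CNN is applied to candidates whose NPD score lies between two thresholds. Rejecting below the lower threshold and accepting at or above the upper one is an assumption made here, as are the names `cascade_decision`, `t_low`, and `t_high`.

```python
from typing import Callable


def cascade_decision(npd_score: float, t_low: float, t_high: float,
                     cnn_is_face: Callable[[], bool]) -> bool:
    """Decide whether one candidate window is a face.

    ASSUMPTION: scores below t_low are rejected and scores at or above
    t_high are accepted without consulting the second stage.
    """
    if npd_score < t_low:
        return False       # confidently rejected by the first stage
    if npd_score >= t_high:
        return True        # confidently accepted by the first stage
    return cnn_is_face()   # vague candidate: defer to the second stage
```

Passing the second stage as a callable means the more expensive classifier is evaluated only for vague candidates.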

### Labeling Color 2D Digital Images in Theoretical Near Logarithmic Time

A design of a parallel algorithm for labeling color flat zones (precisely, 4-connected components) of a gray-level or color 2D digital image is given. The technique is based on the construction of a particular Homological Spanning Forest (HSF) structure for encoding topological information of any image. HSF is a pair of rooted trees connecting the image elements at inter-pixel level without redundancy. In order to achieve a correct color zone labeling, our proposal here is to correctly build a sub-HSF structure for each image connected component, modifying an initial HSF of the whole image. For validating the correctness of our algorithm, an implementation in OCTAVE/MATLAB is written and its results are checked. Several kinds of images are tested to compute the number of iterations in which the theoretical computing time differs from the logarithm of the width plus the height of an image. Finally, real images are computed faster than random images using our approach.

F. Díaz-del-Río, P. Real, D. Onchis
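
For reference, the labeling task itself (assigning one label per 4-connected flat zone) can be sketched with a sequential union-find pass. This is not the parallel HSF-based algorithm of the paper; it is a plainly different, conventional technique that produces the kind of output such an algorithm must match. The function name is hypothetical.

```python
from typing import Dict, Hashable, List


def label_flat_zones(image: List[List[Hashable]]) -> List[List[int]]:
    """Label 4-connected regions of equal pixel value (sequential union-find)."""
    h = len(image)
    w = len(image[0]) if h else 0
    parent = list(range(h * w))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Merge each pixel with its right and lower neighbor when values match.
    for r in range(h):
        for c in range(w):
            if c + 1 < w and image[r][c] == image[r][c + 1]:
                union(r * w + c, r * w + c + 1)
            if r + 1 < h and image[r][c] == image[r + 1][c]:
                union(r * w + c, (r + 1) * w + c)

    # Assign consecutive labels in raster order of first appearance.
    labels: Dict[int, int] = {}
    return [
        [labels.setdefault(find(r * w + c), len(labels)) for c in range(w)]
        for r in range(h)
    ]
```

Pixel values only need to support equality comparison, so gray levels and color tuples such as `(255, 0, 0)` both work. Diagonal neighbors with equal values are not merged, in line with 4-connectivity.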

### What Is the Best Depth-Map Compression for Depth Image Based Rendering?

Many of the latest smart phones and tablets come with integrated depth sensors that make depth-maps freely available, thus enabling new forms of applications like rendering from different view points. However, efficient compression exploiting the characteristics of depth-maps as well as the requirements of these new applications is still an open issue. In this paper, we evaluate different depth-map compression algorithms, with a focus on tree-based methods and view projection as application. The contributions of this paper are the following: 1. extensions of existing geometric compression trees, 2. a comparison of a number of different trees, 3. a comparison of them to a state-of-the-art video coder, 4. an evaluation using ground-truth data that considers both depth-maps and predicted frames with arbitrary camera translation and rotation. Despite our best efforts, and contrary to earlier results, current video depth-map compression outperforms tree-based methods in most cases. The reason for this is likely that previous evaluations focused on low-quality, low-resolution depth maps, while high-resolution depth (as needed in the DIBR setting) has been ignored up until now. We also demonstrate that PSNR on depth-maps is not always a good measure of their utility.

### Nonlinear Mapping Based on Spectral Angle Preserving Principle for Hyperspectral Image Analysis

The paper proposes three novel nonlinear dimensionality reduction methods for hyperspectral image analysis. The first two methods are based on the principle of preserving pairwise spectral angle mapper (SAM) measures for pixels in a hyperspectral image. The first method is derived in Cartesian coordinates, and the second one in hyperspherical coordinates. The third method is based on the approximation of SAM measures by Euclidean distances. For the proposed methods, the paper provides both the theoretical background and fast numerical optimization algorithms based on the stochastic gradient descent technique. The experimental study of the proposed methods is conducted using publicly available hyperspectral images. The study compares the proposed nonlinear dimensionality reduction methods with the principal component analysis (PCA) technique that belongs to linear dimensionality reduction methods. The experimental results show that the proposed approaches provide higher classification accuracy compared to the linear technique when the nearest neighbor classifier using SAM measure is used for classification.

Evgeny Myasnikov
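
The SAM measure that these methods aim to preserve is the angle between two pixel spectra treated as vectors. The sketch below shows only this standard measure, not the proposed mappings; the function name is chosen for illustration.

```python
import math
from typing import Sequence


def spectral_angle(x: Sequence[float], y: Sequence[float]) -> float:
    """Return the spectral angle (in radians) between two spectra."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("spectral angle is undefined for a zero spectrum")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cosine)
```

Because the angle depends only on direction, scaling a spectrum (for example by a change in illumination intensity) leaves the measure unchanged. The Euclidean distance does not have this property.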

### Object Triggered Egocentric Video Summarization

Egocentric videos are usually of long duration and contain a lot of redundancy, which makes summarization an essential task for such videos. In this work we target object triggered egocentric video summarization, which aims at extracting all the occurrences of an object in a given video, in near real time. We propose a modular pipeline which first aims at limiting the redundant information and then uses a Convolutional Neural Network and LSTM based approach for object detection. Following this, we represent the video as a dictionary which captures the semantic information in the video. Matching a query object reduces to doing an And-Or Tree traversal followed by a deepmatching algorithm for fine grained matching. The frames containing the object which would have been missed at the pruning stage are retrieved by running a tracker on the frames selected by the pipeline. The modular pipeline allows replacing any module with a more efficient version. Performance tests run on the overall pipeline for egocentric datasets, the EDUB dataset and personally recorded videos, give an average recall of 0.76.

Samriddhi Jain, Renu M. Rameshan, Aditya Nigam

### 3D Motion Consistency Analysis for Segmentation in 2D Video Projection

Motion segmentation for 2D videos is usually based on tracked 2D point motions, obtained for a sequence of frames. However, the 3D real world motion consistency is easily lost in the process, due to projection from 3D space to the 2D image plane. Several approaches have been proposed in the literature to recover 3D motion consistency from 2D point motions. To further improve on this, we here propose a new criterion and associated technique, which can be used to determine whether a group of points shows 2D motions consistent with joint 3D motion. It is also applicable for estimating the 3D motion information content. We demonstrate that the proposed criterion can be applied to improve segmentation results in two ways: finding the misclassified points in a group, and assigning unclassified points to the correct group. Experiments with synthetic data and different noise levels, and with real data taken from a benchmark, give insight into the performance of the algorithm under various conditions.

Wei Zhao, Nico Roos, Ralf Peeters

### Space-Variant Gabor Decomposition for Filtering 3D Medical Images

This is an experimental paper in which we introduce the possibility of analyzing and synthesizing 3D medical images by using multivariate Gabor frames with Gaussian windows. Our purpose is to apply a space-variant filter-like operation in the space-frequency domain to correct medical images corrupted by different types of acquisition errors. The Gabor frames are constructed with Gaussian windows sampled on non-separable lattices for a better packing of the space-frequency plane. An implementable solution for 3D-Gabor frames with non-separable lattice is given and numerical tests on simulated data are presented.

Darian Onchis, Codruta Istin, Pedro Real

### Learning Based Single Image Super Resolution Using Discrete Wavelet Transform

Sparse representation has attracted considerable attention in the image restoration field recently. In this paper, we study the application of sparse representation to the single-image super resolution problem. In recent research, first and second-order derivatives are always used as features for patches to be trained as dictionaries. In this paper, we propose a novel single image super resolution algorithm based on sparse representation that considers the effect of significant features. Therefore, the super resolution problem is approached from the viewpoint of preservation of high frequency details using discrete wavelet transform. The dictionaries are constructed from the distinctive features using the K-SVD dictionary training algorithm. The proposed algorithm was tested on the ‘Set14’ dataset. The proposed algorithm recovers the edges better and also improves the computational efficiency. The quantitative and visual results and the experimental time comparisons show the superiority and competitiveness of the proposed method over the simplest techniques and a state-of-the-art SR algorithm.

Selen Ayas, Murat Ekinci

### Directional Total Variation Based Image Deconvolution with Unknown Boundaries

Like many other imaging inverse problems, image deconvolution suffers from ill-posedness and needs adequate regularization. Total variation (TV) is an effective regularizer; hence, it is frequently used in such problems. Various anisotropic alternatives to isotropic TV have also been proposed to capture different characteristics in the image. Directional total variation (DTV) is one such instance, which is convex, has the ability to capture smooth boundaries as conventional TV does, and also handles the directional dominance by enforcing piecewise constancy along a direction. In this paper, we solve the deconvolution problem under DTV regularization by using simple forward-backward splitting machinery. Besides, there are two bottlenecks of the deconvolution problem that need to be addressed: one is the computational load due to matrix inversions, and the second is the unknown boundary conditions (BCs). We tackle the former by switching to the frequency domain using the fast Fourier transform (FFT), and the latter by iteratively estimating a boundary zone to surround the blurred image, plugging a recently proposed framework into our algorithm. The proposed approach is evaluated in terms of the reconstruction quality and the speed. The results are compared to a very recent TV-based deconvolution algorithm, which uses a “partial” alternating direction method of multipliers (ADMM) as the optimization tool, by also plugging in the same framework to cope with the unknown BCs.

Ezgi Demircan-Tureyen, Mustafa E. Kamasak

### Backmatter
