
2015 | Book

Image Analysis and Processing — ICIAP 2015

18th International Conference, Genoa, Italy, September 7-11, 2015, Proceedings, Part I


About this book

The two-volume set LNCS 9279 and 9280 constitutes the refereed proceedings of the 18th International Conference on Image Analysis and Processing, ICIAP 2015, held in Genoa, Italy, in September 2015. The 129 papers presented were carefully reviewed and selected from 231 submissions. The papers are organized in the following seven topical sections: video analysis and understanding, multiview geometry and 3D computer vision, pattern recognition and machine learning, image analysis, detection and recognition, shape analysis and modeling, multimedia, and biomedical applications.

Table of Contents

Frontmatter

Pattern Recognition and Machine Learning

Frontmatter
Transfer Learning Through Greedy Subset Selection

We study the binary transfer learning problem, focusing on how to select sources from a large pool and how to combine them to yield a good performance on a target task. In particular, we consider the transfer learning setting where one does not have direct access to the source data, but rather employs the source hypotheses trained from them. Building on the literature on the best subset selection problem, we propose an efficient algorithm that selects relevant source hypotheses and feature dimensions simultaneously. On three computer vision datasets we achieve state-of-the-art results, substantially outperforming transfer learning and popular feature selection baselines in a small-sample setting. Also, we theoretically prove that, under reasonable assumptions on the source hypotheses, our algorithm can learn effectively from few examples.
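The abstract does not spell out the selection procedure, but the flavor of greedy subset selection can be illustrated with a short sketch. Everything here is hypothetical: source hypotheses are assumed to be summarized by their real-valued scores on labeled target examples, and the combination rule is plain score averaging.

```python
import numpy as np

def greedy_source_selection(source_scores, y, k):
    """Greedy forward selection of source hypotheses.

    source_scores: (n_sources, n_samples) array of real-valued
                   predictions of each source hypothesis on target data.
    y:             (n_samples,) array of binary labels in {-1, +1}.
    k:             number of sources to select.
    Returns the indices of the selected sources.
    """
    selected = []
    remaining = list(range(source_scores.shape[0]))
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in remaining:
            trial = selected + [j]
            # Combine the selected hypotheses by simple score averaging.
            combined = source_scores[trial].mean(axis=0)
            err = np.mean(np.sign(combined) != y)
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

In the actual algorithm, source hypotheses and feature dimensions are selected jointly; this sketch only conveys the greedy forward-selection loop.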

Ilja Kuzborskij, Francesco Orabona, Barbara Caputo
MEG: Multi-Expert Gender Classification from Face Images in a Demographics-Balanced Dataset

In this paper we focus on gender classification from face images, which is still a challenging task in unrestricted scenarios. This task can be useful in a number of ways, e.g., as a preliminary step in biometric identity recognition supported by demographic information. We compare a feature based approach with two score based ones. In the former, we stack a number of feature vectors obtained by different operators, and train an SVM based on them. In the latter, we separately compute the individual scores from the same operators, then either we feed them to an SVM, or exploit a likelihood ratio based on a pairwise comparison of their answers. Experiments use the EGA database, which presents a good balance with respect to demographic features of the stored face images. As expected, feature level fusion often achieves better classification performance, but it is also quite computationally expensive. Our contribution has a threefold value: 1) the proposed score level fusion approaches, though less demanding, achieve results that are similar to or slightly better than feature level fusion, especially when a particular set of experts is fused; since experts are trained individually, it is not required to evaluate a complex multi-feature distribution and the training process is more efficient; 2) the number of uncertain cases significantly decreases; 3) the operators used are not computationally expensive in themselves.

Modesto Castrillón-Santana, Maria De Marsico, Michele Nappi, Daniel Riccio
An Edge-Based Matching Kernel Through Discrete-Time Quantum Walks

In this paper, we propose a new edge-based matching kernel for graphs by using discrete-time quantum walks. To this end, we commence by transforming a graph into a directed line graph. The reasons for using the line graph structure are twofold. First, for a graph, its directed line graph is a dual representation and each vertex of the line graph represents a corresponding edge in the original graph. Second, we show that the discrete-time quantum walk can be seen as a walk on the line graph and the state space of the walk is the vertex set of the line graph, i.e., the state space of the walk is the set of edges of the original graph. As a result, the directed line graph provides an elegant way of developing a new edge-based matching kernel based on discrete-time quantum walks. For a pair of graphs, we compute the $h$-layer depth-based representation for each vertex of their directed line graphs by computing entropic signatures (computed from discrete-time quantum walks on the line graphs) on the family of $K$-layer expansion subgraphs rooted at the vertex, i.e., we compute the depth-based representations for edges of the original graphs through their directed line graphs. Based on the new representations, we define an edge-based matching method for the pair of graphs by aligning the $h$-layer depth-based representations computed through the directed line graphs. The new edge-based matching kernel is thus computed by counting the number of matched vertices identified by the matching method on the directed line graphs. Experiments on standard graph datasets demonstrate the effectiveness of our new kernel.
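The first step described above, transforming a graph into its directed line graph, is simple enough to sketch. A minimal version, assuming an undirected input graph given as an edge list and excluding immediate backtracking (as is common for walks on line graphs):

```python
def directed_line_graph(edges):
    """Vertices of the line graph are the arcs of the original graph;
    arc (u, v) is connected to arc (v, w) whenever they can be
    traversed in succession without immediate backtracking."""
    arcs = set()
    for u, v in edges:
        arcs.add((u, v))
        arcs.add((v, u))
    adjacency = {a: [] for a in arcs}
    for (u, v) in arcs:
        for (x, w) in arcs:
            if x == v and w != u:  # follow-on arc, no backtracking
                adjacency[(u, v)].append((x, w))
    return adjacency

# Example: a triangle graph; each arc has exactly one follow-on arc.
print(directed_line_graph([(0, 1), (1, 2), (2, 0)]))
```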

Lu Bai, Zhihong Zhang, Peng Ren, Luca Rossi, Edwin R. Hancock
Implicit Boundary Learning for Connectomics

Segmentation of complete neurons in 3D electron microscopy images is an important task in Connectomics. A common approach for automatic segmentation is to detect the membrane between neurons in a first step. This is often done with a random forest. We propose a new implicit boundary learning scheme that optimizes the segmentation error of neurons instead of the classification error of membrane. Given a segmentation, optimal labels for the boundary between neurons and for non-boundary are found automatically and are used for training. In contrast to training random forests with labels for membrane and intracellular space, this novel training method does not require many labels for the difficult-to-label membrane and reduces the segmentation error significantly.

Tobias Maier, Thomas Vetter
A Parzen-Based Distance Between Probability Measures as an Alternative of Summary Statistics in Approximate Bayesian Computation

Approximate Bayesian Computation (ABC) methods are likelihood-free Monte Carlo methods. ABC methods rely on a comparison between simulated data, generated using parameters drawn from a prior distribution, and observed data. This comparison is based on computing a distance between the summary statistics of the simulated data and of the observed data. For complex models, it is usually difficult to define a methodology for choosing or constructing the summary statistics. Recently, a nonparametric ABC has been proposed that uses a dissimilarity measure between discrete distributions based on empirical kernel embeddings as an alternative to summary statistics. The nonparametric ABC outperforms other methods including ABC, kernel ABC and synthetic likelihood ABC. However, it assumes that the probability distributions are discrete, and it is not robust when dealing with few observations. In this paper, we propose to apply kernel embeddings using a sufficiently smooth density estimator, or Parzen estimator, for comparing the empirical data distributions and computing the ABC posterior. Synthetic data and real data were used to test the Bayesian inference of our method. We compare our method with state-of-the-art methods, and demonstrate that it is a robust estimator of the posterior distribution with respect to the number of observations.
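The paper's Parzen-smoothed distance is not reproduced here, but the underlying idea of comparing distributions through kernel embeddings can be sketched with the (biased) empirical maximum mean discrepancy, where the Gaussian kernel also plays the role of a Parzen window:

```python
import numpy as np

def gaussian_kernel(x, y, sigma):
    d = x[:, None, :] - y[None, :, :]
    return np.exp(-np.sum(d ** 2, axis=-1) / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Squared maximum mean discrepancy between two samples, a
    standard kernel-embedding distance between distributions."""
    kxx = gaussian_kernel(x, x, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    kxy = gaussian_kernel(x, y, sigma)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()

# Toy usage: distance between observed and simulated samples.
rng = np.random.default_rng(0)
observed = rng.normal(0.0, 1.0, size=(200, 1))
simulated = rng.normal(0.5, 1.0, size=(200, 1))
print(mmd2(observed, simulated))
```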

Carlos D. Zuluaga, Edgar A. Valencia, Mauricio A. Álvarez, Álvaro A. Orozco
Unsupervised Classification of Raw Full-Waveform Airborne Lidar Data by Self Organizing Maps

The paper proposes a procedure based on Kohonen's Self-Organizing Maps (SOMs) to perform the unsupervised classification of raw full-waveform airborne LIDAR (Light Detection and Ranging) data, without the need to extract features from them, that is, without any preprocessing. The proposed algorithm classifies points into three classes ("grass", "trees" and "road") in two subsequent stages. During the first stage, all the raw data are given as input to a SOM and points belonging to the category "trees" are extracted on the basis of the number of peaks that characterize the waveforms. In the second stage, the data not previously classified as "trees" are used to create a new SOM that, together with a hierarchical clustering algorithm, makes it possible to distinguish between the classes "road" and "grass". Experiments carried out show that raw full-waveform LIDAR data were classified with overall accuracies of 93.9%, 92.5% and 92.9%, respectively.
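As a rough illustration of the first stage, here is a minimal, generic Kohonen SOM in NumPy. It is not the authors' configuration; it assumes each raw waveform is a fixed-length vector and uses a linearly decaying learning rate and neighborhood radius:

```python
import numpy as np

def train_som(data, grid=(8, 8), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal 2-D Kohonen SOM trained on raw waveform vectors."""
    rng = np.random.default_rng(seed)
    h, w = grid
    dim = data.shape[1]
    weights = rng.normal(size=(h, w, dim))
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w),
                                  indexing="ij"), axis=-1)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            t = step / n_steps
            lr = lr0 * (1.0 - t)             # decaying learning rate
            sigma = sigma0 * (1.0 - t) + 1e-3
            # Best matching unit (BMU) for this sample.
            d = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(d), (h, w))
            # Gaussian neighborhood around the BMU on the grid.
            g = np.exp(-np.sum((coords - bmu) ** 2, axis=-1)
                       / (2.0 * sigma ** 2))
            weights += lr * g[..., None] * (x - weights)
            step += 1
    return weights
```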

Eleonora Maset, Roberto Carniel, Fabio Crosilla
Fitting Multiple Models via Density Analysis in Tanimoto Space

This paper deals with the extraction of multiple models from noisy, outlier-contaminated data. We build on the "preference trick" implemented by T-Linkage, weakening the prior assumptions on the data: without requiring the tuning of the inlier threshold, we develop a new automatic method which takes advantage of the geometric properties of Tanimoto space to bias the sampling toward promising models and exploits a density based analysis in the conceptual space in order to robustly estimate the models. Experimental validation proves that our method compares favourably with T-Linkage on public, real datasets.
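For readers unfamiliar with the conceptual space used here: in the T-Linkage framework each point is represented by a vector of preferences over sampled model hypotheses, and points are compared with the Tanimoto distance. A minimal sketch:

```python
import numpy as np

def tanimoto_distance(p, q):
    """Tanimoto distance between two preference vectors, where p[i]
    measures how well the point fits hypothesis i (0 = no preference)."""
    pq = np.dot(p, q)
    denom = np.dot(p, p) + np.dot(q, q) - pq
    return 1.0 - pq / denom

# Points with overlapping model preferences are close in Tanimoto
# space; disjoint preferences give the maximum distance 1.
a = np.array([1.0, 0.8, 0.0, 0.0])
b = np.array([0.9, 1.0, 0.0, 0.2])
c = np.array([0.0, 0.0, 1.0, 0.7])
print(tanimoto_distance(a, b), tanimoto_distance(a, c))
```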

Luca Magri, Andrea Fusiello
Bag of Graphs with Geometric Relationships Among Trajectories for Better Human Action Recognition

This paper presents a new video representation that exploits the geometric relationships among trajectories for human action recognition. Geometric relationships are obtained by applying the Delaunay triangulation method to the trajectories of each video frame. Then, a graph encoding method called bag of graphs (BOG) is proposed to handle the geometric relationships between trajectories. BOG considers local graph descriptors to learn a more discriminative graph-based codebook and to represent the video with a histogram of visual graphs. The graph-based codebook is composed of the centers of graph clusters. To define the graph clusters, a graph classification technique based on the Hungarian distance is proposed. Experiments using the human action recognition datasets Hollywood2 and UCF50 show the effectiveness of the proposed approach.

Manel Sekma, Mahmoud Mejdoub, Chokri Ben Amar
Have a SNAK. Encoding Spatial Information with the Spatial Non-alignment Kernel

The standard bag of visual words model ignores the spatial information contained in the image, but researchers have demonstrated that object recognition performance can be improved by including spatial information. A state of the art approach is the spatial pyramid representation, which divides the image into spatial bins. In this paper, another general approach that encodes the spatial information in a more effective and efficient way is described. The proposed approach is to embed the spatial information into a kernel function termed the Spatial Non-Alignment Kernel (SNAK). For each visual word, the average position and the standard deviation are computed based on all the occurrences of the visual word in the image. These are computed with respect to the center of the object, which is determined with the help of the objectness measure. The pairwise similarity of two images is then computed by taking into account the difference between the average positions and the difference between the standard deviations of each visual word in the two images. In other words, the SNAK kernel includes the spatial distribution of the visual words in the similarity of two images. Furthermore, various kernel functions can be plugged into the SNAK framework. Object recognition experiments are conducted to compare the SNAK framework with the spatial pyramid representation, and to assess the performance improvements for various state of the art kernels on two benchmark data sets. The empirical results indicate that SNAK significantly improves the object recognition performance of every evaluated kernel. Compared to the spatial pyramid, SNAK improves performance while consuming less space and time. In conclusion, SNAK can be considered a good candidate to replace the widely-used spatial pyramid representation.
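A sketch of the per-word spatial statistics the kernel is built on may help; the exact SNAK formula is not reproduced here, and the similarity below is only an illustrative stand-in (the object center is assumed to be given):

```python
import numpy as np

def spatial_stats(positions, words, vocab_size, center):
    """Per-visual-word mean position and standard deviation,
    computed relative to the (estimated) object center."""
    mu = np.zeros((vocab_size, 2))
    sd = np.zeros((vocab_size, 2))
    rel = positions - center
    for w in range(vocab_size):
        pts = rel[words == w]
        if len(pts) > 0:
            mu[w] = pts.mean(axis=0)
            sd[w] = pts.std(axis=0)
    return mu, sd

def snak_like_similarity(stats1, stats2, gamma=0.01):
    """Similarity that decays with the disagreement between the two
    images' per-word spatial statistics (a sketch, not the paper's
    exact kernel)."""
    (mu1, sd1), (mu2, sd2) = stats1, stats2
    d = np.sum((mu1 - mu2) ** 2) + np.sum((sd1 - sd2) ** 2)
    return np.exp(-gamma * d)
```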

Radu Tudor Ionescu, Marius Popescu
Convolved Multi-output Gaussian Processes for Semi-Supervised Learning

Multi-output learning has become a strong field of research in the machine learning community during the last years. This setup considers the occurrence of multiple and related tasks in real-world problems. Another approach, called semi-supervised learning (SSL), is the middle point between the case where all training samples are labeled (supervised learning) and the case where all training samples are unlabeled (unsupervised learning). In many applications it is difficult or impossible to access fully labeled data. In these scenarios, SSL becomes a very useful methodology to achieve successful results, either for regression or for classification. In this paper, we propose the use of kernels for vector-valued functions for Gaussian process multi-output regression in the context of semi-supervised learning. We combine a Gaussian process with a process convolution (PC) type of covariance function with techniques commonly used in semi-supervised learning, such as the Expectation-Maximization (EM) algorithm and graph-based regularization. We test our proposed method on two widely used databases for multi-output regression. Results obtained by our method exhibit better performance compared to supervised methods based on Gaussian processes in scenarios where a good amount of labeled data is not available.

Hernán Darío Vargas Cardona, Mauricio A. Álvarez, Álvaro A. Orozco
Volcano-Seismic Events Classification Using Document Classification Strategies

In this paper we propose a novel framework for the classification of volcano-seismic events, based on strategies and concepts typically employed to classify documents, and subsequently also largely employed in other fields. In the proposed approach, we define a dictionary of "seismic words", used to represent a seismic event as a "seismic document" (i.e., a collection of seismic words). Given this representation, we exploit two well-known models for documents (bag-of-words and topic models) to derive signatures for seismic events, usable for classification. An empirical evaluation, based on a set of seismic signals from the Galeras volcano in Colombia, confirms the potential of the proposed scheme, both in terms of interpretability and classification accuracy, also in comparison with standard approaches.
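As a hypothetical illustration of the analogy, a dictionary of "seismic words" could be learned by clustering fixed-length signal windows, with each event then represented by a word histogram; the authors' actual dictionary construction may differ:

```python
import numpy as np
from sklearn.cluster import KMeans

def seismic_bow(signals, n_words=32, win=64, step=32, seed=0):
    """Represent each 1-D seismic signal as a histogram over a learned
    dictionary of 'seismic words' (windowed segments), a direct analog
    of the bag-of-words document model. Assumes every signal is at
    least `win` samples long."""
    segments = np.asarray([s[i:i + win] for s in signals
                           for i in range(0, len(s) - win + 1, step)])
    km = KMeans(n_clusters=n_words, n_init=10, random_state=seed)
    km.fit(segments)
    # Each signal becomes a normalized histogram of word occurrences.
    hists = []
    for s in signals:
        segs = np.asarray([s[i:i + win]
                           for i in range(0, len(s) - win + 1, step)])
        words = km.predict(segs)
        h = np.bincount(words, minlength=n_words).astype(float)
        hists.append(h / h.sum())
    return np.asarray(hists)
```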

Manuele Bicego, John Makario Londoño-Bonilla, Mauricio Orozco-Alzate
Unsupervised Feature Selection by Graph Optimization

Graph based methods have played an important role in machine learning due to their ability to encode the similarity relationships among data. A commonly used criterion in graph based feature selection methods is to select the features which best preserve the data similarity or a manifold structure derived from the entire feature set. However, these methods separate the processes of learning the feature similarity graph and feature ranking. In practice, the ideal feature similarity graph is difficult to define in advance: one needs to assign appropriate values for parameters such as the neighborhood size or the heat kernel parameter involved in graph construction, and this process is conducted independently of the subsequent feature selection. As a result, the performance of feature selection is largely determined by the effectiveness of graph construction. In this paper, on the other hand, we attempt to learn a graph structure closely linked with the feature selection process. The idea is to unify graph construction and data transformation in a new framework which yields an optimal graph rather than a predefined one. Moreover, the $\ell_{2,1}$-norm is imposed on the transformation matrix to achieve row sparsity when selecting relevant features. We derive an efficient algorithm to optimize the proposed unified problem. Extensive experimental results on real-world benchmark data sets show that our method consistently outperforms the alternative feature selection methods.
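For concreteness, the row-sparsity mechanism can be sketched in a few lines: the $\ell_{2,1}$-norm sums the Euclidean norms of the rows of the transformation matrix, and features are typically ranked by those row norms (a generic sketch, not the paper's optimization algorithm):

```python
import numpy as np

def l21_norm(W):
    """The l2,1-norm: sum of the Euclidean norms of the rows of W.
    Penalizing it drives entire rows (features) to zero."""
    return np.sum(np.linalg.norm(W, axis=1))

def select_features(W, k):
    """Rank features by the row norms of the learned transformation
    matrix W and keep the top k."""
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(scores)[::-1][:k]
```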

Zhihong Zhang, Lu Bai, Yuanheng Liang, Edwin R. Hancock
Gait Recognition Robust to Speed Transition Using Mutual Subspace Method

Person recognition from gait images is not robust to speed changes. To deal with this problem, existing methods have generally focused on training a model to transform gait features from various speeds into a common walking speed, with the model trained on gait images at a variety of speeds. However, when a subject walks at a speed not covered during training, performance degrades. In this paper we introduce the idea that an image set-based matching approach, which omits walking speed information, has the potential to solve this problem. This is based on the assumption that speed may not be critical information for gait recognition, since speed variations are universal phenomena. To validate the proposed idea, we apply a mutual subspace method to gait images and show its effectiveness on the OU-ISIR gait speed transition database.
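The mutual subspace method itself is compactly expressible: each image set is summarized by a low-dimensional linear subspace, and two sets are compared through the principal angles between their subspaces. A minimal NumPy sketch:

```python
import numpy as np

def subspace_basis(X, k):
    """Orthonormal basis of the k-dimensional subspace spanned by an
    image set X (one flattened image per row), via SVD."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k].T                      # shape (dim, k)

def msm_similarity(X1, X2, k=5):
    """Mutual subspace similarity: the largest canonical correlation
    (cosine of the smallest principal angle) between the subspaces."""
    u1, u2 = subspace_basis(X1, k), subspace_basis(X2, k)
    s = np.linalg.svd(u1.T @ u2, compute_uv=False)
    return s[0]                          # value in [0, 1]
```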

Yumi Iwashita, Hitoshi Sakano, Ryo Kurazume
Path-Based Dominant-Set Clustering

Although off-the-shelf clustering algorithms, such as those based on spectral graph theory, do a pretty good job at finding clusters of arbitrary shape and structure, they are inherently unable to satisfactorily deal with situations involving the presence of cluttered backgrounds. On the other hand, dominant sets, a generalization of the notion of maximal clique to edge-weighted graphs, exhibit a complementary nature: they are remarkably effective in dealing with background noise but tend to favor compact groups. In order to take the best of the two approaches, in this paper we propose to combine path-based similarity measures, which exploit connectedness information of the elements to be clustered, with the dominant-set approach. The resulting algorithm is shown to consistently outperform standard clustering methods over a variety of datasets under severe noise conditions.
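Path-based (minimax) similarity admits a compact illustration: the effective distance between two points is the smallest possible "largest hop" over all connecting paths, which makes points joined by chains of short hops close. A sketch, assuming a dense distance matrix:

```python
import numpy as np

def path_based_distances(D):
    """Minimax path distances via a Floyd-Warshall-style update:
    P[i, j] = min over paths of the maximum edge along the path."""
    P = D.copy()
    n = len(P)
    for k in range(n):
        P = np.minimum(P, np.maximum(P[:, k:k+1], P[k:k+1, :]))
    return P

# Two separated chains: direct distances between chain endpoints are
# large, but minimax path distances stay small along each chain.
pts = np.array([[0, 0], [1, 0], [2, 0], [10, 0], [11, 0]])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
print(path_based_distances(D))
```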

Eyasu Zemene, Marcello Pelillo
Global and Local Gaussian Process for Multioutput and Treed Data

We propose a novel Multi-Level Multiple Output Gaussian Process framework for dealing with multivariate and treed data. We define a two-layer hierarchical tree with parent nodes on the upper layer and children nodes on the lower layer in order to represent the interaction between the multiple outputs. Then we compute the Multiple Output Gaussian Process (MGP) covariance matrix as a linear combination of a global multiple output covariance matrix (using the total number of outputs) and a set of local matrices (each using only the outputs belonging to a parent node). With this construction of the covariance matrix and the tree, we are able to perform interpolation within the MGP framework. To improve the results, we also test different ways of computing the Intrinsic Model of Coregionalization covariance matrix that uses the input space. Results on synthetic data, motion capture data and wireless data show that the proposed methodology gives a better representation of treed multiple output data.

Jhouben J. Cuesta, Mauricio A. Álvarez, Álvaro Á. Orozco
BRISK Local Descriptors for Heavily Occluded Ball Recognition

This paper focuses on a ball detection algorithm that analyzes candidate ball regions to detect the ball. Unfortunately, at the moment of a goal, the goal-posts (and sometimes also some players) partially occlude the ball or alter its appearance (due to the shadows they cast on it). This often renders traditional pattern recognition approaches ineffective and forces the system to decide about the event based on estimates rather than on measurements of the real ball position. To overcome this drawback, this work compares different descriptors of the ball appearance; in particular, it investigates both well-known feature extraction approaches and the recent BRISK local descriptor in a soccer match context. The paper analyzes critical situations in which the ball is heavily occluded in order to measure robustness, accuracy and detection performance. The effectiveness of BRISK compared with other local descriptors is validated by a large number of experiments on heavily occluded ball examples acquired under realistic conditions.
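For reference, BRISK descriptors are available in common libraries; a minimal OpenCV sketch (an assumed toolchain with hypothetical file names, not necessarily the authors' implementation):

```python
import cv2

# Hypothetical input patches; BRISK keypoints and binary descriptors.
img1 = cv2.imread("ball_patch.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("ball_template.png", cv2.IMREAD_GRAYSCALE)

brisk = cv2.BRISK_create()
kp1, des1 = brisk.detectAndCompute(img1, None)
kp2, des2 = brisk.detectAndCompute(img2, None)

# BRISK descriptors are binary, so they are matched with the
# Hamming distance.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
```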

Pier Luigi Mazzeo, Paolo Spagnolo, Cosimo Distante
Neighborhood Selection for Dimensionality Reduction

Though a great deal of research work has been devoted to the development of dimensionality reduction algorithms, the problem is still open. The most recent and effective techniques, assuming datasets drawn from an underlying low dimensional manifold embedded in a high dimensional space, look for "small enough" neighborhoods which should represent the underlying manifold portion. Unfortunately, neighborhood selection is an open problem, due to the presence of noise, outliers and non-uniformly distributed points, and to unexpectedly high manifold curvatures, which cause the inclusion of geodesically distant points in the same neighborhood. In this paper we describe our neighborhood selection algorithm, called ONeS; it exploits both distance and angular information to form neighborhoods containing nearby points that share a common local structure in terms of curvature. The reported experimental results show the enhanced quality of the neighborhoods computed by ONeS w.r.t. the commonly used $k$-neighborhoods solely employing the Euclidean distance.

Paola Campadelli, Elena Casiraghi, Claudio Ceruti
Crowdsearching Training Sets for Image Classification

The success of an object classifier depends strongly on its training set, but this fact seems to be generally neglected in the computer vision community, which focuses primarily on the construction of descriptive features and the design of fast and effective learning mechanisms. Furthermore, collecting training sets is a very expensive step, which needs a considerable amount of manpower for selecting the most representative samples for an object class. In this paper, we face this problem, following the very recent trend of automatizing the collection of training images for image classification: in particular, here we exploit a source of information never considered so far for this purpose, that is, textual tags. Textual tags are usually attached by the crowd to the images of social platforms like Flickr, associating the visual content with explicit semantics, which unfortunately is noisy in many cases. Our approach leverages this shared knowledge and collects images spanning the visual variance of an object class, removing at the same time the noise by different filtering and query expansion techniques. Comparative results promote our method, which is capable of automatically generating in a few minutes a training dataset leading to 81.41% average precision on the PASCAL VOC 2012 dataset.

Sami Abduljalil Abdulhak, Walter Riviera, Marco Cristani
The Color of Smiling: Computational Synaesthesia of Facial Expressions

This note gives a preliminary account of the transcoding or rechanneling problem between different stimuli as it is of interest for the natural interaction or affective computing fields. By the consideration of a simple example, namely the color response of an affective lamp to a sensed facial expression, we frame the problem within an information-theoretic perspective. A full justification in terms of the Information Bottleneck principle promotes a latent affective space, hitherto surmised as an appealing and intuitive solution, as a suitable mediator between the different stimuli.

Vittorio Cuculo, Raffaella Lanzarotti, Giuseppe Boccignone
Learning Texture Image Prior for Super Resolution Using Restricted Boltzmann Machine

The Fields of Experts (FoE) model [1], one of the most popular probabilistic models of natural image priors, has been successfully applied to super resolution. The piecewise smoothness it imposes on natural images is, however, a relatively limited model for texture images. In the field of deep learning, various approaches to texture modeling using the Restricted Boltzmann Machine (RBM) achieve or surpass the state of the art on many tasks such as texture synthesis and inpainting. In this paper, we apply the convolutional RBM (cRBM) to learning a texture prior. A maximum a posteriori (MAP) framework is proposed to make good use of the probabilistic texture model. Experiments are conducted on the Brodatz dataset, and our results are shown to be comparable to those using FoE and other super resolution approaches.

Chulmoo Kang, Minui Hong, Suk I. Yoo
GRUNTS: Graph Representation for UNsupervised Temporal Segmentation

We propose GRUNTS, a feature-independent method for temporal segmentation via unsupervised learning. GRUNTS employs graphs, obtained through skeletonization and polygonal approximation, to represent objects in each frame, and graph matching to efficiently compute a Frame Kernel Matrix able to encode the similarities between frames. We report the results of temporal segmentation in the case of human action recognition, obtained by adopting Aligned Cluster Analysis (ACA) as the unsupervised learning strategy. GRUNTS has been tested on three challenging datasets: the Weizmann dataset, the KTH dataset and the MSR Action3D dataset. Experimental results on these datasets demonstrate the effectiveness of GRUNTS for segmenting actions, mainly in comparison with supervised learning, which is typically more computationally expensive and less amenable to real-time use.

Francesco Battistone, Alfredo Petrosino, Gabriella Sanniti di Baja
A Strict Pyramidal Deep Neural Network for Action Recognition

A human action recognition method is reported in which pose representation is based on the contour points of the human silhouette and actions are learned by a strict 3D pyramidal neural network (3DPyraNet) model, which is based on convolutional neural networks and the image pyramid concept. 3DPyraNet extracts features from both the spatial and temporal dimensions while keeping a biological structure, and is thereby able to capture the motion information encoded in multiple adjacent frames. One advantage of 3DPyraNet is that it maintains the spatial topology of the input image and presents a simple connection scheme with lower computational and memory costs compared to other neural networks. Encouraging results are reported for recognizing human actions in real-world environments.

Ihsan Ullah, Alfredo Petrosino
Nerve Localization by Machine Learning Framework with New Feature Selection Algorithm

The application of Ultrasound-Guided Regional Anesthesia (UGRA) is growing rapidly in the medical field and is becoming a standard procedure in many hospitals worldwide. UGRA practice requires a highly trained skill set. Nerve detection is among the difficult tasks that anesthetists can meet in the UGRA procedure. There is a need for an automatic method to localize the nerve zone in ultrasound images, in order to assist anesthetists to better perform this procedure. On the other hand, nerve detection in this type of image is a challenging task, since noise and other artifacts corrupt the visual properties of such tissue. In this paper, we propose a nerve localization framework with a new feature selection algorithm. The proposed method is based on several statistical approaches and learning models, taking advantage of each approach to increase performance. Results show that the proposed method can correctly and efficiently identify the nerve zone and outperforms state-of-the-art techniques. It achieves $82\%$ accuracy (F-score) on a first dataset (8 patients) and $61\%$ on a second dataset (5 patients, acquired in a different period of time and not used for training).

Oussama Hadjerci, Adel Hafiane, Pascal Makris, Donatello Conte, Pierre Vieyres, Alain Delbos
Human Tracking Using a Top-Down and Knowledge Based Approach

In this paper, we propose a new top-down and knowledge-based approach to perform human tracking in video sequences. First, the introduction of knowledge makes it possible to anticipate most of the common problems encountered by tracking methods. Second, we define a top-down approach, rather than a classical bottom-up approach, to encode the knowledge. The more global point of view of the scene provided by our top-down approach also helps to keep some consistency among the set of trajectories extracted from the video sequence. In order to show the relevance of considering knowledge to address the tracking problem, we strongly reduce the amount of information provided to our approach. A preliminary experimentation has been conducted over some challenging sequences of the PETS 2009 dataset. The obtained results confirm that our approach can still achieve promising performance even with a consistent reduction in the amount of information taken into account during the tracking process.

Benoit Gaüzère, Pierluigi Ritrovato, Alessia Saggese, Mario Vento

Shape Analysis and 3D Computer Vision

Frontmatter
Fuzzy “Along” Spatial Relation in 3D. Application to Anatomical Structures in Maxillofacial CBCT

Spatial relations have proved to be of great importance in computer vision and image understanding. One issue is their modeling in the image domain, allowing for their integration in segmentation and recognition algorithms. In this paper, we focus on the "along" spatial relation. Based on previous work in 2D, we propose extensions to 3D. Starting from the inter-object region, we demonstrate that the elongation of the interface between the objects and this region gives a good evaluation of the alongness degree. We also integrate distance information to take into account only close object parts. Then we describe how to define the alongness relation within fuzzy set theory. Our method gives a quantitative satisfaction degree of the relation, reliable for differentiating spatial situations. An original example on the maxillofacial area in Cone-Beam Computed Tomography (CBCT) illustrates how the proposed approach could be used to recognize elongated structures.

Timothée Evain, Xavier Ripoche, Jamal Atif, Isabelle Bloch
Compression and Querying of Arbitrary Geodesic Distances

In this paper, we propose a novel method for accelerating the computation of geodesic distances over arbitrary manifold triangulated surfaces. The method is based on a preprocessing step in which we build a data structure that allows us to store arbitrarily complex distance metrics. We show that, by exploiting the precomputed data, the proposed method is significantly faster than the classical Dijkstra algorithm for the computation of point-to-point distances. Moreover, as we precompute exact geodesic distances, the proposed approach can be more accurate than state-of-the-art approximations.

Rosario Aiello, Francesco Banterle, Nico Pietroni, Luigi Malomo, Paolo Cignoni, Roberto Scopigno
Comparing Persistence Diagrams Through Complex Vectors

The natural pseudo-distance of spaces endowed with filtering functions is precious for shape classification and retrieval; its optimal estimate coming from persistence diagrams is the bottleneck distance, which unfortunately suffers from combinatorial explosion. A possible algebraic representation of persistence diagrams is offered by complex polynomials; since distant polynomials represent distant persistence diagrams, a fast comparison of the coefficient vectors can reduce the size of the database to be classified by the bottleneck distance. This article experimentally explores three transformations from diagrams to polynomials and three distances between the complex vectors of coefficients.
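One such transformation can be sketched directly: interpret each diagram point as a complex root and take the coefficients of the corresponding polynomial. The specific root encoding below (birth + i·death) is an illustrative assumption, not necessarily one of the three studied in the paper:

```python
import numpy as np

def diagram_to_coefficients(diagram):
    """Map a persistence diagram to the coefficient vector of the
    complex polynomial whose roots are the diagram points."""
    roots = np.array([b + 1j * d for b, d in diagram])
    return np.poly(roots)        # leading coefficient first

# Nearby diagrams yield nearby coefficient vectors, enabling a fast
# pre-filtering before the expensive bottleneck distance.
d1 = [(0.1, 0.9), (0.3, 0.5)]
d2 = [(0.12, 0.88), (0.3, 0.52)]
print(np.abs(diagram_to_coefficients(d1) - diagram_to_coefficients(d2)))
```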

Barbara Di Fabio, Massimo Ferri
Pop-up Modelling of Hazy Scenes

This paper describes the construction of a layered scene-model, based on a single hazy image of an outdoor scene. A depth map and radiance image are estimated by standard dehazing methods. The radiance image is then segmented into a small number of clusters, and a corresponding scene-plane is estimated for each. This provides the basic structure of a 2.5-D scene model, without the need for multiple views, or image correspondences. We show that problems of gap-filling and depth blending can be addressed systematically, with respect to the layered depth-structure. The final models, which resemble cardboard ‘pop-ups’, are visually convincing. An HTML5/WebGL implementation is described, and subjective depth-preferences are tested in a psychophysical experiment.
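The abstract refers to "standard dehazing methods"; a common choice is the dark channel prior, sketched here under that assumption (a rough depth map then follows from the transmission t as depth ∝ −log t):

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img, patch=15):
    """Dark channel of an RGB image in [0, 1]: per-pixel minimum over
    color channels, followed by a local minimum filter."""
    dc = img.min(axis=2)
    return minimum_filter(dc, size=patch)

def transmission(img, atmosphere, omega=0.95, patch=15):
    """Transmission map estimated from the dark channel prior,
    given the atmospheric light (a 3-vector)."""
    norm = img / atmosphere[None, None, :]
    return 1.0 - omega * dark_channel(norm, patch)
```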

Lingyun Zhao, Miles Hansard, Andrea Cavallaro
3D Geometric Analysis of Tubular Objects Based on Surface Normal Accumulation

This paper proposes a simple and efficient method for the reconstruction and extraction of geometric parameters from 3D tubular objects. Our method constructs an image that accumulates surface normal information; peaks within this image are then located by tracking. Finally, the positions of these peaks are optimized to lie precisely on the centerline of the tubular shape. The method is very versatile and is able to process various input data types, such as full or partial meshes acquired from 3D laser scans, 3D height maps or discrete volumetric images. The proposed algorithm is simple to implement, contains few parameters and can be computed in linear time with respect to the number of surface faces. Since the extracted tube centerline is accurate, we are able to decompose the tube into rectilinear parts and torus-like parts. This is done with a new linear-time 3D torus detection algorithm, which follows the same principle as a previous work on 2D circular arc recognition. Detailed experiments show the versatility, accuracy and robustness of our new method.

Bertrand Kerautret, Adrien Krähenbühl, Isabelle Debled-Rennesson, Jacques-Olivier Lachaud
Tongue in Cheek

Differences between image processing and other disciplines in the definition and the role of shape are explored. Diverse approaches to quantifying shape are reviewed. Attention is drawn to the need for close coupling between image acquisition and shape analysis. An example of the effect of the means of observation on dynamic shape extraction is drawn from linguistics. Current instrumentation for measuring shape changes in the vocal tract is described. Advanced image processing based on emerging imaging technologies is proposed for linguistic and therapeutic applications of articulatory phonetics.

George Nagy, Naomi Nagy
Where Is the Ground? Quality Measures for the Planar Digital Terrain Model in Terrestrial Laser Scanning

In the analysis of terrestrial laser scanning (TLS) data, the digital terrain model (DTM) is one of the important elements. To evaluate the DTM, or to find it by way of optimization, it is necessary to formulate a measure of DTM quality. Three parameterized measures are proposed and tested against a comparative model on a series of TLS data. The measure equal to the number of points inside a layer of specified height above the plane proved to produce the most distinct maximum for an optimal model. The measures have been applied to the planar DTM, but their use for other models is possible.

Marcin Bator, Leszek J. Chmielewski, Arkadiusz Orłowski
Extending the sGLOH Descriptor

This paper proposes an extension of the sGLOH keypoint descriptor [3] which improves its robustness and discriminability. The sGLOH descriptor can handle discrete rotations by a cyclic shift of its elements thanks to its circular structure, but its performance can decrease when the relative rotation of the keypoint falls between two sGLOH discrete rotations. The proposed extension couples two sGLOH descriptors of the same patch with different rotations in order to cope with this issue, and it can also be applied directly to the sCOr and sGOr matching strategies of sGLOH. Experimental results show a consistent improvement in descriptor discriminability, while different setups can be used to reduce the running time according to the desired task.

Fabio Bellavia, Carlo Colombo
Fast Superpixel-Based Hierarchical Approach to Image Segmentation

Image segmentation is one of the core tasks in image processing. Traditionally, such an operation is performed starting from single pixels, requiring a significant amount of computation. It has been shown that superpixels can be used to improve segmentation performance. In this work we propose a novel superpixel-based hierarchical approach to image segmentation that works by iteratively merging the nodes of a weighted undirected graph initialized with the superpixel regions. Proper metrics to drive the region merging are proposed and experimentally validated using the standard Berkeley dataset. Our analysis shows that the proposed algorithm runs faster than state-of-the-art techniques while providing accurate segmentation results in terms of both visual and objective metrics.
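The iterative merging scheme can be sketched generically; the metrics below (Euclidean distance between mean region descriptors, unweighted averaging on merge) are illustrative placeholders for the paper's proposed ones:

```python
import numpy as np

def merge_regions(features, edges, threshold):
    """Greedy hierarchical merging on a region adjacency graph:
    repeatedly merge the most similar adjacent regions until the best
    candidate edge weight exceeds the threshold. 'features' is a list
    of per-region descriptors; 'edges' lists adjacent region pairs."""
    parent = list(range(len(features)))

    def find(i):                          # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    while True:
        cand = [(np.linalg.norm(features[find(a)] - features[find(b)]),
                 find(a), find(b))
                for a, b in edges if find(a) != find(b)]
        if not cand:
            break
        w, a, b = min(cand)
        if w > threshold:
            break
        # Merge b into a, averaging descriptors (sketch: unweighted).
        features[a] = (features[a] + features[b]) / 2.0
        parent[b] = a
    return [find(i) for i in range(len(features))]
```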

Francesco Verdoja, Marco Grangetto
Supertetras: A Superpixel Analog for Tetrahedral Mesh Oversegmentation

Over the past decade, computer vision algorithms have transitioned from relying on the direct, pixel-based representation of images to the use of superpixels, small regions whose boundaries agree with image contours. This intermediate representation improves the tractability of image understanding because it reduces the number of primitives to be taken under consideration from several million to a few hundred. Despite the improvements yielded in the area of image segmentation, the concept of an oversegmentation as an intermediate representation has not been adopted in volumetric mesh processing. We take a first step in this direction, adapting a fast and efficient superpixel algorithm to the tetrahedral mesh case, present results which demonstrate the quality of the output oversegmentation, and illustrate its use in a semantic segmentation application.

Giulia Picciau, Patricio Simari, Federico Iuricich, Leila De Floriani
Extraction of Successive Patterns in Document Images by a New Concept Based on Force Histogram and Thick Discrete Lines

The problem of automatically searching for series of broad patterns in technical documents is studied. Such series can be assumed to carry ordered information useful for the understanding of documents. The proposed methodology is able to extract successive patterns of different natures without a priori information. To do this, we consider the spatial location of triplets of similar connected components using the force histogram, and the recognition is performed by considering surrounding discrete lines. This new model is fast and allows a good extraction of occluded patterns in the presence of noise, while requiring only a few thresholds, which can be automatically set from the data.

Isabelle Debled-Rennesson, Laurent Wendling
Hierarchical Mesh Segmentation Editing Through Rotation Operations

Hierarchical and multi-resolution models are well known tools used in many application domains for representing an object at varying levels of detail. In the case of segmentations computed on a mesh, a hierarchical model can be structured as a binary tree representing the hierarchy of the region merging operations performed on the original segmentation to reduce its resolution. In this paper, we address the problem of modifying a hierarchical segmentation in order to augment its expressive power. We adapt two well-known operators defined for modifying binary trees, namely left and right rotation, to the case of hierarchical segmentations. These operators are then applied to modify a given hierarchy based on a user-defined function and a user-defined segmentation.

Federico Iuricich, Patricio Simari
Local Feature Extraction in Log-Polar Images

We propose two different strategies to compute edges in the log-polar (cortical) domain. The space-variant processing is obtained by applying local operators (e.g. local derivative filters) directly on the log-polar images, or by embedding the same operators into the log-polar mapping, thus obtaining a cortical representation of the Cartesian features. The two approaches have been tested by taking into consideration three standard algorithms for edge detection (Canny, Marr-Hildreth and Harris), applied to the BSDS500 dataset. Qualitative and quantitative comparisons give a first indication of the validity of the proposed approaches.

Manuela Chessa, Fabio Solari
Scale-Space Techniques for Fiducial Points Extraction from 3D Faces

We propose a method for extracting fiducial points from human faces that uses 3D information only and is based on two key steps: multi-scale curvature analysis, and the reliable tracking of features in a scale-space based on curvature. Our scale-space analysis, coupled to careful use of prior information based on variability boundaries of anthropometric facial proportions, does not require a training step, because it makes direct use of morphological characteristics of the analyzed surface. The proposed method precisely identifies important fiducial points and is able to extract new fiducial points that were previously unrecognized, thus paving the way to more effective recognition algorithms.

Nikolas De Giorgis, Luigi Rocca, Enrico Puppo
Filtering Non-Significant Quench Points Using Collision Impact in Grassfire Propagation

The skeleton of an object is defined as the set of quench points formed during Blum's grassfire transformation. Due to the high sensitivity of quench points to small changes in the object boundary and the membership function (for fuzzy objects), a large number of redundant quench points is often formed. Many of these quench points are caused by peripheral protrusions and dents and are not associated with core shape features of the object. Here, we present a significance measure of quench points using the collision impact of fire-fronts and explore its role in filtering noisy quench points. The performance of the method is examined on three-dimensional shapes at different levels of noise and fuzziness, and compared with previous methods. The results demonstrate that collision impact, together with appropriate filtering kernels, eliminates most of the noisy quench voxels while preserving those associated with core shape features of the object.

Dakai Jin, Cheng Chen, Punam K. Saha
Robust and Efficient Camera Motion Synchronization via Matrix Decomposition

In this paper we present a structure-from-motion pipeline based on the synchronization of relative motions derived from epipolar geometries. We combine a robust rotation synchronization technique with a fast state-of-the-art translation synchronization method. Both reduce to computing matrix decompositions: low-rank and sparse decomposition, and spectral decomposition. These two steps solve the motion synchronization problem in a way that is both efficient and robust to outliers. The pipeline is global, for it considers all the images at the same time. Experimental validation demonstrates that our pipeline compares favourably with some recently proposed methods.
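Spectral rotation synchronization has a standard compact formulation, sketched below; this is the textbook construction, not necessarily the exact decomposition used in the paper:

```python
import numpy as np

def synchronize_rotations(R_rel, n):
    """Spectral rotation synchronization. R_rel[(i, j)] holds the
    relative rotation R_i @ R_j.T; absolute rotations are recovered
    (up to a global rotation) from the top three eigenvectors of the
    block matrix of relative rotations."""
    G = np.zeros((3 * n, 3 * n))
    for (i, j), Rij in R_rel.items():
        G[3*i:3*i+3, 3*j:3*j+3] = Rij
        G[3*j:3*j+3, 3*i:3*i+3] = Rij.T
    for i in range(n):
        G[3*i:3*i+3, 3*i:3*i+3] = np.eye(3)
    vals, vecs = np.linalg.eigh(G)
    U = vecs[:, -3:]                     # top-3 eigenvectors
    rotations = []
    for i in range(n):
        # Project each 3x3 block onto SO(3) via SVD.
        u, _, vt = np.linalg.svd(U[3*i:3*i+3])
        R = u @ vt
        if np.linalg.det(R) < 0:
            R = u @ np.diag([1.0, 1.0, -1.0]) @ vt
        rotations.append(R)
    return rotations
```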

Federica Arrigoni, Beatrice Rossi, Andrea Fusiello
Novel View-Synthesis from Multiple Sources for Conversion to 3DS

In this paper we confront the problem of uncalibrated view synthesis, i.e., rendering novel images from two or more images without any knowledge of the camera parameters. The method builds on the computation of planar parallax and focuses on the application of converting a monocular image sequence to a 3D stereo video, a problem that requires positioning the virtual camera outside the actual motion trajectory. The paper addresses both geometric and practical issues related to the rendering. We validate our method by showing both quantitative and qualitative results.

Francesco Malapelle, Andrea Fusiello, Beatrice Rossi, Pasqualina Fragneto
Dynamic Optimal Path Selection for 3D Triangulation with Multiple Cameras

When a physical feature is observed by two or more cameras, its position in 3D space can be easily recovered by means of triangulation. However, for such an estimate to be reliable, accurate intrinsic and extrinsic calibration of the capturing devices must be available. Extrinsic parameters are usually the most problematic, especially when dealing with a large number of cameras. This is due to several factors, including the inability to observe the same reference object across the entire network and the sometimes unavoidable displacement of cameras over time. With this paper we propose a game-theoretical method that can be used to dynamically select the most reliable rigid motion between cameras observing the same feature point. To this end we only assume a (possibly incomplete) graph connecting cameras, whose edges are labelled with extrinsic parameters obtained through pairwise calibration.

Mara Pistellato, Filippo Bergamasco, Andrea Albarelli, Andrea Torsello
Smartphone-Based Obstacle Detection for the Visually Impaired

One of the main problems that visually impaired people have to deal with is moving autonomously in an unknown environment. Currently, the most used autonomous walking aid is still the white cane, though in the last few years more technological devices, referred to as electronic travel aids (ETAs), have been introduced. In this paper, we present a novel ETA based on computer vision. Exploiting the hardware and software facilities of a standard smartphone, our system is able to extract a 3D representation of the scene and detect possible obstacles. To achieve such a result, images captured by the smartphone camera are processed with a modified Structure from Motion algorithm that also takes as input information from the built-in gyroscope. The system then estimates the ground plane and labels as obstacles all the structures above it. Results on indoor and outdoor test sequences show the effectiveness of the proposed method.
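The ground-plane step can be illustrated with a generic RANSAC plane fit (not the authors' exact estimator); the obstacle test assumes the recovered normal is oriented upward:

```python
import numpy as np

def fit_ground_plane(points, iters=200, tol=0.05, seed=0):
    """RANSAC plane fit: returns (normal, d) for the plane
    normal . x + d = 0 with the most inliers."""
    rng = np.random.default_rng(seed)
    best, best_inliers = None, 0
    for _ in range(iters):
        p = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p[1] - p[0], p[2] - p[0])
        norm = np.linalg.norm(n)
        if norm < 1e-9:
            continue                      # degenerate sample
        n = n / norm
        d = -n @ p[0]
        inliers = np.sum(np.abs(points @ n + d) < tol)
        if inliers > best_inliers:
            best, best_inliers = (n, d), inliers
    return best

def label_obstacles(points, plane, height=0.1):
    """Points lying more than `height` above the ground plane
    (assuming the plane normal points upward)."""
    n, d = plane
    return points[(points @ n + d) > height]
```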

Alessandro Caldini, Marco Fanfani, Carlo Colombo
Efficient Moving Point Handling for Incremental 3D Manifold Reconstruction

As incremental Structure from Motion algorithms become effective, a good sparse point cloud representing the map of the scene becomes available frame by frame. From the 3D Delaunay triangulation of these points, state-of-the-art algorithms build a rough manifold model of the scene. These algorithms incrementally integrate new points into the 3D reconstruction only if their position estimates do not change. Indeed, whenever a point moves in a 3D Delaunay triangulation, for instance because its estimate gets refined, a set of tetrahedra has to be removed and replaced with new ones to maintain the Delaunay property; the management of the manifold reconstruction thus becomes complex and entails a potentially large overhead. In this paper we investigate different approaches and propose an efficient policy to deal with moving points in the manifold estimation process. We tested our approach on four sequences of the KITTI dataset and show the effectiveness of our proposal in comparison with state-of-the-art approaches.

Andrea Romanoni, Matteo Matteucci
Volumetric 3D Reconstruction and Parametric Shape Modeling from RGB-D Sequences

The recent availability of low-cost RGB-D sensors and the maturity of machine vision algorithms make shape-based parametric modeling of 3D objects in natural environments more practical than ever before. In this paper, we investigate RGB-D based modeling of natural objects using RGB-D sensors and a combination of volumetric 3D reconstruction and parametric shape modeling. We apply the general method to the specific case of detecting and modeling quadric objects, with the ellipsoid shape of a pineapple as a special case, in cluttered agricultural environments, towards applications in fruit health monitoring and crop yield prediction. Our method estimates the camera trajectory, then performs volumetric reconstruction of the scene. Next, we detect fruit and segment out point clouds that belong to fruit regions. We use two novel methods for robust estimation of a parametric shape model from the dense point cloud: (i) MSAC-based robust fitting of an ellipsoid to the 3D point cloud, and (ii) nonlinear least squares minimization of dense SIFT (scale invariant feature transform) descriptor distances between fruit pixels in corresponding frames. We compare our shape modeling methods with a baseline direct ellipsoid estimation method. We find that model-based point clouds show a clear advantage in parametric shape modeling and that our parametric shape modeling methods are more robust and better able to estimate the size, shape, and volume of pineapple fruit than the baseline direct method.

Yoichi Nakaguro, Waqar S. Qureshi, Matthew N. Dailey, Mongkol Ekpanyapong, Pished Bunnun, Kanokvate Tungpimolrut

Biomedical Applications

Frontmatter
Efficient Resolution Enhancement Algorithm for Compressive Sensing Magnetic Resonance Image Reconstruction

Magnetic resonance imaging (MRI) has been widely applied in a number of clinical and preclinical applications. However, the resolution of images reconstructed using conventional algorithms is often insufficient to distinguish diagnostically crucial information, due to limited measurements. In this paper, we consider the problem of reconstructing a high resolution (HR) MRI signal from very limited measurements. The proposed algorithm is based on compressed sensing, combining wavelet sparsity with the sparsity of image gradients, since magnetic resonance (MR) images are generally sparse in the wavelet and gradient domains. The main goal of the proposed algorithm is to reconstruct the HR MR image directly from a few measurements. Unlike compressed sensing (CS) MRI reconstruction algorithms, the proposed algorithm uses multiple measurements to reconstruct the HR image. Also, unlike resolution enhancement algorithms, the proposed algorithm performs resolution enhancement of the MR image simultaneously with the reconstruction process from few measurements. The proposed algorithm is compared with three state-of-the-art CS-MRI reconstruction algorithms in terms of signal-to-noise ratio and full-width-at-half-maximum values.
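As a toy illustration of CS-style reconstruction, here is a projection-plus-shrinkage loop for undersampled k-space. For brevity it assumes sparsity in the image domain itself rather than the paper's combined wavelet and gradient prior, so it is only a structural sketch:

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the l1 norm, the core shrinkage step of
    compressed sensing reconstruction."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def cs_mri_toy(kspace, mask, lam=0.01, iters=100):
    """Alternate data consistency in k-space with sparsity shrinkage
    in the image domain. 'mask' marks acquired k-space samples."""
    x = np.zeros_like(kspace)
    for _ in range(iters):
        # Data consistency: re-impose the measured k-space samples.
        k = np.fft.fft2(x)
        k[mask] = kspace[mask]
        x = np.fft.ifft2(k)
        # Sparsity step (applied to the real image estimate).
        x = soft_threshold(x.real, lam).astype(complex)
    return x.real
```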

Osama A. Omer, M. Atef Bassiouny, Ken’ichi Morooka
Towards Accurate Segmentation of Fibroglandular Tissue in Breast MRI Using Fuzzy C-Means and Skin-Folds Removal

Breast density, measuring the volumetric portion of fibroglandular tissue, is considered an important factor in evaluating the breast cancer risk of women. Categorizing breast density into different levels by human observers is time-consuming and subjective, which may result in large inter-reader variability. In this work, we propose a fully automated fibroglandular tissue segmentation technique aiming to assist automatic breast density measurement in magnetic resonance imaging (MRI). Firstly, a bias field correction algorithm is applied. Secondly, the breast mask is segmented to exclude the air background and thoracic tissues, such as liver, heart and lung. Thirdly, the segmentation is further refined by removing the skin-folds that are normally included in the breast mask and mimic the fibroglandular tissue, leading to incorrect density estimation. Finally, we apply a fuzzy c-means approach to extract the fibroglandular tissue within the breast mask. To quantitatively evaluate the proposed method, a total of 50 MR scans were collected. By comparing the volume overlap between the manually annotated fibroglandular tissue and the results of our method, we achieved an average Dice Similarity Coefficient (DSC) of 0.84.
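The final step is classical fuzzy c-means; a generic 1-D intensity version (not the paper's tuning) looks like this:

```python
import numpy as np

def fuzzy_c_means(x, c=2, m=2.0, iters=100, seed=0):
    """Standard fuzzy c-means on a 1-D array of intensities x.
    Returns (cluster centers, membership matrix)."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(x), c))
    u /= u.sum(axis=1, keepdims=True)        # fuzzy memberships
    for _ in range(iters):
        um = u ** m
        centers = (um * x[:, None]).sum(axis=0) / um.sum(axis=0)
        d = np.abs(x[:, None] - centers[None, :]) + 1e-12
        # Membership update: inverse-distance weighting with the
        # classical FCM exponent 2 / (m - 1).
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return centers, u
```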

Mohammad Razavi, Lei Wang, Albert Gubern-Mérida, Tatyana Ivanovska, Hendrik Laue, Nico Karssemeijer, Horst K. Hahn
Robust and Fast Vessel Segmentation via Gaussian Derivatives in Orientation Scores

We propose a robust and fully automatic matched filter-based method for retinal vessel segmentation. Different from conventional filters in 2D image domains, we construct a new matched filter based on second-order Gaussian derivatives in so-called orientation scores, functions on the coupled space of positions and orientations $\mathbb{R}^2 \rtimes S^1$. We lift 2D images to 3D orientation scores by means of a wavelet-type transform using an anisotropic wavelet. In the domain $\mathbb{R}^2 \rtimes S^1$, we set up rotation and translation invariant second-order Gaussian derivatives. By locally matching the multi-scale second-order Gaussian derivative filters with data in orientation scores, we are able to enhance vessel-like structures located in different orientation planes accordingly. Both crossings and tiny vessels are well preserved thanks to the proposed multi-scale and multi-orientation filtering method. The proposed method is validated on the public databases DRIVE and STARE, and we show that it is both fast and reliable. With a sensitivity and specificity of 0.7744 and 0.9708 on DRIVE, and 0.7940 and 0.9707 on STARE, respectively, our method gives improved performance compared to state-of-the-art algorithms.

Jiong Zhang, Erik Bekkers, Samaneh Abbasi, Behdad Dashtbozorg, Bart ter Haar Romeny
Information-Based Cost Function for a Bayesian MRI Segmentation Framework

A new information-based cost function is introduced for learning the conditional class probability model required in probabilistic atlas-based brain magnetic resonance image segmentation. Aiming to improve the segmentation results, the $\alpha$-order Renyi entropy is considered as the function to be maximized, since this kind of function has been proved to lead to more discriminative distributions. Additionally, we develop the model parameter update for the considered function, leading to a set of weighted averages dependent on the $\alpha$ factor. Our proposal is tested by segmenting the well-known BrainWeb synthetic brain MRI database and compared against the log-likelihood function. The achieved results show an improvement in segmentation accuracy of $\sim 5\%$ with respect to the baseline cost function.
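For reference, the $\alpha$-order Renyi entropy of a discrete distribution is $H_\alpha(p) = \frac{1}{1-\alpha}\log\sum_i p_i^\alpha$, which recovers the Shannon entropy as $\alpha \to 1$. A small sketch:

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Alpha-order Renyi entropy of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return -np.sum(p * np.log(p))     # Shannon limit
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

# Sharper (more discriminative) distributions have lower entropy.
print(renyi_entropy([0.25, 0.25, 0.25, 0.25], 2.0))
print(renyi_entropy([0.85, 0.05, 0.05, 0.05], 2.0))
```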

David Cárdenas-Peña, Alvaro A. Orozco, Germán Castellanos-Dominguez
Learning by Sampling for White Blood Cells Segmentation

The visual analysis and counting of white blood cells in microscopic peripheral blood smears are very important procedures in the medical field. They can provide useful information concerning the health of patients, e.g., for the diagnosis of Acute Lymphatic Leukaemia or other important diseases. Blood experts in clinical centres traditionally perform this analysis manually. The main issues of traditional human analysis are related to the difficulties encountered during this type of procedure: generally, the process is not rapid and it is strongly influenced by the operator's capabilities and tiredness. The main purpose of this work is to realize a reliable automated system, based on a multi-class Support Vector Machine, able to manage all the regions of immediate interest inside a blood smear: white blood cell nuclei and cytoplasm, erythrocytes and background. The experimental results demonstrate that the proposed method is very accurate and robust, reaching a segmentation accuracy of 99% and indicating the possibility of tuning this approach to each pairing of microscope and camera.
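A minimal sketch of multi-class SVM pixel classification in this spirit, using scikit-learn with assumed raw-RGB features (the paper's actual features and training protocol are not given in the abstract):

```python
import numpy as np
from sklearn.svm import SVC

# Assumed toy data: X holds per-pixel RGB features, y holds labels
# 0..3 for nucleus, cytoplasm, erythrocyte and background.
rng = np.random.default_rng(0)
X = rng.random((400, 3))
y = rng.integers(0, 4, size=400)

clf = SVC(kernel="rbf", C=10.0, gamma="scale")   # one-vs-one multi-class
clf.fit(X, y)

# Segment a new image by classifying every pixel's feature vector.
image_pixels = rng.random((1000, 3))
labels = clf.predict(image_pixels)
```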

Cecilia Di Ruberto, Andrea Loddo, Lorenzo Putzu
Fully Automatic Brain Tumor Segmentation by Using Competitive EM and Graph Cut

Manual MRI brain tumor segmentation is a difficult and time-consuming task, which makes computer support highly desirable. This paper presents a hybrid brain tumor segmentation strategy characterized by the combined use of the Graph Cut segmentation method and the Competitive Expectation Maximization (CEM) algorithm. Experimental results were obtained by processing in-house collected data and public data from benchmark data sets. To see whether the proposed method can be considered an alternative to contemporary methods, the results obtained were compared with those obtained by the authors who undertook the Multi-modal Brain Tumor Segmentation challenge. The results prove that the method is competitive with recently proposed approaches.

Valentina Pedoia, Sergio Balbi, Elisabetta Binaghi
An Automatic Method for Metabolic Evaluation of Gamma Knife Treatments

Lesion volume delineation in Positron Emission Tomography images is challenging because of the low spatial resolution and high noise level. The aim of this work is the development of an operator-independent segmentation method for metabolic images. For this purpose, an algorithm for biological tumor volume delineation based on random walks on graphs has been used. Twenty-four cerebral tumors were segmented to evaluate the functional follow-up after Gamma Knife radiotherapy treatment. Experimental results show that the segmentation algorithm is accurate and has real-time performance. In addition, it can reflect metabolic changes useful to evaluate the radiotherapy response of treated patients.
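Random-walks segmentation is available off the shelf, e.g. in scikit-image; the snippet below is an assumed stand-in for illustration, not the authors' algorithm (which, being operator-independent, places no manual seeds):

```python
import numpy as np
from skimage.segmentation import random_walker

# Toy volume with seed labels: 1 = lesion, 2 = background, 0 = unlabeled.
volume = np.random.default_rng(0).random((32, 32, 32))
seeds = np.zeros_like(volume, dtype=np.uint8)
seeds[16, 16, 16] = 1    # in an operator-independent pipeline these
seeds[0, 0, 0] = 2       # seeds would be placed automatically

labels = random_walker(volume, seeds, beta=130)
```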

Alessandro Stefano, Salvatore Vitabile, Giorgio Russo, Massimo Ippolito, Franco Marletta, Corrado D’Arrigo, Davide D’Urso, Maria Gabriella Sabini, Orazio Gambino, Roberto Pirrone, Edoardo Ardizzone, Maria Carla Gilardi
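Random-walks delineation can be illustrated with scikit-image's generic random_walker function; the synthetic lesion, the seed placement and the beta value below are assumptions, not the authors' settings:

```python
import numpy as np
from skimage.segmentation import random_walker

# Hypothetical PET slice with a bright lesion on a dark background
img = np.zeros((80, 80))
img[30:50, 30:50] = 1.0
img += 0.2 * np.random.rand(80, 80)      # simulated noise

# Seeds: 1 = lesion, 2 = background, 0 = to be labelled by the walker
seeds = np.zeros_like(img, dtype=int)
seeds[40, 40] = 1
seeds[5, 5] = 2

segmentation = random_walker(img, seeds, beta=130)
```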
Spinal Canal and Spinal Marrow Segmentation by Means of the Hough Transform of Special Classes of Curves

In this paper we present a Hough Transform-based method for the detection of the spinal district in X-ray Computed Tomography (CT) images, in order to build binary masks that can be applied to functional images to infer information on the metabolic activity of the spinal marrow. This kind of information may be of particular interest for the study of spinal marrow physiology in both health and disease.

Annalisa Perasso, Cristina Campi, Anna Maria Massone, Mauro C. Beltrametti
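The paper works with special classes of curves; as a simplified stand-in, the sketch below uses plain circles via scikit-image's Hough machinery (the synthetic slice and the radius range are assumptions):

```python
import numpy as np
from skimage.feature import canny
from skimage.transform import hough_circle, hough_circle_peaks

# Hypothetical CT slice; the spinal canal appears as a roughly circular region
ct_slice = np.random.rand(128, 128)

edges = canny(ct_slice, sigma=2.0)           # edge map feeding the transform
radii = np.arange(8, 20)                     # plausible canal radii in pixels
accumulator = hough_circle(edges, radii)
_, cx, cy, r = hough_circle_peaks(accumulator, radii, total_num_peaks=1)
```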
A New Graph-Based Method for Automatic Segmentation

In this paper, a new graph-based segmentation method is proposed. Various Regions of Interest (ROIs) can be extracted from digital images/volumes without requiring any processing parameters; only one point belonging to the region of interest must be given.

The method, starting from a single source element, proceeds with a specific propagation mechanism based on graph theory to find a Minimum Path Spanning Tree (MPST).

Compared with existing segmentation methods, a new cost function is proposed here. It allows the process to be adaptive to both local and global context, to be optimal, and to be independent of the order of analysis, requiring a single iteration step. The final decision step is based on a threshold value that is selected automatically. Performance evaluation is presented by applying the method in the biomedical field, considering the extraction of wrist bones from real Magnetic Resonance Imaging (MRI) volumes.

Laura Gemme, Silvana Dellepiane
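A toy illustration of single-seed propagation by minimum-cost paths follows; the cost function and threshold are illustrative choices, not the adaptive local/global cost proposed in the paper:

```python
import heapq
import numpy as np

def grow_from_seed(img, seed, threshold):
    """Toy single-seed region growing by minimum-cost paths (Dijkstra-like).

    The cumulative cost of the cheapest path from the seed plays the role of
    the spanning tree in an MPST-style method; the cost used here (absolute
    intensity difference to the seed) is only an illustrative choice.
    """
    h, w = img.shape
    cost = np.full((h, w), np.inf)
    cost[seed] = 0.0
    heap = [(0.0, seed)]
    while heap:
        c, (y, x) = heapq.heappop(heap)
        if c > cost[y, x]:
            continue                     # stale queue entry
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                nc = c + abs(float(img[ny, nx]) - float(img[seed]))
                if nc < cost[ny, nx]:
                    cost[ny, nx] = nc
                    heapq.heappush(heap, (nc, (ny, nx)))
    return cost <= threshold             # final thresholding step

mask = grow_from_seed(np.random.rand(64, 64), (32, 32), threshold=2.0)
```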
Color Spaces in Data Fusion of Multi-temporal Images

The data fusion process is strongly recommended in biomedical applications. It allows better detection and localization of the pathology, as well as the diagnosis and follow-up of many diseases [1], especially with multi-parametric or multi-temporal data.

The independent visualization of multiple images from large volumes is a main cause of errors and inaccuracy in the interpretation process. In this respect, color fusion methods make it possible to highlight small details in multi-temporal and multi-parametric images.

In the present work, a color data fusion approach is proposed for multi-temporal images, in particular for images of the liver acquired through triphasic CT.

The best color association has been studied considering various data sources. Different metrics for quality assessment have been selected from color space theory, allowing an interesting comparison with human visual perception.

Roberta Ferretti, Silvana Dellepiane
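A minimal sketch of one possible color association, mapping three co-registered phases to the R, G and B channels, is shown below; the channel assignment and the rescaling are assumptions, not the association selected by the authors:

```python
import numpy as np

def fuse_rgb(phase1, phase2, phase3):
    """Map three co-registered acquisitions to the R, G and B channels.

    Each phase is first rescaled to [0, 1]; the channel assignment is just
    one of the possible color associations that could be compared.
    """
    def rescale(x):
        x = x.astype(float)
        return (x - x.min()) / (x.max() - x.min() + 1e-12)
    return np.dstack([rescale(phase1), rescale(phase2), rescale(phase3)])

# Hypothetical triphasic CT slices of the liver
arterial, portal, delayed = (np.random.rand(128, 128) for _ in range(3))
fused = fuse_rgb(arterial, portal, delayed)   # (128, 128, 3) color image
```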
TRAgen: A Tool for Generation of Synthetic Time-Lapse Image Sequences of Living Cells

In biomedical image processing, correct tracking of individual cells is an important task for the study of dynamic cellular processes. It is, however, often difficult to decide whether the obtained tracking results are correct. This is mainly due to the complexity of the data, which can show hundreds of cells, to improper data sampling either in time or in space, or to time-lapse sequences consisting of blurred noisy images. This also prohibits manual extraction of reliable ground truth (GT) data. Nonetheless, if reliable testing data with GT were available, one could compare the results of the examined tracking algorithm with the GT and assess its performance quantitatively.

In this paper, we introduce a novel versatile tool capable of generating 2D image sequences showing simulated living cell populations with GT for the evaluation of biomedical tracking. The simulated events notably include cell motion, cell division, and cell clustering up to tissue-level density. The method is primarily designed to operate at inter-cellular scope.

Vladimír Ulman, Zoltán Orémuš, David Svoboda
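To give the flavour of such a generator, the hypothetical snippet below renders one synthetic frame with known cell centres; the real tool additionally simulates motion, division and clustering over time:

```python
import numpy as np

def render_cells(shape, centers, radius=6, noise=0.1):
    """Render one synthetic frame: bright Gaussian blobs ("cells") plus noise.

    A toy, single-frame analogue of what a generator like TRAgen produces.
    """
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    frame = np.zeros(shape)
    for cy, cx in centers:
        frame += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * radius ** 2))
    return np.clip(frame + noise * np.random.rand(*shape), 0, 1)

# Ground truth (the centres) is known by construction, so tracking results
# on such data can be scored exactly.
frame = render_cells((128, 128), centers=[(40, 40), (80, 90)])
```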
Automatic Image Analysis and Classification for Urinary Bacteria Infection Screening

In this paper, we present an automatic system for the screening of urinary tract infections. It is estimated that about 150 million infections of this kind occur worldwide each year, giving rise to roughly five billion in health-care expenditures. Currently, Petri plates seeded with infected samples are analyzed by human experts, an error-prone and lengthy process. Nevertheless, based on image processing techniques and machine learning tools, the recognition of the bacterium type and the colony count can be carried out automatically. The proposed system captures a digital image of the plate and, after a preprocessing stage to isolate the colonies from the culture ground, accurately identifies the infection type and severity. Moreover, it contributes to the standardization of the analysis process, also avoiding the continuous transition between sterile and external environments, which is typical in the classical laboratory procedure.

Paolo Andreini, Simone Bonechi, Monica Bianchini, Alessandro Mecocci, Vincenzo Di Massa
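A minimal sketch of the colony-count idea, thresholding followed by connected-component labelling with scikit-image, is shown below; the synthetic plate image and the Otsu threshold are placeholder choices, not the authors' preprocessing:

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.measure import label

# Hypothetical grey-level plate image with colonies brighter than the agar
plate = np.random.rand(256, 256)

mask = plate > threshold_otsu(plate)       # isolate candidate colonies
labelled = label(mask)                     # one integer label per component
colony_count = labelled.max()              # number of connected components
```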
LBP-TOP for Volume Lesion Classification in Breast DCE-MRI

Dynamic Contrast Enhanced-Magnetic Resonance Imaging (DCE-MRI) is a complementary diagnostic method for early detection of breast cancer. However, due to the large amount of information, DCE-MRI data can hardly be inspected without the use of a Computer Aided Diagnosis (CAD) system. Among the major issues in developing CAD for breast DCE-MRI is the classification of segmented regions of interest according to their aggressiveness.

While there is a certain amount of evidence that dynamic information can be suitably used for lesion classification, it remains unclear whether other kinds of features (e.g. texture-based) can add useful information. This pushes the exploration of new features coming from different research fields, such as Local Binary Pattern (LBP) and its variants. In particular, in this work we propose to use LBP-TOP (Three Orthogonal Projections) for the assessment of lesion malignancy in breast DCE-MRI. Different classifiers, as well as the influence of a motion correction technique, have been considered. Our results indicate an improvement by using LBP-TOP in combination with a Random Forest classifier (84.6% accuracy) with respect to previous findings in the literature.

Gabriele Piantadosi, Roberta Fusco, Antonella Petrillo, Mario Sansone, Carlo Sansone
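A simplified sketch of the LBP-TOP idea, LBP histograms pooled from the three orthogonal planes of a spatio-temporal volume, follows; it uses only the central planes and placeholder parameters, so it is a reduced illustration rather than the full descriptor:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_top(volume, P=8, R=1, bins=10):
    """Concatenate LBP histograms from the three orthogonal central planes.

    Full LBP-TOP typically pools histograms over all planes along each
    axis, not only the central ones, so this is a simplified variant.
    """
    z, y, x = (s // 2 for s in volume.shape)
    planes = (volume[z, :, :], volume[:, y, :], volume[:, :, x])
    hists = []
    for plane in planes:
        plane8 = (plane * 255).astype(np.uint8)          # LBP expects ints
        codes = local_binary_pattern(plane8, P, R, method="uniform")
        h, _ = np.histogram(codes, bins=bins, range=(0, P + 2), density=True)
        hists.append(h)
    return np.concatenate(hists)

descriptor = lbp_top(np.random.rand(20, 64, 64))   # toy (t, y, x) volume
```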
Kernel Centered Alignment Supervised Metric for Multi-Atlas Segmentation

Recently, multi-atlas based methods have been used for supporting brain structure segmentation. These approaches encode the shape variability of a given population and provide prior information. However, the segmentation accuracy depends on the capability of each atlas in the dataset to propagate its labels to the target image. In this sense, the selection of the most relevant atlases becomes an important task. In this paper, a new locally-weighted criterion is proposed to highlight spatial correspondences between images, aiming to enhance multi-atlas based segmentation results. Our proposal combines the spatial correspondences through a linear weighted combination and uses the kernel centered alignment criterion to find the best weight combination. The proposal is tested on an MRI segmentation task using state-of-the-art image metrics such as Mean Squares and Mutual Information, and it is compared against other weighting criteria. The obtained results show that our approach outperforms the baseline methods, providing a more suitable atlas selection and improving the segmentation of basal ganglia structures.

Mauricio Orbes-Arteaga, David Cárdenas-Peña, Mauricio A. Álvarez, Alvaro A. Orozco, Germán Castellanos-Dominguez
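The kernel centered alignment criterion itself is compact enough to sketch; the snippet below computes the alignment between two kernel matrices (the toy feature and label kernels are placeholders, not the paper's spatial-correspondence kernels):

```python
import numpy as np

def centered_alignment(K1, K2):
    """Centered kernel alignment between two n x n kernel matrices."""
    n = K1.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    K1c, K2c = H @ K1 @ H, H @ K2 @ H
    num = np.sum(K1c * K2c)                  # Frobenius inner product
    return num / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

# Example: alignment between a feature kernel and an ideal label kernel
X = np.random.rand(30, 5)
y = np.random.randint(0, 2, 30)
K = X @ X.T
L = (y[:, None] == y[None, :]).astype(float)
print(centered_alignment(K, L))
```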

Multimedia

Frontmatter
Emotions in Abstract Art: Does Texture Matter?

The classification of images based on the emotions they evoke is a recent approach in multimedia. With the abundance of digitized images from museum archives and the ever-growing digital production of user-generated images, there is a greater need for intelligent image retrieval algorithms. Categorization of images according to their emotional impact offers a useful addition to the state of the art in image search. In this work, we apply computer vision techniques on abstract paintings to automatically predict emotional valence based on texture. We also propose a method to derive a small set of features (Perlin parameters) from an image to represent its overall texture. Finally, we investigate the saliency distribution in these images, and show that computational models of bottom-up attention can be used to predict emotional valence in a parsimonious manner.

Andreza Sartori, Berhan Şenyazar, Alkim Almila Akdag Salah, Albert Ali Salah, Nicu Sebe
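The paper's Perlin-parameter features are specific to its method; as a generic stand-in, the sketch below extracts standard grey-level co-occurrence texture statistics that a valence predictor could be trained on (the function names assume scikit-image >= 0.19):

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

# Hypothetical 8-bit grey-scale rendering of an abstract painting
painting = (np.random.rand(128, 128) * 255).astype(np.uint8)

glcm = graycomatrix(painting, distances=[1, 3], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)
features = np.hstack([graycoprops(glcm, prop).ravel()
                      for prop in ("contrast", "homogeneity", "energy")])
# A regressor trained on such texture descriptors could then predict
# per-image valence scores.
```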
Movie Genre Classification by Exploiting MEG Brain Signals

Genre classification is an essential part of multimedia content recommender systems. In this study, we provide experimental evidence for the possibility of performing genre classification based on brain recorded signals. The brain decoding paradigm is employed to classify the magnetoencephalography (MEG) data presented in [1] into four genre classes: Comedy, Romantic, Drama, and Horror. Our results show that: 1) there is a significant correlation between the audio-visual features of movies and the corresponding brain signals, especially in the visual and temporal lobes; 2) the genre of movie clips can be classified with an accuracy significantly above chance level using the MEG signal. On top of that, we show that the combination of multimedia features and MEG-based features achieves the best accuracy. Our study provides a first step towards user-centric media content retrieval using brain signals.

Pouya Ghaemmaghami, Mojtaba Khomami Abadi, Seyed Mostafa Kia, Paolo Avesani, Nicu Sebe
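A hypothetical sketch of the feature-level fusion and classification protocol follows; the feature matrices, dimensions and labels are random placeholders, not the paper's MEG data:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Hypothetical per-clip features: MEG-derived and audio-visual descriptors
n_clips = 120
meg_feats = np.random.rand(n_clips, 50)
av_feats = np.random.rand(n_clips, 30)
genres = np.random.randint(0, 4, n_clips)   # comedy/romantic/drama/horror

X = np.hstack([meg_feats, av_feats])        # early (feature-level) fusion
clf = make_pipeline(StandardScaler(), LinearSVC())
scores = cross_val_score(clf, X, genres, cv=5)
print(scores.mean())                        # mean cross-validated accuracy
```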
Egocentric Video Personalization in Cultural Experiences Scenarios

In this paper we propose a novel approach for egocentric video personalization in a cultural experience scenario, based on automatic shot labelling along different semantic dimensions, such as web-leveraged knowledge of the surrounding cultural Points Of Interest, information about stops and moves (both relying on geolocalization), and the camera wearer's behaviour. Moreover, we present a video personalization web system, based on this multi-dimensional semantic classification of shots, designed to help the visitor browse and retrieve relevant information to obtain a customized video. Experimental results show that the proposed video analysis techniques achieve good performance in an unconstrained scenario, and user evaluation tests confirm that our solution is useful and effective.

Patrizia Varini, Giuseppe Serra, Rita Cucchiara
Advanced Content Based Image Retrieval for Fashion

In this paper we propose a new content based approach for clothing image retrieval that tries to mimic human visual understanding, based not only on naive manipulation of texture and color but also on combining recent advanced techniques such as human pose estimation, super-pixel segmentation and cloth parsing. Moreover, we exploit metric learning to improve the image matching phase, proposing a new approach to learn a distance properly designed for the analyzed application. Especially in the fashion sector, our work appears very helpful for obtaining more accurate categorization and the naturally desirable retrieval from a large database of images of models wearing various styles, patterns and fashions. In particular, a drastic improvement is observed when the metric learning strategy is introduced.

Tewodros Mulugeta Dagnew, Umberto Castellani
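A minimal sketch of retrieval with a learned Mahalanobis-style distance follows; the clothing descriptors are placeholders and the matrix M stands in for whatever the learning stage would produce:

```python
import numpy as np

def mahalanobis_rank(query, gallery, M):
    """Rank gallery items by the learned distance d(x, y) = (x-y)^T M (x-y)."""
    diff = gallery - query
    dists = np.einsum("ij,jk,ik->i", diff, M, diff)
    return np.argsort(dists)                # best matches first

# Hypothetical clothing descriptors; M would come from metric learning
gallery = np.random.rand(100, 16)
query = np.random.rand(16)
M = np.eye(16)                 # identity recovers the Euclidean baseline
ranking = mahalanobis_rank(query, gallery, M)
```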
Backmatter
Metadata
Title
Image Analysis and Processing — ICIAP 2015
Editors
Vittorio Murino
Enrico Puppo
Copyright Year
2015
Electronic ISBN
978-3-319-23231-7
Print ISBN
978-3-319-23230-0
DOI
https://doi.org/10.1007/978-3-319-23231-7
