2013 | Book

Computer Analysis of Images and Patterns

15th International Conference, CAIP 2013, York, UK, August 27-29, 2013, Proceedings, Part II

Editors: Richard Wilson, Edwin Hancock, Adrian Bors, William Smith

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science

About this book

The two-volume set LNCS 8047 and 8048 constitutes the refereed proceedings of the 15th International Conference on Computer Analysis of Images and Patterns, CAIP 2013, held in York, UK, in August 2013. The 142 papers presented were carefully reviewed and selected from 243 submissions. The scope of the conference spans the following areas: 3D TV, biometrics, color and texture, document analysis, graph-based methods, image and video indexing and database retrieval, image and video processing, image-based modeling, kernel methods, medical imaging, mobile multimedia, model-based vision approaches, motion analysis, natural computation for digital imagery, segmentation and grouping, and shape representation and analysis.

Table of Contents

Frontmatter
Classified-Distance Based Shape Descriptor for Application to Image Retrieval

We propose a method to improve the quality and the query time of shape-based image retrieval. We define a novel and accurate shape descriptor named the Distance Interior Ratio (DIR) that is invariant to rigid motion and scaling. The DIR of shapes can also be stored in an efficient search structure. Our experimental results show a higher retrieval rate and efficient query times.

Jinhee Chun, Natsuda Kaothanthong, Takeshi Tokuyama
A Shape Descriptor Based on Trainable COSFIRE Filters for the Recognition of Handwritten Digits

The recognition of handwritten digits is an application which has been used as a benchmark for comparing shape recognition methods. We train COSFIRE filters to be selective for different parts of handwritten digits. In analogy with the neurophysiological concept of population coding, we use the responses of multiple COSFIRE filters as a shape descriptor of a handwritten digit. We demonstrate the effectiveness of the proposed approach on two data sets of handwritten digits, Western Arabic (MNIST) and Farsi, for which we achieve high recognition rates of 99.52% and 99.33%, respectively. COSFIRE filters are conceptually simple, easy to implement, and versatile trainable feature detectors. The shape descriptor that we propose is highly effective for the automatic recognition of handwritten digits.

George Azzopardi, Nicolai Petkov
Supporting Ancient Coin Classification by Image-Based Reverse Side Symbol Recognition

Coins and currency are studied in the field of numismatics. Our aim in this article is to use numismatic knowledge to develop part of a framework for the visual classification of ancient coins. Symbols minted on the reverse side of these coins vary greatly in their shapes and visual structures. Because of this property, we propose to use them as a discriminative feature for the visual classification of ancient coins. We use a dense-sampling-based bag-of-visual-words (BoVW) approach for our problem. Because BoVW representations lack spatial information, we evaluate three schemes for incorporating it. Other parameters of the BoVW approach, such as the size of the visual vocabulary, the level of detail of the dense sampling grid, and the number of features per image used to construct the visual vocabulary, are also investigated.

Hafeez Anwar, Sebastian Zambanini, Martin Kampel
Eyewitness Face Sketch Recognition Based on Two-Step Bias Modeling

Over 30 years of psychological studies on eyewitness testimony procedures reveal severe flaws, including the neglect of human face perception biases, that render these procedures unreliable. In addition, recent studies show that current automatic face sketch recognition methods are only tested on oversimplified databases and therefore cannot address real cases. We present a face sketch recognition method based on non-artistic sketches in which we first estimate and remove personal face perception biases from face sketches, and then recognize the sketches with a psychologically inspired matching technique. In addition, we use a general-specific modeling that needs only a few training samples per individual for accurate and robust performance. In our experiments, we tested accuracy and robustness against previous work, as well as the effect of the number of training samples on the accuracy of our method.

Hossein Nejati, Li Zhang, Terence Sim
Weighted Semi-Global Matching and Center-Symmetric Census Transform for Robust Driver Assistance

Automotive applications based on stereo vision require robust and fast matching algorithms, which makes semi-global matching (SGM) a popular method in this field. Typically the Census transform is used as a cost function, since it is advantageous for outdoor scenes. We propose an extension based on center-symmetric local binary patterns, which allows better efficiency and higher matching quality. Our second contribution exploits knowledge about the three-dimensional structure of the scene to selectively enforce the smoothness constraints of SGM. It is shown that information about surface normals can easily be integrated by weighting the paths according to the gradient of the disparity. The different approaches are evaluated on the KITTI benchmark, which provides real imagery with LIDAR ground truth. The results indicate improved performance compared to state-of-the-art SGM based algorithms.

Robert Spangenberg, Tobias Langner, Raúl Rojas
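
As a rough illustration of the center-symmetric idea, the sketch below (a minimal NumPy version, not the authors' implementation) encodes each pixel with one bit per pair of point-symmetric neighbours in a 3×3 window, halving the code length of the full census transform; matching costs would then be Hamming distances between such codes.

```python
import numpy as np

def cs_census_3x3(img):
    """Center-symmetric census transform on a 3x3 window (sketch).

    Each pixel gets a 4-bit code, one bit per pair of point-symmetric
    neighbours, instead of the 8 centre comparisons of the full
    census transform.
    """
    img = img.astype(np.float32)
    h, w = img.shape
    # Offsets (dy, dx) of the four neighbour pairs and their mirrors.
    pairs = [((-1, -1), (1, 1)), ((-1, 0), (1, 0)),
             ((-1, 1), (1, -1)), ((0, -1), (0, 1))]
    code = np.zeros((h - 2, w - 2), dtype=np.int32)
    for bit, ((dy, dx), (my, mx)) in enumerate(pairs):
        a = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        b = img[1 + my:h - 1 + my, 1 + mx:w - 1 + mx]
        code |= (a > b).astype(np.int32) << bit
    out = np.zeros((h, w), dtype=np.uint8)
    out[1:-1, 1:-1] = code
    return out
```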
Handwritten Word Image Matching Based on Heat Kernel Signature

Keyword spotting is an alternative method for retrieving query words without Optical Character Recognition (OCR), by calculating the similarity between features of word images rather than their ASCII content. However, because of unconstrained writing styles with large variations, the retrieval results are often unsatisfactory.

In this paper, we propose a novel method based on the Heat Kernel Signature (HKS) and a triangular mesh structure to achieve handwritten word image matching. HKS can tolerate large variations in handwritten word images and capture local features, while the triangular mesh structure is used to represent global characteristics. Moreover, our method does not need pre-processing steps.

Xi Zhang, Chew Lim Tan
Wrong Roadway Detection for Multi-lane Roads

The paper contributes to the detection of driving on the wrong side of the road by addressing in particular multi-lane road situations. We suggest a solution that identifies the current lane of the ego-vehicle using video data from a single camera only. GPS data are used to obtain the defined driving-direction constraints for the current road.

Junli Tao, Bok-Suk Shin, Reinhard Klette
Blind Deconvolution Using Alternating Maximum a Posteriori Estimation with Heavy-Tailed Priors

Single image blind deconvolution aims to estimate the unknown blur from a single observed blurred image and recover the original sharp image. This task is severely ill-posed, and typical approaches involve heuristics or other steps without clear mathematical justification to arrive at an acceptable solution. We show that a straightforward maximum a posteriori estimation combined with very sparse priors and an efficient numerical method can produce results that compete with much more complicated state-of-the-art methods.

Jan Kotera, Filip Šroubek, Peyman Milanfar
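
In generic form (our notation; the paper's exact functional may differ in detail), such an alternating MAP scheme estimates the sharp image u and the blur kernel h from the observation g = u * h + n by iterating two sub-problems, with a heavy-tailed gradient prior on u:

```latex
u^{k+1} = \arg\min_{u}\ \frac{\gamma}{2}\,\lVert u * h^{k} - g \rVert_2^2
          + \sum_{i} \lvert (\nabla u)_i \rvert^{p}, \qquad 0 < p < 1,
\qquad
h^{k+1} = \arg\min_{h \ge 0,\ \sum_j h_j = 1}\ \frac{\gamma}{2}\,\lVert u^{k+1} * h - g \rVert_2^2 .
```

The exponent p < 1 makes the prior heavier-tailed than the usual total variation (p = 1), favouring sparse gradients.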
Focus Fusion with Anisotropic Depth Map Smoothing

Focus fusion methods combine a set of images focused at different depths into a single image where all parts are in focus. The quality of the fusion result strongly depends on a decision map that determines the in-focus areas. Most approaches in the literature achieve this by local decisions without explicitly enforcing smoothness of the depth map. The goal of our paper is to introduce a modern regularisation strategy where we assume that neighbouring pixels in the resulting image have a similar depth. To this end, we consider a partial differential equation (PDE) for the depth map. It combines a robustified data fidelity term with an anisotropic diffusion strategy that involves a matrix-valued diffusion tensor. Experiments with synthetic and real-world data show that this depth map regularisation can improve existing fusion methods substantially. Our methodology is general and can be applied to improve many existing fusion methods.

Madina Boshtayeva, David Hafner, Joachim Weickert
Accurate Fibre Orientation Measurement for Carbon Fibre Surfaces

Carbon- and glass-fibre materials exhibit challenging optical properties, in particular highly specular reflectivity. State-of-the-art vision-based sensor systems use diffuse light setups to suppress the specular reflectivity, and texture analysis methods to obtain the fibre orientation for surface patches.

We propose a different sensor that is based on a fibre reflection model and directly measures fibre orientation and diffuse and specular reflectivity for each surface pixel. The proposed sensor is robust to changes of the fibre material and capable of delivering fast and accurate information about fibre direction without requiring time-consuming texture analysis. We show that the root mean squared measurement error is around 0.4° for dry carbon fibre and 1.2° for pre-preg materials.

Stefan Thumfart, Werner Palfinger, Matthias Stöger, Christian Eitzinger
Benchmarking GPU-Based Phase Correlation for Homography-Based Registration of Aerial Imagery

Many multi-image fusion applications require fast registration methods in order to allow real-time processing. Although the most popular approaches, local-feature-based methods, have proven efficient enough for registering image pairs in real time, some applications like multi-frame background subtraction, super-resolution or high-dynamic-range imaging benefit from even faster algorithms. A common trend to speed up registration is to implement the algorithms on graphics cards (GPUs). However, not all algorithms are well suited for massive parallelization via GPUs. In this paper we evaluate the speed of a well-known global registration method, phase correlation, for computing 8-DOF homographies. We propose a benchmark to compare CPU- and GPU-based implementations using different systems and a dataset of aerial imagery. We demonstrate that phase correlation benefits from GPU-based implementations much more than local methods, significantly increasing the processing speed.

Falk Schubert, Krystian Mikolajczyk
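
For reference, the core of phase correlation is only a few FFTs. The sketch below recovers a pure integer translation with NumPy; the paper goes further and chains such estimates into 8-DOF homographies, which this sketch does not attempt.

```python
import numpy as np

def phase_correlation(a, b):
    """Estimate the integer translation between images a and b (sketch).

    The peak of the inverse FFT of the normalised cross-power
    spectrum gives the shift of b relative to a.
    """
    A = np.fft.fft2(a)
    B = np.fft.fft2(b)
    R = A * np.conj(B)
    R /= np.abs(R) + 1e-12            # normalise; eps avoids division by zero
    corr = np.fft.ifft2(R).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap peak coordinates to signed shifts.
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return dy, dx
```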
Robustness of Point Feature Detection

This paper evaluates 2D feature detection methods with respect to invariance and efficiency properties. The studied feature detection methods are: Speeded Up Robust Features, Scale Invariant Feature Transform, Binary Robust Invariant Scalable Keypoints, Oriented Binary Robust Independent Elementary Features, Features from Accelerated Segment Test, Maximally Stable Extremal Regions, Binary Robust Independent Elementary Features, and Fast Retina Keypoint. A long video sequence of traffic scenes is used for testing these feature detection methods. A brute-force matcher and Random Sample Consensus (RANSAC) are used to analyse how robust these feature detection methods are with respect to scale, rotation, blurring, or brightness changes. After identifying matches in subsequent frames, RANSAC is used to remove inconsistent matches; the remaining matches are taken as correct. This is the essence of our proposed evaluation technique. All the experiments use a proposed repeatability measure, defined as the ratio of the number of correct matches to the number of all keypoints.

Zijiang Song, Reinhard Klette
Depth Super-Resolution by Enhanced Shift and Add

We use multi-frame super-resolution, specifically Shift & Add, to increase the resolution of depth data. In order to be able to deploy such a framework in practice without requiring a very high number of observed low resolution frames, we improve the initial estimation of the high resolution frame. To that end, we propose a new data model that leads to a median estimation from densely upsampled low resolution frames. We show that this new formulation solves the problem of undefined pixels and further allows us to improve the performance of pyramidal motion estimation in the context of super-resolution without additional computational cost. As a consequence, it increases the motion diversity within a small number of observed frames, making the enhancement of depth data more practical. Quantitative experiments run on the Middlebury dataset show that our method outperforms state-of-the-art techniques in terms of accuracy and robustness to the number of frames and to the noise level.

Kassem Al Ismaeil, Djamila Aouada, Bruno Mirbach, Björn Ottersten
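
A minimal sketch of the median-based initialisation idea, assuming the inter-frame motion is already known and purely translational (the paper's data model and motion estimation are more general):

```python
import numpy as np
from scipy.ndimage import shift, zoom

def median_shift_and_add(lr_frames, shifts, r):
    """Robust initial high-resolution estimate (sketch).

    Each low-resolution frame is densely upsampled by factor r,
    shifted onto a common grid (shifts given in LR pixels), and the
    pixel-wise median over the stack yields an initial HR frame with
    no undefined pixels.
    """
    stack = []
    for frame, (dy, dx) in zip(lr_frames, shifts):
        up = zoom(frame, r, order=1)                    # dense bilinear upsampling
        stack.append(shift(up, (dy * r, dx * r), order=1))
    return np.median(np.stack(stack), axis=0)
```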
Using Region-Based Saliency for 3D Interest Points Detection

The detection of interest points is an important pre-processing step for the analysis of mesh surfaces. This paper proposes a new method for the detection of interest points on 3D surfaces. We first apply a smoothing process, then estimate the saliency with a region-based detection method and select the interest points according to the entropy in each voxel. The evaluation is carried out by studying the repeatability of the detected interest points under different perceptual conditions, such as different viewpoints and noise corruption. The results show that the proposed method achieves better performance than its competitors.

Yitian Zhao, Yonghuai Liu, Ziming Zeng
Accurate 3D Multi-marker Tracking in X-ray Cardiac Sequences Using a Two-Stage Graph Modeling Approach

The in-depth analysis of heart movements under varying conditions is an important problem of cardiac surgery. To reveal the movement of relevant muscular parts, biplanar X-ray recordings of implanted radio-opaque markers are acquired. As manually locating these markers in the images is a very time-consuming task, our goal is to automate this process. Taking into account the difficulties in the recorded data such as missing detections or 2D occlusions, we propose a two-stage graph-based approach for both 3D tracklet and 3D track generation. In the first stage of our approach, we construct a directed acyclic graph of 3D observations to obtain tracklets via shortest path optimization. Afterwards, full tracks are extracted from a tracklet graph in a similar manner. This results in a globally optimal linking of detections and tracklets, while providing a flexible framework which can easily be adapted to various tracking scenarios based on the edge cost functions. We validate our approach on an X-ray sequence of a beating sheep heart based on manually labeled ground-truth marker positions. The results show that the performance of our method is comparable to human experts, while standard 3D tracking approaches such as particle filters are outperformed.

Xiaoyan Jiang, Daniel Haase, Marco Körner, Wolfgang Bothe, Joachim Denzler
3D Mesh Decomposition Using Protrusion and Boundary Part Detection

The number of 3D models is growing every day, and the segmentation of such models has recently attracted a lot of attention. In this paper we propose a two-phase approach for the segmentation of 3D models. We leverage a well-known fact from electrical physics about charge distribution for both initial protruding part extraction and boundary detection. The first phase locates the initial protruding parts, which have higher charge density, while the second phase utilizes the minima rule and an area-based approach to find the boundary in the concave regions. The proposed approach has a great advantage over the similar approach proposed by Wu and Levine [1]: it can find boundaries for joining parts that are not entirely located in the concave region, which is not the case in the work of Wu and Levine. Experimental results on the McGill and SHREC 2007 datasets are promising for partial matching in 3D model retrieval.

Fattah Alizadeh, Alistair Sutherland
Isometrically Invariant Description of Deformable Objects Based on the Fractional Heat Equation

Recently, a number of researchers have turned their attention to the creation of isometrically invariant shape descriptors based on the heat equation. The reason for this surge in interest is that the Laplace-Beltrami operator, associated with the heat equation, is highly dependent on the topology of the underlying manifold, which may lead to the creation of highly accurate descriptors. In this paper, we propose a generalisation based on the fractional heat equation. While the heat equation enables one to explore the shape with a Markovian Gaussian random walk, the fractional heat equation explores the manifold with a non-Markovian Lévy random walk. This generalisation provides two advantages. These are, first, that the process has a memory of the previously explored geometry and, second, that it is possible to correlate points or vertices which are not part of the same neighbourhood. Consequently, a highly accurate, contextual shape descriptor may be obtained.

Eric Paquet, Herna Lydia Viktor
Discriminant Analysis Based Level Set Segmentation for Ultrasound Imaging

Segmentation is one of the fundamental tasks in computer vision applications. The nature of ultrasound images, which are subject to multiplicative noise rather than the widely used additive noise model, leads to problems for standard segmentation algorithms. In this paper we propose a new level set approach for the segmentation of medical ultrasound data. The advantage of this approach is both its simplicity and its robustness: the noise inherent in ultrasound images does not have to be modeled explicitly but is instead estimated by means of discriminant analysis. In particular, we determine an optimal threshold, which enables us to separate two signal distributions in the intensity histogram, and incorporate this information in the evolution of the level set contour. The superiority of our approach over the popular Chan-Vese formulation is demonstrated on real 2D patient data from echocardiography.

Daniel Tenbrinck, Xiaoyi Jiang
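
The optimal-threshold step is in the spirit of Otsu-style discriminant analysis. A minimal sketch of such a threshold selection (our simplification, not the paper's exact formulation):

```python
import numpy as np

def discriminant_threshold(img, bins=256):
    """Pick the threshold maximising between-class variance (sketch).

    Separates the intensity histogram into two distributions; the
    resulting threshold can then steer a level set evolution without
    an explicit noise model.
    """
    hist, edges = np.histogram(img, bins=bins)
    p = hist.astype(np.float64) / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)                     # probability of class 0 up to each bin
    mu = np.cumsum(p * centers)           # cumulative first moment
    mu_t = mu[-1]                         # global mean
    w1 = 1.0 - w0
    valid = (w0 > 0) & (w1 > 0)
    sigma_b = np.zeros_like(w0)
    sigma_b[valid] = (mu_t * w0 - mu)[valid] ** 2 / (w0 * w1)[valid]
    return centers[np.argmax(sigma_b)]
```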
Region Based Contour Detection by Dynamic Programming

Dynamic programming (DP) is a popular technique for contour detection, particularly in biomedical image analysis. Although gradient information is typically used in such methods, it is not always a reliable measure to work with, and there is a strong need for non-gradient-based methods. In this paper we present a general framework for region based contour detection by dynamic programming. It is based on a global energy function which is approximated by a radial, ray-wise summation to enable dynamic programming. Its simple algorithmic structure allows the use of arbitrarily complex region models and model testing functions, in particular by means of techniques from robust statistics. The proposed framework was tested on synthetic data and real microscopic images. A performance comparison with standard gradient-based DP and a recent non-gradient DP-based contour detection algorithm clearly demonstrates the superiority of our approach.

Xiaoyi Jiang, Daniel Tenbrinck
Sparse Coding and Mid-Level Superpixel-Feature for ℓ0-Graph Based Unsupervised Image Segmentation

We propose in this paper a graph-based unsupervised segmentation approach that combines superpixels, sparse representation, and a new mid-level feature to describe superpixels. Given an input image, we first extract a set of interest points, either by sampling or using a local feature detector, and compute a set of low-level features associated with the patches centered at the interest points. We define a low-level dictionary as the collection of all these low-level features. We call a region of an oversegmentation of the input image a superpixel, and we compute the low-level features associated with it. We then compute for each superpixel a mid-level feature defined as the sparse coding of its low-level features in the aforementioned dictionary. These mid-level features not only carry the same information as the initial low-level features, but also carry additional contextual cues. We use the superpixels at several segmentation scales, their associated mid-level features, and the sparse representation coefficients to build graphs at several scales. Merging these graphs leads to a bipartite graph that can be partitioned using the Transfer Cut algorithm. We validate the proposed mid-level feature framework on the MSRC dataset, and the segmentation results show improvements, both qualitative and quantitative, over other state-of-the-art methods.

Xiaofang Wang, Huibin Li, Simon Masnou, Liming Chen
Intuitive Large Image Database Browsing Using Perceptual Similarity Enriched by Crowds

The main objective of image browsers is to empower users to find a desired image with ease, speed and accuracy in a large database. In this paper we present a novel approach to creating an image browsing environment based on human perception, with the aim of providing intuitive image navigation. In our approach, similarity judgments form the basic structural organization of the images in our browser. To enrich this, we have developed a scalable crowdsourced method of augmenting a database with a large number of additional samples by capturing human judgments from members of a crowd. Experiments were conducted on two databases that demonstrate the effectiveness of our method as an intuitive, fast browsing environment for large image databases.

Stefano Padilla, Fraser Halley, David A. Robb, Mike J. Chantler
Irreversibility Analysis of Feature Transform-Based Cancelable Biometrics

Cancelable biometric technologies protect biometric data by applying transforms to biometric signals that enable the comparison of biometric templates in the transformed domain. Thereby biometric data is permanently protected, preventing identity fraud and privacy violation. The inversion of transformed biometric signals must not be feasible for potential attackers, i.e. irreversibility must be guaranteed.

In this work we propose a theoretical estimation of the irreversibility of generic cancelable biometric systems. Based on a general formalization of cancelable feature transforms, the complexity of reverse engineering is evaluated, quantifying the security provided by the respective approach. In addition, examples are presented which clearly point to the security provided by existing approaches.

Christian Rathgeb, Christoph Busch
L∞ Norm Based Solution for Visual Odometry

In the present work, a novel approach to the monocular visual odometry problem is detailed. More powerful and robust techniques, such as convex optimisation with the L∞ norm and the H∞ filter, are adopted. Using monocular systems makes motion estimation challenging due to the absolute scale ambiguity caused by projective effects. For this, we propose robust tools to estimate both the trajectory of a moving object and the unknown absolute scale ratio between consecutive image pairs. The proposed solution uses as input only images provided by a single camera mounted on the roof of a ground vehicle. Experimental evaluations showed that convex optimisation with the L∞ norm and the robust H∞ filter clearly outperform classical methods based on least squares and the Levenberg-Marquardt algorithm.

Mohammed Boulekchour, Nabil Aouf
Matching Folded Garments to Unfolded Templates Using Robust Shape Analysis Techniques

This work presents a novel method performing shape matching of folded garments to unfolded templates, aiming to facilitate unfolding by robotic manipulators. The proposed method incorporates robust shape analysis techniques, estimating point correspondences between contours of folded garments and unfolded templates. The analysis results are also used for estimating the location of the folding axis on the templates and discriminating between different types of garments. The method has been experimentally evaluated using both synthetic and real datasets of folded garments and the produced results indicate the usefulness of the proposed approach.

Ioannis Mariolis, Sotiris Malassiotis
Multi-scale Image Segmentation Using MSER

Recently, several research works have proposed image segmentation algorithms using MSER. However, they aim at segmenting out specific regions corresponding to user-defined objects. This paper proposes a novel MSER-based algorithm which segments natural images without user intervention and captures multi-scale structure. The algorithm collects MSERs and then partitions the whole image plane by redrawing them in a specific order. To denoise and smooth the region boundaries, hierarchical morphological operations are developed. To illustrate the effectiveness of the algorithm's multi-scale structure, the effects of various types of level-of-detail (LOD) control are shown for image stylization.

Il-Seok Oh, Jinseon Lee, Aditi Majumder
Multi-spectral Material Classification in Landscape Scenes Using Commodity Hardware

We investigate the advantages of a stereo, multi-spectral acquisition system for material classification in ground-level landscape images. Our novel system allows us to acquire high-resolution, multi-spectral stereo pairs using commodity photographic equipment. Given the additional spectral information, we obtain better classification of vegetation classes than in the standard RGB case. We test the system in two modes: splitting the visible spectrum into six bands, and extending the recorded spectrum to the near infrared. Our six-band design is more practical than standard multi-spectral techniques, and foliage classification using the acquired images compares favourably to using a standard camera.

Gwyneth Bradbury, Kenny Mitchell, Tim Weyrich
Multispectral Stereo Image Correspondence

This paper presents a novel multispectral stereo image correspondence approach. It is evaluated using a stereo rig constructed from a visible spectrum camera and a long wave infrared spectrum camera. The novelty of the proposed approach lies in the use of Hough space as a correspondence search domain. In this way it avoids searching for correspondences in the original multispectral image domains, where information is weakly correlated, and a common domain is used instead. The proposed approach is intended for outdoor urban scenarios, where images contain a large number of edges. These edges are used as distinctive characteristics for matching in the Hough space. Experimental results are provided showing the validity of the proposed approach.

Marcelo D. Pistarelli, Angel D. Sappa, Ricardo Toledo
NLP EAC Recognition by Component Separation in the Eye Region

This paper investigates the recognition of the Eye Accessing Cues (EACs) used in Neuro-Linguistic Programming (NLP) and shows how computer vision techniques can be used for understanding the meaning of non-visual gaze directions. A specific EAC is identified by the relative position of the iris within the eye bounding box, which is determined from modified versions of the classical integral projections. The eye cues are inferred via a logistic classifier from features extracted within the eye bounding box. The proposed solution is shown to outperform other classical approaches in terms of detection rate.

Ruxandra Vrânceanu, Corneliu Florea, Laura Florea, Constantin Vertan
OPF-MRF: Optimum-Path Forest and Markov Random Fields for Contextual-Based Image Classification

Some machine learning methods do not exploit contextual information in the process of discovering, describing and recognizing patterns. However, spatially or temporally neighboring samples are likely to exhibit the same behavior. Here, we propose an approach which unifies a supervised learning algorithm, namely Optimum-Path Forest, with a Markov Random Field in order to build a prior model holding a spatial smoothness assumption, which takes contextual information into account for classification purposes. We show its robustness for brain tissue classification on images of the well-known IBSR dataset.

Rodrigo Nakamura, Daniel Osaku, Alexandre Levada, Fabio Cappabianco, Alexandre Falcão, Joao Papa
Orthonormal Diffusion Decompositions of Images for Optical Flow Estimation

This paper proposes an ortho-diffusion decomposition of graphs for estimating motion from image sequences. Orthonormal decompositions of the adjacency matrix representations of image data are alternated with diffusions and data subsampling in order to robustly represent image features using undirected graphs. The modified Gram-Schmidt algorithm with column pivoting is applied recursively for the orthonormal decompositions at various scales. This processing produces a set of ortho-diffusion bases and residual diffusion wavelets at each image representation scale. The optical flow is estimated using the similarity in the ortho-diffusion basis space extracted from regions of two different image frames.

Sravan Gudivada, Adrian G. Bors
Pairwise Similarity for Line Extraction from Distorted Images

Clustering a given set of data is crucial in many fields including image processing; it plays important roles in image segmentation and object detection, for example. This paper proposes a framework for building a similarity matrix for a given dataset, which is then used for clustering the dataset. The similarity between two points is defined based on how other points are distributed around the line connecting them: it captures the degree to which the two points are placed on the same line. The similarity matrix is treated as a kernel matrix of the given dataset, and spectral clustering is performed on it. Clustering with the proposed similarity matrix is shown to perform well in experiments with an artificially designed problem and a real-world problem of detecting lines in a distorted image.

Hideitsu Hino, Jun Fujiki, Shotaro Akaho, Yoshihiko Mochizuki, Noboru Murata
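
One plausible instantiation of such a similarity, shown only as an illustration (the paper's exact definition and kernel construction differ in detail): score a pair of 2D points by how many other points lie close to the line through them, then hand the matrix to spectral clustering.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def line_similarity(X, sigma=0.05):
    """Collinearity-based similarity matrix for 2D points (sketch).

    S[i, j] is large when many other points lie close to the line
    through points i and j.
    """
    n = len(X)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = X[j] - X[i]
            norm = np.linalg.norm(d)
            if norm == 0:
                continue
            u = d / norm
            v = X - X[i]
            # Perpendicular distance of every point to the line (i, j).
            dist = np.abs(v[:, 0] * u[1] - v[:, 1] * u[0])
            S[i, j] = S[j, i] = np.exp(-dist ** 2 / sigma ** 2).sum()
    return S

# Toy usage: spectral clustering with the precomputed affinity.
X = np.random.rand(50, 2)
labels = SpectralClustering(n_clusters=2,
                            affinity='precomputed').fit_predict(line_similarity(X))
```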
Plant Leaf Classification Using Color on a Gravitational Approach

The literature describes the analysis and identification of plant leaves as a difficult task. Many features may be used to describe a plant leaf. One of them is its texture, which is also one of the most important features in image analysis. This paper proposes to study the texture information of all three color channels of a plant leaf by converting it into a simplified gravitational system in collapse. We also use the fractal dimension to describe the states of the gravitational collapse as they occur. This enables us to describe the texture information as a function of complexity and collapsing time. In our experiments, we compare our approach to other color texture analysis methods on a plant leaf dataset.

Jarbas J. de M. Sá Junior, André R. Backes, Paulo César Cortez
Semi-automatic Image Annotation

High quality ground truth data is essential for the development of image recognition systems. General purpose datasets are widely used in research, but they are not suitable as training sets for specialized real-world recognition tasks. The manual annotation of custom ground truth data sets is expensive, but machine learning techniques can be applied to preprocess image data and facilitate annotation. We propose a semi-automatic image annotation process, which clusters images according to similarity in a bag-of-features (BoF) approach. Clusters of images can be efficiently annotated in one go. The system recalculates the clustering continuously, based on partial annotations provided during annotation, by weighting BoF vector elements to increase intra-cluster similarity. Visualization of top-weighted codebook elements allows users to estimate the quality of annotations and of the recalculated clustering.

Julia Moehrmann, Gunther Heidemann
Segmentation of Skin Spectral Images Using Simulated Illuminations

Spectral imaging has been attracting attention in skin-segmentation-related medical and computer vision applications. Usually spectral images contain radiance spectra or approximated reflectance spectra of objects. The reflectance spectrum is independent of the illumination, whereas the radiance spectrum depends on the illumination used during the measurement. Recently, general illumination has increasingly used LED lights, which have different spectral characteristics compared to incandescent or fluorescent lights. In this paper, we studied the effects of illumination on unsupervised pixel-based skin segmentation with the Spectral Clustering (SC) algorithm, based on two sets of skin spectral images: human hands and faces. We adopted the eigengap heuristic to select the illumination for skin segmentation with the SC algorithm. Spectral radiance images under three CIE standard illuminants (A, D65, and F11) and two white LED illuminations were simulated from the corresponding spectral reflectance images. The experimental results showed that the CIE D65 illuminant returns the highest average segmentation accuracies, 2.1 and 3.4 percentage points higher than the lowest results, obtained with illuminant CIE A on hand images and illuminant CIE F11 on face images, respectively. The accuracies of the two LED illuminations are also high, around 1.5 and 1.0 percentage points lower on average than the accuracy under illuminant CIE D65.

Zhengzhe Wu, Ville Heikkinen, Markku Hauta-Kasari, Jussi Parkkinen
Robust Visual Object Tracking via Sparse Representation and Reconstruction

Visual object tracking plays an essential role in vision based applications. Most previous research has limitations due to the non-discriminative features used, or its focus on simple template matching without considering appearance variations. To address these challenges, this paper proposes a new approach for robust visual object tracking via sparse representation and reconstruction, with two main contributions concerning object representation and object location, respectively. The sparse representation and reconstruction (SR²) are integrated into a Kalman filter framework to form a robust object tracker named the SR²KF tracker. Extensive experiments show that the proposed tracker is able to tolerate appearance variations, background clutter and image deterioration, and outperforms existing work.

Zhenjun Han, Qixiang Ye, Jianbin Jiao
Sphere Detection in Kinect Point Clouds via the 3D Hough Transform

We introduce a fast, robust and accurate Hough Transform (HT) based algorithm for detecting spherical structures in 3D point clouds. To our knowledge, our algorithm is the first HT based implementation that detects spherical structures in typical 3D point clouds generated by consumer depth sensors such as the Microsoft Kinect. Our approach has been designed to be computationally efficient, reducing an established limitation of HT based approaches. We provide experimental analysis of the achieved results, showing robust performance against occlusion, and we show superior performance to the only other HT based algorithm for detecting spheres in point clouds available in the literature.

Anas Abuzaina, Mark S. Nixon, John N. Carter
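
As a simplified illustration of spherical Hough voting (a known-radius variant over oriented points; the paper's algorithm is more general and engineered for Kinect data), each point votes for the two candidate centres along its surface normal:

```python
import numpy as np

def hough_sphere_center(points, normals, radius, voxel=0.01):
    """Detect the centre of a sphere of known radius (sketch).

    Every oriented point casts votes for the two centres at distance
    `radius` along its normal; the densest accumulator cell wins.
    """
    votes = np.vstack([points + radius * normals,
                       points - radius * normals])
    keys = np.round(votes / voxel).astype(np.int64)   # voxelised accumulator
    cells, counts = np.unique(keys, axis=0, return_counts=True)
    return cells[np.argmax(counts)] * voxel
```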
Watermark Optimization of 3D Shapes for Minimal Distortion and High Robustness

An optimization-based methodology that preserves mesh surfaces in 3D watermarking is proposed in this paper. The Levenberg-Marquardt optimization algorithm is used to displace the vertices according to the message to be embedded. A specific cost function is used in order to ensure minimal surface distortion while providing the watermark with high robustness to attacks. This cost function consists of three components representing the distance to the original surface, the distance to the watermarked surface and the vertex displacement, respectively. The proposed methodology is statistical, blind and robust. A study of its crypto-security is provided as well.

Adrian G. Bors, Ming Luo
Wavelet Network and Geometric Features Fusion Using Belief Functions for 3D Face Recognition

One of the challenges in pattern recognition technologies, especially face recognition, is the ability to handle scenarios where subjects are non-cooperative in terms of position (face pose) or deformation (facial expression). In this paper, we propose an innovative approach to 3D face recognition that combines heterogeneous features using evidence theory based on belief functions. The first feature is generated via a wavelet network algorithm, which approximates every face by an optimal linear combination. The second feature models each facial surface by a collection of facial curves based on geodesic distance. The fusion procedure adopts a refined model of belief functions based on the Dempster-Shafer rule in the context of the confusion matrix. Experimental evaluation performed on a subset of the FRGC v2 database shows that the recognition rate increases with the fusion of redundant and/or independent data. Further, the technique demonstrates robustness under different facial expressions.

Mohamed Anouar Borgi, Maher El’Arbi, Chokri Ben Amar
A Color-Based Selective and Interactive Filter Using Weighted TV

In this paper we introduce a new color interactive and selective filtering tool based on the minimization of a weighted vectorial total variation term (TVg) with the L2 norm as data term. Our goal is to filter one region of an image while preserving the other. To this end, we introduce color moments into the TVg term. Up to now, color moments have mainly been introduced for indexation purposes. In our case, the user selects some points in the area to preserve and some other points in the area to be filtered. Reference color moments are then computed on patches around the selected points and are included in the TVg term. Two main functions g are tested within the HSL color space, leading to interesting results on both synthetic and real images. Convex optimization tools are used to solve the minimization problem, and we take advantage of an augmented Lagrangian formulation and Uzawa block relaxation schemes. The proposed filtering tool may be interesting as a pre-processing step for segmentation, movie post-production or object-oriented compression.

Cédric Loosli, François Lecellier, Stéphanie Jehan-Besson, Jonas Koko
A Composable Strategy for Shredded Document Reconstruction

The reconstruction of shredded documents is of interest in domains such as forensics, investigative sciences and archaeology, and has therefore been approached in many different ways. This paper takes a step towards bridging the gap between previous, disparate, efforts by proposing a composable, probabilistic solution. The task is first divided into independent sub-problems, and novel approaches to several of these sub-problems are then presented. The theoretical properties and empirical performance of these novel approaches are shown to compare favourably to those of previously published methods.

Razvan Ranca, Iain Murray
A Global-Local Approach to Saliency Detection

In this paper, we present a novel approach to saliency detection. We define a visually salient region by the following two properties: global saliency, i.e. spatial redundancy, and local saliency, i.e. region complexity. The former is the region's probability of occurrence within the image, whereas the latter defines how much information is contained within the region and is quantified by the entropy. By combining the global spatial redundancy measure and local entropy, we achieve a simple yet robust saliency detector. We evaluate it quantitatively and compare it to Itti et al. [6] as well as to the spectral residual approach [5] on publicly available data, where it shows a significant improvement.

Ahmed Boudissa, JooKooi Tan, Hyoungseop Kim, Seiji Ishikawa, Takashi Shinomiya, Krystian Mikolajczyk
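
A minimal sketch of how such global and local terms can be combined (our toy instantiation on quantised intensities, not the authors' exact measure; assumes an 8-bit grayscale input):

```python
import numpy as np

def saliency_map(gray, patch=9, bins=32):
    """Global rarity x local entropy saliency (sketch).

    Global term: self-information -log p(I) of each pixel's quantised
    intensity over the whole image. Local term: entropy of the
    intensity histogram in a patch around the pixel.
    """
    q = np.minimum((gray.astype(np.float64) / 256 * bins).astype(int), bins - 1)
    p = np.bincount(q.ravel(), minlength=bins) / q.size
    global_sal = -np.log(p[q] + 1e-12)        # rare intensities are salient

    h, w = gray.shape
    r = patch // 2
    local_sal = np.zeros_like(global_sal)
    for y in range(r, h - r):
        for x in range(r, w - r):
            win = q[y - r:y + r + 1, x - r:x + r + 1]
            ph = np.bincount(win.ravel(), minlength=bins) / win.size
            ph = ph[ph > 0]
            local_sal[y, x] = -(ph * np.log(ph)).sum()   # patch entropy
    return global_sal * local_sal
```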
A Moving Average Bidirectional Texture Function Model

The Bidirectional Texture Function (BTF) is currently the most advanced representation of the visual properties of surface materials. It specifies their appearance under varying spatial, illumination, and viewing conditions. The correspondingly enormous BTF measurements require a mathematical representation allowing extreme compression while preserving high visual fidelity. We present a novel BTF model based on a set of underlying mono-spectral two-dimensional (2D) moving average factors. A mono-spectral moving average model assumes that a stochastic mono-spectral texture is produced by convolving an uncorrelated 2D random field with a 2D filter which completely characterizes the texture. The BTF model combines several multi-spectral band-limited spatial factors, subsequently factorized into a set of mono-spectral moving average representations, with a range map to produce the required BTF texture space. This enables a very high BTF space compression ratio, unlimited texture enlargement, and reconstruction of missing unmeasured parts of the BTF space.

Michal Havlíček, Michal Haindl
A Multiscale Blob Representation of Mammographic Parenchymal Patterns and Mammographic Risk Assessment

Mammographic parenchymal patterns have been found to be a strong indicator of breast cancer risk and play an important role in mammographic risk assessment. In this paper, a novel representation of mammographic parenchymal patterns is proposed, which is based on multiscale blobs. Approximately blob-like tissue patterns are detected over a range of scales and parenchymal patterns are represented as a set of blobs. Spatial relations between blobs are considered to reduce the overlap between connected dense tissue regions. Quantitative measures of breast density are calculated from the resulting blobs and used for mammographic risk assessment. The proposed approach is evaluated using the full MIAS database and a large dataset from the DDSM database. A high agreement with expert radiologists is indicated according to the BIRADS density classification. The classification accuracies for the MIAS and DDSM databases are up to 79.44% and 76.90%, respectively.

Zhili Chen, Liping Wang, Erika Denton, Reyer Zwiggelaar
Alternating Optimization for Lambertian Photometric Stereo Model with Unknown Lighting Directions

Photometric stereo is a technique for surface reconstruction using several object images made with a fixed camera position and varying illumination directions. Reconstructed surfaces can have complex reflectance properties which are unknown a priori and often simplified by the Lambertian model (reflecting light uniformly in all directions). Such simplification leads to a certain inaccuracy of reconstruction but in most cases is sufficient to obtain the general object relief important for further recognition. Not only the surface properties but also the lighting sources used for each image acquisition can be very complex to model, or even unknown. Our work demonstrates how to find surface normals from the Lambertian photometric stereo model using color images made with a priori unknown lighting directions. The evaluation of the model components is based on an alternating optimization approach.

Khrystyna Kyrgyzova, Lorène Allano, Michaël Aupetit
An Automated Visual Inspection System for the Classification of the Phases of Ti-6Al-4V Titanium Alloy

Metallography is the science of studying the physical properties of metal microstructures, by means of microscopes. While traditional approaches involve the direct observation of the acquired images by human experts, Computer Vision techniques may help experts in the analysis of the inspected materials. In this paper we present an automated system to classify the phases of a Titanium alloy, Ti-6Al-4V. Our system has been tested to analyze the final products of a Friction Stir Welding process, to study the states of the microstructures of the welded material.

Antonino Ducato, Livan Fratini, Marco La Cascia, Giuseppe Mazzola
Analysis of Bat Wing Beat Frequency Using Fourier Transform

Computer vision techniques have been used extensively to automatically monitor human activities; however, applications for analysing animal behaviour are sparse. The analysis of bat behaviour in particular has attracted only one or two studies. Most existing work uses either expensive thermal imaging equipment or bespoke sensors which are not accessible to field researchers, ecologists, and scientists studying behaviour. The work we present here uses spectral analysis techniques to quantify wing-beat frequency, using a single imaging device in low light. We propose two modified techniques, based on bounding box metrics and similarity matrices, for measuring periodic and cyclical motion as a 1D time-domain signal. These are transformed to the frequency domain using the Short Time Fourier Transform (STFT). Finally we evaluate these techniques against the baseline algorithm proposed by Cutler and Davis [5], using expert-annotated ground-truth data.

John Atanbori, Peter Cowling, John Murray, Belinda Colston, Paul Eady, Dave Hughes, Ian Nixon, Patrick Dickinson
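
The frequency-domain step is standard. For example, given a per-frame bounding-box height as the 1D signal (an assumed input; the paper also derives signals from similarity matrices), a short-time Fourier transform yields the dominant wing-beat frequency over time:

```python
import numpy as np
from scipy.signal import stft

def wingbeat_frequency(box_heights, fps):
    """Dominant periodicity of a 1D bounding-box signal (sketch)."""
    sig = np.asarray(box_heights, dtype=float)
    sig -= sig.mean()                         # remove the DC component
    f, t, Z = stft(sig, fs=fps, nperseg=64)
    return f[np.argmax(np.abs(Z), axis=0)]    # peak frequency per time window
```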
Automated Ground-Plane Estimation for Trajectory Rectification

We present a system to determine ground-plane parameters in densely crowded scenes where use of geometric features such as parallel lines or reliable estimates of agent dimensions are not possible. Using feature points tracked over short intervals, together with some plausible scene assumptions, we can estimate the parameters of the ground-plane to a sufficient degree of accuracy to correct usefully for perspective distortion. This paper describes feasibility studies conducted on controlled, simulated data, to establish how different levels and types of noise affect the accuracy of the estimation, and a verification of the approach on live data, showing the method can estimate ground-plane parameters, thus allowing improved accuracy of trajectory analysis.

Ian Hales, David Hogg, Kia Ng, Roger Boyle
Breast Parenchymal Pattern Analysis in Digital Mammography: Associations between Tabár and Birads Tissue Compositions

We investigated the associations between Tabár based breast parenchymal patterns and Birads density parenchymal patterns in digital mammography. Breast parenchymal texture was analysed on a set of mammographic images segmented based on Tabár tissue modelling. Visual assessment indicates good and anatomically improved segmentation on tissue-specific areas. At the tissue modelling stage, over- and/or under-training can cause the tissue composition to fluctuate between nodular and homogeneous tissue, whilst the percentages of radiolucent tissue are less sensitive to the algorithm's parameter configuration. The clear depiction provided by digital mammography allows a better segmentation of tissue-specific areas, which indicates that breast parenchymal texture may be utilised in mammographic interpretation as the new technology advances further. The average tissue compositions for the Tabár parenchymal patterns show inadequate compositions of nodular and homogeneous tissue. Stronger associations were found between Tabár tissue compositions for [nodular, homogeneous] and [nodular, homogeneous, radiolucent] and Birads breast density classes I and IV.

Wenda He, Reyer Zwiggelaar
Color Transfer Based on Earth Mover’s Distance and Color Categorization

Color transfer aims at changing an original image's color theme to a target one while preserving the original image's geometric structure. Recent studies of color transfer suffer from two main flaws. The first is that the original image can only exchange colors with certain types of target images. The second is that the color transformation often introduces artefacts, noise, and graininess which make the resulting images unnatural. In this paper, we present a novel variational model to solve these problems effectively. A variety of experiments and comparisons with previous research show that this model produces convincing and robust results.

Wenya Feng, Yilin Guo, Okhee Kim, Yonggan Hou, Long Liu, Huiping Sun
Empirical Comparison of Visual Descriptors for Multiple Bleeding Spots Recognition in Wireless Capsule Endoscopy Video

Wireless Capsule Endoscopy (WCE) is the latest technology able to screen intestinal anomalies at an early stage. Despite its convenience for the patient and its effectiveness in showing small intestinal details, the physician's diagnosis remains difficult and time consuming. Thus, computer-aided diagnosis would be helpful. In this paper, we focus on the Multiple Bleeding Spots (MBS) anomaly. We conduct an empirical evaluation of four feature descriptors for the challenging problem of MBS recognition in WCE video using an SVM classifier. The performance of the four descriptors is assessed based on the output of the SVM classifier.

Sarah Alotaibi, Sahar Qasim, Ouiem Bchir, Mohamed Maher Ben Ismail
Exploring Interest Points and Local Descriptors for Word Spotting Application on Historical Handwriting Images

Recently, local features, especially point descriptors, have received considerable interest in the computer vision and image processing communities. SIFT and SURF descriptors have shown their usefulness for natural object recognition and classification. However, the use of local descriptors such as SIFT and SURF is still not very common in handwritten document image analysis. In this paper, we propose an investigation of the description of handwriting by applying different interest points and local descriptors to historical handwritten images in the context of a coarse-to-fine segmentation-free word spotting method. The observations and analysis based on the experimental results can help optimize the description of handwriting for different applications.

Peng Wang, Véronique Eglin, Christine Largeron, Antony McKenna, Christophe Garcia
Gravitational Based Texture Roughness for Plant Leaf Identification

The analysis and identification of plant leaves is a difficult task. Among the many features available for identification, texture pattern is one of the most important. In this work we propose to explore the texture information of a plant leaf by converting it into a simplified dynamical system in gravitational collapse. We use complexity estimates, such as fractal dimension and lacunarity, to describe the states of gravitational collapse of the system and, as a consequence, the texture itself. We also compare our approach to other classical texture analysis methods on a plant leaf dataset.

Jarbas J. de M. Sá Junior, André R. Backes, Paulo César Cortez
Heterogeneity Index for Directed Graphs

Although there are a number of existing measures for quantifying the structural properties of undirected graphs, there are relatively few corresponding measures for directed graphs. To fill this gap in the literature, in this paper we explore how to extend Estrada's heterogeneity index from undirected to directed graphs and define an analogous heterogeneity measure for directed graphs. From the perspective of object recognition, this measure opens up the possibility of using directed graphs (such as nearest neighbour graphs) to represent the arrangement of object features. This type of representation is potentially more discriminating than an undirected graph. We show how our new heterogeneity measure can be used to characterize k-nearest neighbour graphs representing the arrangement of object features extracted from objects in the COIL-20 database. We compare the performance of this measure with Estrada's original heterogeneity index, and we conclude that our measure gives better characterization performance.

Cheng Ye, Richard C. Wilson, Edwin R. Hancock
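
For context, Estrada's heterogeneity index for an undirected graph G = (V, E) with vertex degrees d_u sums squared differences of inverse square-root degrees over the edges; one natural directed analogue (our reading of the extension, stated only as an illustration) replaces these by out- and in-degrees:

```latex
\rho(G) = \sum_{(u,v)\in E}\left(\frac{1}{\sqrt{d_u}} - \frac{1}{\sqrt{d_v}}\right)^{\!2},
\qquad
\rho_{D}(G) = \sum_{(u,v)\in E}\left(\frac{1}{\sqrt{d_u^{\mathrm{out}}}} - \frac{1}{\sqrt{d_v^{\mathrm{in}}}}\right)^{\!2}.
```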
High-Precision Lens Distortion Correction Using Smoothed Thin Plate Splines

Lens distortion and its modelling is an important factor for the calibration of optical cameras. Most calibration algorithms include a distortion model to cope with the discrepancy to a pinhole camera model induced by the camera lenses. However, for high-precision calibration sophisticated distortion models have to be used and their often numerous parameters have to be determined during calibration. In this work we present a simple, nonparametric method based on smoothed thin plate splines for correcting the lens distortion with a very high precision.

Sönke Schmid, Xiaoyi Jiang, Klaus Schäfers
Identification Using Encrypted Biometrics

Biometric identification is a challenging subject among computer vision scientists. The idea of substituting biometrics for passwords has become more attractive since powerful identification algorithms have emerged. However, in this regard, the confidentiality of the biometric data becomes a serious concern. Biometric data needs to be securely stored and processed to guarantee that user privacy and confidentiality are preserved. In this paper, a method for biometric identification using encrypted biometrics is presented, where a method of search over encrypted data is applied to manage the identification. Our facial identification experiments demonstrate the effective performance of the system with proven zero information leakage.

Mohammad Haghighat, Saman Zonouz, Mohamed Abdel-Mottaleb
Illumination Effects in Quantitative Virtual Microscopy

For quantitative virtual microscopy to be accepted into clinical practice, a virtual image has to be a 'glass faithful' representation of the underlying cellular objects, unaffected by artefacts such as illumination or optical distortion. In this paper we present experimental results from systematic measurements of features from calibration slides at different locations in the field of view. Our results show that the measurements differ slightly from the expected values. However, the values at the different locations are similar, confirming the efficacy of virtual microscopy, as objects can be measured independently of their location.

Doreen Altinay, Andrew P. Bradley
Improving the Correspondence Establishment Based on Interactive Homography Estimation

We present a method to find the correspondences between salient points of images in which an oracle can interact and influence the final correspondences between points. The oracle looks at the current point correspondences and imposes new mappings or modifies some of them. This interaction has two effects. On the one hand, a new homography is computed such that the least square error is minimized on the imposed correspondences. On the other hand, the similarity matrix between the sets of points is modified, so the interaction influences the output of the correspondence algorithm. The method is independent of the algorithm used to compute the homography and the correspondences. Practical evaluation shows that within a few interactions of the oracle, the optimal correspondence is achieved.

Xavier Cortés, Carlos Moreno, Francesc Serratosa
Interactive Segmentation of Media-Adventitia Border in IVUS

In this paper, we present an approach for user-assisted segmentation of the media-adventitia border in IVUS images. This interactive segmentation is performed by a combination of a point based soft constraint on the object boundary and a stroke based regional constraint. The edge based boundary constraint is imposed by searching the shortest path in a three-dimensional graph derived from a multi-layer image representation. The user points act as attraction points and are treated as soft constraints, rather than hard constraints requiring the segmented boundary to pass through the user-specified points. The user can also use strokes to specify the foreground (region of interest). The probability of the region of interest is then calculated for each pixel, and its discontinuity is used to indicate the object boundary. This combined approach is formulated as an energy minimization problem that is solved using a shortest path search algorithm. We show that this combined approach allows efficient and effective interactive segmentation, which is demonstrated by identifying the media-adventitia border in IVUS images where image artifacts, such as acoustic shadows and calcification, are commonplace. Both qualitative and quantitative analyses are provided based on manually labeled datasets.

Jonathan-Lee Jones, Ehab Essa, Xianghua Xie, Dave Smith
Kernel Maximum Mean Discrepancy for Region Merging Approach

Kernel methods are being used in an increasingly wide variety of computer vision applications. This paper introduces the use of Kernel Maximum Mean Discrepancy (KMMD) for the region merging process. KMMD is a recent unsupervised kernel-based method commonly used for analysing and comparing distributions. We propose a region merging approach based on the KMMD framework which aims at improving the quality of an initial segmentation result. The performance of the proposed method has been compared with four state-of-the-art region merging methods on the Berkeley image segmentation dataset by means of the probabilistic Rand index and variation of information errors. Experiments show that our approach achieves a segmentation quality equal to or greater than the referenced methods.

Alya Slimene, Ezzeddine Zagrouba
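
The core KMMD test can be illustrated compactly. Below is a minimal sketch of the biased squared-MMD estimate under an RBF kernel; the bandwidth gamma and the merge threshold are illustrative assumptions.

```python
import numpy as np

def mmd2_rbf(X, Y, gamma=1.0):
    """Biased estimate of MMD^2 between samples X (n, d) and Y (m, d)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

def should_merge(region_a_feats, region_b_feats, threshold=0.05):
    """A small discrepancy suggests the two regions share a distribution."""
    return mmd2_rbf(region_a_feats, region_b_feats) < threshold
```
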
Laplacian Derivative Based Regularization for Optical Flow Estimation in Driving Scenario

Existing state-of-the-art optical flow approaches, which are evaluated on standard datasets such as Middlebury, do not necessarily perform as well when evaluated on driving scenarios. This drop in performance is due to several challenges arising in real scenes encountered while driving. In this direction, we propose in this paper a modification to the regularization term of a variational optical flow formulation that notably improves the results, especially in driving scenarios. The proposed modification consists of using the Laplacian derivatives of the flow components in the regularization term instead of their gradients. We show the improved results on a standard real image sequence dataset (KITTI).

Naveen Onkarappa, Angel D. Sappa
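
The substitution the abstract describes, second-order (Laplacian) smoothing in place of first-order (gradient) smoothing, is easy to state in code. The sketch below evaluates only the two regularization energies, not the full variational solver.

```python
import numpy as np
from scipy.ndimage import laplace

def gradient_regulariser(u, v):
    """Classical first-order term: |grad u|^2 + |grad v|^2, summed."""
    gy_u, gx_u = np.gradient(u)
    gy_v, gx_v = np.gradient(v)
    return np.sum(gx_u**2 + gy_u**2 + gx_v**2 + gy_v**2)

def laplacian_regulariser(u, v):
    """Second-order replacement built from Laplacians of the flow."""
    return np.sum(laplace(u)**2 + laplace(v)**2)
```
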
Local and Global Statistics-Based Explicit Active Contour for Weld Defect Extraction in Radiographic Inspection

Welding is a process of utmost importance in the metal industry. With the advances in computer science and artificial intelligence techniques, the opportunity arose to develop computer-aided techniques for radiographic inspection in Non-Destructive Testing. This paper deals with weld defect detection in radiographic films. A greedy active contour model exploiting global and local statistics is used to drive the contour to the defect boundaries. Moreover, to decrease the computational cost, local statistics are computed only for pixels in a selected band. The results appear promising.

Aicha Baya Goumeidane, Nafaa Nacereddine, Mohammed Khamadja
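
The band restriction mentioned above can be sketched as follows, under the plausible reading that the band is a dilation of the current contour; the band width is an illustrative assumption.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def narrow_band(contour_mask, width=5):
    """Pixels within `width` of the contour: the only place local stats run."""
    return binary_dilation(contour_mask, iterations=width)

def band_local_means(image, region_mask, contour_mask, width=5):
    """Local inside/outside intensity means, restricted to the band."""
    band = narrow_band(contour_mask, width)
    inside = band & region_mask
    outside = band & ~region_mask
    return image[inside].mean(), image[outside].mean()
```
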
Minimum Entropy Models for Laser Line Extraction

A minimum entropy model can find the optimal gray space for laser line extraction. A global model named Minimum Entropy Deconvolution is established to search for the peaks that constitute the laser line. Not only does it reach high accuracy, but it also retains the smoothness of the line, to which previous work often paid little attention. In addition, the method can extract several laser lines. Experimental results show that the robust models and fast algorithms outperform the compared approaches.

Wei Yang, Liguo Zhang, Wei Ke, Ce Li, Jianbin Jiao
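
The paper's minimum-entropy model is not reproduced here; the sketch below only illustrates the underlying task it optimises, per-column sub-pixel peak localisation of a bright laser stripe via an intensity centroid. The window size is an illustrative assumption.

```python
import numpy as np

def laser_peaks(image, window=3):
    """Sub-pixel row of the brightest response in each image column."""
    rows = np.argmax(image, axis=0)
    peaks = []
    for col, r in enumerate(rows):
        lo, hi = max(r - window, 0), min(r + window + 1, image.shape[0])
        w = image[lo:hi, col].astype(float)
        # Intensity-weighted centroid around the raw per-column maximum.
        peaks.append((w * np.arange(lo, hi)).sum() / (w.sum() + 1e-9))
    return np.array(peaks)
```
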
A Convenient and Fast Method of Endoscope Calibration under Surgical Environment

Obtaining the calibration parameters of an endoscopic camera online is one of the most important steps in three-dimensional reconstruction and has drawn great attention from researchers. One common approach requires two people and an accurately pre-printed checkerboard, which is not desirable in actual use. In this paper, a convenient endoscope calibration method is presented. Instead of preparing a checkerboard in advance, a computer display screen is used as a flexible calibration board, displaying checkerboards of different sizes to fill the endoscopic view. Instead of holding the checkerboard in different postures, the user holds the endoscopic camera in front of the screen-displayed calibration board in various orientations. The whole calibration process can be carried out by just one person, which is of practical significance in a real operating room. The experimental results show that the method is time-saving and requires no additional manual intervention. Sub-pixel calibration precision can be achieved.

Meiqing Liu, Dayuan Yan, Xiaoming Hu, Ya Zhou, Zhaoguo Wu
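
The screen-based procedure maps directly onto the standard OpenCV calibration pipeline, sketched below. The board dimensions and square size are illustrative assumptions.

```python
import numpy as np
import cv2

def calibrate(frames, board=(9, 6), square=1.0):
    """Estimate intrinsics from frames of a (screen-displayed) checkerboard."""
    objp = np.zeros((board[0] * board[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square
    obj_pts, img_pts = [], []
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        found, corners = cv2.findChessboardCorners(gray, board)
        if found:
            corners = cv2.cornerSubPix(   # refine to sub-pixel accuracy
                gray, corners, (11, 11), (-1, -1),
                (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
            obj_pts.append(objp)
            img_pts.append(corners)
    rms, K, dist, _, _ = cv2.calibrateCamera(
        obj_pts, img_pts, gray.shape[::-1], None, None)
    return rms, K, dist
```
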
SAMSLAM: Simulated Annealing Monocular SLAM

This paper proposes a novel monocular SLAM approach. For a triplet of successive keyframes, the approach interleaves the registration of the three 3D maps associated with each image pair in the triplet and the refinement of the corresponding poses, progressively limiting the allowable reprojection error according to a simulated annealing scheme. The approach computes only local overlapping maps of almost constant size, thus avoiding problems of 3D map growth. It requires no global optimization, loop closure or back-correction of the poses.

Marco Fanfani, Fabio Bellavia, Fabio Pazzaglia, Carlo Colombo
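
A hedged sketch of the annealing idea: pose refinement is repeated while the allowable reprojection error is tightened on a geometric schedule. OpenCV's solvePnPRansac stands in for the paper's map/pose registration step, and the schedule parameters are illustrative assumptions.

```python
import numpy as np
import cv2

def annealed_pose(obj_pts, img_pts, K, t0=8.0, t_min=1.0, cooling=0.7):
    """obj_pts: float32 (N, 3); img_pts: float32 (N, 2); K: 3x3 intrinsics."""
    thresh, rvec, tvec = t0, None, None
    while thresh >= t_min:
        ok, rvec, tvec, _ = cv2.solvePnPRansac(
            obj_pts, img_pts, K, None,
            rvec=rvec, tvec=tvec, useExtrinsicGuess=rvec is not None,
            reprojectionError=thresh)
        if not ok:
            break
        thresh *= cooling   # simulated-annealing style tightening
    return rvec, tvec
```
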
Spatial Patch Blending for Artefact Reduction in Pattern-Based Inpainting Techniques

Patch-based (or “pattern-based”) inpainting is a popular processing technique aiming at reconstructing missing regions in images by iteratively duplicating blocks of known image data (patches) inside the area to fill in. This kind of method is particularly effective for processing wide image areas, thanks to its ability to reconstruct textured data. Nevertheless, “pathological” geometric configurations often occur, leading to visible reconstruction artefacts on inpainted images, whatever the chosen pattern-based inpainting algorithm. In this paper we focus on these problematic cases and propose a generic spatial blending technique that can be adapted to any type of patch-based inpainting method in order to reduce these artefacts.

Maxime Daisy, David Tschumperlé, Olivier Lézoray
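
A generic form of spatial patch blending can be sketched as follows: wherever duplicated patches overlap, they are averaged with weights that decay towards the patch borders, hiding seams. The tent-shaped weight profile and the square-patch assumption are illustrative, not the paper's exact blending kernel.

```python
import numpy as np

def patch_weight(size):
    """Separable tent weights, highest at the patch centre."""
    ramp = 1.0 - np.abs(np.linspace(-1, 1, size))
    return np.outer(ramp, ramp) + 1e-6

def blend_patches(shape, patches):
    """patches: list of (top, left, square_patch_2d) -> blended image."""
    acc, wsum = np.zeros(shape), np.zeros(shape)
    for top, left, p in patches:
        w = patch_weight(p.shape[0])
        acc[top:top + p.shape[0], left:left + p.shape[1]] += w * p
        wsum[top:top + p.shape[0], left:left + p.shape[1]] += w
    return acc / np.maximum(wsum, 1e-9)
```
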
Spatio-temporal Support for Range Flow Based Ego-Motion Estimators

A real-time range flow based ego-motion estimator for a moving depth sensor is presented. The estimator recovers the translation and rotation components of the sensor’s motion and integrates these temporally. To ensure accurate inter-frame motion estimates, an iterative form of the estimator is developed. To minimise drift in the pose, an additional temporal constraint is provided through the use of anchor frames. The algorithm is evaluated on the recently published TUM RGB-D benchmark. Performance is commensurate with alternative methodologies such as SLAM, but at a fraction of the computational cost.

Graeme A. Jones, Gordon Hunter
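
The temporal side of the estimator can be sketched schematically: inter-frame rigid motions are composed into a global pose, which is periodically re-based against an anchor frame to limit drift. `estimate_motion` below is a hypothetical stand-in for the range flow estimator itself, and the anchor interval is an illustrative assumption.

```python
import numpy as np

def integrate_poses(frames, estimate_motion, anchor_every=30):
    """Compose 4x4 inter-frame motions; re-base on anchor frames."""
    pose = np.eye(4)                       # world-from-sensor pose
    anchor, anchor_pose = frames[0], np.eye(4)
    for i in range(1, len(frames)):
        pose = pose @ estimate_motion(frames[i - 1], frames[i])
        if i % anchor_every == 0:
            # Re-estimate against the anchor to suppress accumulated drift.
            pose = anchor_pose @ estimate_motion(anchor, frames[i])
            anchor, anchor_pose = frames[i], pose.copy()
    return pose
```
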
Tracking for Quantifying Social Network of Drosophila Melanogaster

We introduce a simple, fast and high-performance computer vision algorithm, FlyTracker, for quantifying the social network of Drosophila melanogaster. FlyTracker is fully automated software for detecting and tracking multiple flies simultaneously in low-resolution video footage. These videos were acquired using Flyworld, a dedicated imaging platform. The algorithm segments and tracks the flies over time. From the obtained tracks, features are derived for each fly, allowing quantitative analysis of fly behavior. These features include location, orientation and time of interaction, and allow the quantification of fly interactions. These interactions, when computed across a group, form a social network from which transient social interactions can be inferred. To test FlyTracker, it is compared to current state-of-the-art software for fly tracking. Results show that FlyTracker is able to track the flies in low-resolution footage with better accuracy, thus providing an aid in quantifying their social network.

Tanmay Nath, Guangda Liu, Barbara Weyn, Bassem Hassan, Ariane Ramaekers, Steve De Backer, Paul Scheunders
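
A bare-bones version of such a pipeline, background subtraction, connected-component centroids, and greedy nearest-neighbour linking, is sketched below; the thresholds are illustrative assumptions rather than FlyTracker's actual parameters.

```python
import numpy as np
from scipy.ndimage import label, center_of_mass

def detect_flies(frame, background, thresh=30):
    """Dark flies on a static background -> (n, 2) centroid array."""
    mask = (background.astype(int) - frame.astype(int)) > thresh
    labels, n = label(mask)
    return np.array(center_of_mass(mask, labels, range(1, n + 1)))

def link(prev_centroids, centroids, max_dist=20.0):
    """Greedy nearest-neighbour assignment of detections to tracks."""
    matches = {}
    for i, p in enumerate(prev_centroids):
        d = np.linalg.norm(centroids - p, axis=1)
        j = int(np.argmin(d))
        if d[j] < max_dist:
            matches[i] = j
    return matches
```
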
Virtual Top View: Towards Real-Time Aggregation of Videos to Monitor Large Areas

Currently, large areas are continuously monitored by camera networks, where an overall situation assessment within a reasonable time is a crucial requirement. In this paper, we propose our Virtual Top View (VTV) approach, which provides a clear, concise and direct interpretation of on-field activities in real time while preserving spatial relationships; technically, it employs planar homography to aggregate the Virtual Top View from multiple individual video streams. With an increasing number of cameras or size of the monitored area, the aggregation process slows down. Therefore, we develop acceleration methods (autogenerated warp maps) to achieve real-time aggregation within large camera networks. Finally, we evaluate the performance and demonstrate our approach in an intra-logistics environment.

Hagen Borstell, Saira Saleem Pathan, Michael Soffner, Klaus Richter
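
The warp-map acceleration can be illustrated with OpenCV: instead of calling warpPerspective per frame, the inverse-homography sampling map is generated once per camera and each frame is warped with the cheaper remap lookup. The homography placeholder and output size below are illustrative assumptions.

```python
import numpy as np
import cv2

def build_warp_map(H, out_size):
    """Precompute the cv2.remap lookup tables for one camera."""
    w, h = out_size
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    pts = np.stack([xs, ys], -1).reshape(-1, 1, 2).astype(np.float32)
    # H maps camera pixels to top-view pixels, so sample with its inverse.
    src = cv2.perspectiveTransform(pts, np.linalg.inv(H)).reshape(h, w, 2)
    return src[..., 0].copy(), src[..., 1].copy()

H = np.eye(3)  # placeholder; a real ground-plane homography comes from calibration
map_x, map_y = build_warp_map(H, (1024, 768))   # generated once per camera
# Per frame, only a cheap lookup remains:
# top_view = cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)
```
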
Writer Identification in Old Music Manuscripts Using Contour-Hinge Feature and Dimensionality Reduction with an Autoencoder

Although most previous studies of writer identification in music scores have assumed successful prior staff-line removal, this assumption does not hold when the music scores suffer from a certain level of degradation or deformation, and the impact of staff-line removal on writer identification results for such documents remains unclear. In this study, we propose a novel writer identification method that requires neither staff-line removal nor segmentation. Staff-line removal is achieved virtually, without image processing, by dimensionality reduction with an autoencoder in the Contour-Hinge feature space. Experimental results on a wide range of music manuscripts show that the proposed method achieves favourable results without prior staff-line removal.

Masahiro Niitsuma, Lambert Schomaker, Jean-Paul van Oosten, Yo Tomita
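
The dimensionality-reduction step can be sketched with a small autoencoder over the Contour-Hinge histograms; the bottleneck codes would then feed a writer matcher. The layer sizes below are illustrative assumptions, not the paper's architecture (PyTorch is used here for brevity).

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, in_dim, code_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train(model, feats, epochs=50, lr=1e-3):
    """feats: float tensor of Contour-Hinge histograms, one row per sample."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(feats), feats)
        loss.backward()
        opt.step()
    return model
```
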
Human Action Recognition Using Temporal Segmentation and Accordion Representation

In this paper, we propose Seg-SIFT-ACC, a novel motion descriptor for human action recognition. The proposed descriptor is based both on the accordion representation of the video and on its temporal segmentation into elementary motion segments. The accordion representation aims to place in spatial adjacency the columns of the video frames that have a high temporal correlation. For complex videos containing many different elementary actions, the accordion representation may place in spatial adjacency temporally correlated pixels that belong to different elementary actions. To surmount this problem, we divide the video into elementary motion segments and apply the accordion representation to each one separately.

Manel Sekma, Mahmoud Mejdoub, Chokri Ben Amar
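
The accordion rearrangement itself is a simple index permutation. For a clip of shape (T, H, W), column x of every frame is laid side by side, so temporally correlated pixels become spatial neighbours before SIFT description; a minimal sketch:

```python
import numpy as np

def accordion(clip):
    """(T, H, W) video -> (H, W*T) image grouping each column over time."""
    t, h, w = clip.shape
    # Axes -> (H, W, T); flattening W and T together keeps each column's
    # temporal samples adjacent in the output image.
    return clip.transpose(1, 2, 0).reshape(h, w * t)
```

Applying this per elementary motion segment, as the abstract proposes, amounts to slicing the clip along T before calling the function.
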
Effective Diversification for Ambiguous Queries in Social Image Retrieval

Recent years have witnessed the great popularity of social photo-sharing websites, which host a tremendous volume of digital images accompanied by their associated tags. Extensive research efforts have therefore been dedicated to tag-based social image search, which enables users to formulate their queries using tags. However, tag queries are often ambiguous and typically short. Diversification of the search results is the common solution, aiming to increase the number of satisfied users with a single result set that covers the maximum number of query aspects. However, not all queries are uniformly ambiguous, and hence different diversification strategies may be appropriate. In this context, we propose a new ranking process which dynamically predicts an effective trade-off between relevance-based and diversity-based ranking according to the ambiguity level of a given query. Thorough experiments using 12 ambiguous queries over the NUS-WIDE dataset show the effectiveness of our approach over classical uniform diversification approaches.

Amel Ksibi, Ghada Feki, Anis Ben Ammar, Chokri Ben Amar
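
The trade-off the abstract describes can be illustrated with an MMR-style re-ranking in which the balance parameter would be predicted from the query's ambiguity level (here it is simply passed in):

```python
import numpy as np

def diversified_ranking(relevance, similarity, k, lambda_):
    """relevance: (n,) scores; similarity: (n, n) item-item similarity.
    lambda_ near 0 -> pure relevance; near 1 -> aggressive diversification."""
    selected, candidates = [], list(range(len(relevance)))
    while candidates and len(selected) < k:
        def mmr(i):
            div = max(similarity[i, j] for j in selected) if selected else 0.0
            return (1 - lambda_) * relevance[i] - lambda_ * div
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected
```
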
Backmatter
Metadata
Title
Computer Analysis of Images and Patterns
Editors
Richard Wilson
Edwin Hancock
Adrian Bors
William Smith
Copyright Year
2013
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-40246-3
Print ISBN
978-3-642-40245-6
DOI
https://doi.org/10.1007/978-3-642-40246-3
