2015 | Book

Computer Analysis of Images and Patterns

16th International Conference, CAIP 2015, Valletta, Malta, September 2-4, 2015, Proceedings, Part I

About this book

The two-volume set LNCS 9256 and 9257 constitutes the refereed proceedings of the 16th International Conference on Computer Analysis of Images and Patterns, CAIP 2015, held in Valletta, Malta, in September 2015. The 138 papers presented were carefully reviewed and selected from numerous submissions. CAIP 2015 is the sixteenth in the CAIP series of biennial international conferences devoted to all aspects of computer vision, image analysis and processing, pattern recognition, and related fields.

Table of Contents

Frontmatter
On-The-Fly Handwriting Recognition Using a High-Level Representation

Automatic handwriting recognition plays a crucial role because writing with a pen is the most common and natural input method for humans. Whereas many algorithms recognise the writing only after the input is finished, this paper presents a handwriting recognition system that processes the input data during writing and thus detects misspelled characters on the fly, from the moment they arise.

The main idea of the recognition is to decompose the input data into defined structures. Each character can be composed of the structures point, line, curve, and circle. While the user draws a character, the digitized points of the pen are processed successively, decomposed into structures, and classified with the help of samples. The intermediate classification allows direct feedback to the user as soon as the input differs from a given character.

C. Reinders, F. Baumann, B. Scheuermann, A. Ehlers, N. Mühlpforte, A. O. Effenberg, B. Rosenhahn
What Is in Front? Multiple-Object Detection and Tracking with Dynamic Occlusion Handling

This paper proposes a multiple-object detection and tracking method that explicitly handles dynamic occlusions. A context-based multiple-cue detector is proposed to detect occluded vehicles (occludees). First, we detect and track fully visible vehicles (occluders). Occludee detection adopts those occluders as priors. Two classifiers for partially visible vehicles are trained using appearance cues. Disparity is adopted to further constrain the occludee locations. A detected occludee is then tracked by a Kalman-based tracking-by-detection method. As dynamic occlusions lead to role changes between occluder and occludee, an integrative module is introduced for switching occludee and occluder trackers where necessary. The proposed system was tested on overtaking scenarios. It improved an occluder-only tracking system by over 10% regarding the frame-based detection rate, and by over 20% regarding the trajectory detection rate. With the proposed method, occludees are detected and tracked up to 7 seconds before they are picked up by the occluder-only method.

Junli Tao, Markus Enzweiler, Uwe Franke, David Pfeiffer, Reinhard Klette
Correlating Words - Approaches and Applications

The determination of characteristic and discriminating terms, as well as their semantic relationships, plays a vital role in text processing applications. As an example, term clustering techniques heavily rely on this information. Classic approaches to this end, such as statistical co-occurrence analysis, however, usually only consider relationships between two terms that co-occur as immediate neighbours or at sentence level. This article presents flexible approaches to find statistically significant correlations between two or more terms using co-occurrence windows of arbitrary sizes. Their applicability is discussed in detail by presenting solutions to improve interactive and image-based search in the World Wide Web. Moreover, approaches to determine directed term associations, and applications for them, are explained as well.

Mario M. Kubek, Herwig Unger, Jan Dusik
ExCuSe: Robust Pupil Detection in Real-World Scenarios

The reliable estimation of the pupil position is one of the most important prerequisites in gaze-based HMI applications. Despite the rich landscape of image-based methods for pupil extraction, tracking the pupil in real-world images is highly challenging due to variations in the environment (e.g., changing illumination conditions, reflections, etc.), in the eye physiology, or due to further sources of noise (e.g., contact lenses or mascara). We present a novel algorithm for robust pupil detection in real-world scenarios, which is based on edge filtering and oriented histograms calculated via the Angular Integral Projection Function. The evaluation on over 38,000 new, hand-labeled eye images from real-world tasks and 600 images from related work showed an outstanding robustness of our algorithm in comparison to the state of the art. Download link (algorithm and data): https://www.ti.uni-tuebingen.de/Pupil-detection.1827.0.html?&L=1

Wolfgang Fuhl, Thomas Kübler, Katrin Sippel, Wolfgang Rosenstiel, Enkelejda Kasneci
Textured Object Recognition: Balancing Model Robustness and Complexity

When it comes to textured object modelling, the standard practice is to use a multiple-views approach. The numerous views allow reconstruction and provide robustness to viewpoint change, but yield complex models. This paper shows that robustness with lighter models can be achieved through robust descriptors. A comparison between various descriptors allows choosing the one providing the best viewpoint robustness, in this case the ASIFT descriptor. Then, using this descriptor, the results show, for a wide variety of object shapes, that as few as seventeen views provide a high level of robustness to viewpoint change while being fast to process and having a small memory footprint. This work concludes by advocating in favour of modelling methods using robust descriptors and a small number of views.

Guido Manfredi, Michel Devy, Daniel Sidobre
Review of Methods to Predict Social Image Interestingness and Memorability

An entire industry has developed around keyword optimization for ad buyers. However, the social media landscape has shifted to photo-driven behaviour, and there is a need to meet the challenge of analyzing the large amount of visual data that users post on the Internet. We address this analysis by providing a review of how to measure the interestingness and memorability of images and videos taken spontaneously in social networks. We investigate the current state of the art of methods for analyzing social media images and provide further research directions that could be beneficial for both users and companies.

Xesca Amengual, Anna Bosch, Josep Lluís de la Rosa
Predicting the Number of DCT Coefficients in the Process of Seabed Data Compression

The paper presents a Discrete Cosine Transform-based compression method applied to data describing seabed topography. It is an improvement over previously developed and described algorithms, offering a variable compression ratio and the possibility of limiting the maximal reconstruction error. The main objective is to find an optimal number of DCT coefficients representing a surface with an acceptable reconstruction accuracy. In the original approach the compression was performed in an iterative manner, where successive values were tested, yielding high computational cost and time overhead. The algorithm presented in this paper allows the number of DCT coefficients to be predicted based on the characteristics of the specific input surface. These characteristics are statistical measures describing the complexity of the surface. The classification, using simple, fast and easy-to-learn classifiers, does not introduce additional computational overhead. Experiments performed on real data gathered by a maritime office gave encouraging results. The developed method can be employed in modern data storage and management systems handling seabed topographic data.
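
To make the underlying transform-coding idea concrete, here is a minimal, hedged sketch (not the authors' code) of DCT-based surface compression that keeps only the k largest-magnitude coefficients of a depth grid; the function names and the random stand-in data are illustrative only.

```python
# Illustrative sketch: compress a depth grid by keeping the k
# largest-magnitude 2D DCT coefficients, then reconstruct.
import numpy as np
from scipy.fft import dctn, idctn

def compress_surface(depth, k):
    """Keep the k largest DCT coefficients of a seabed depth grid."""
    coeffs = dctn(depth, norm="ortho")
    threshold = np.sort(np.abs(coeffs).ravel())[-k]  # k-th largest magnitude
    return coeffs * (np.abs(coeffs) >= threshold)    # sparse array to store

def reconstruct_surface(sparse_coeffs):
    return idctn(sparse_coeffs, norm="ortho")

depth = np.random.rand(64, 64)              # stand-in for a bathymetric patch
recon = reconstruct_surface(compress_surface(depth, k=200))
max_error = np.abs(depth - recon).max()     # error to check against a limit
```

The paper's contribution is predicting a good k from surface statistics instead of searching for it iteratively; that predictor is not reproduced here.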

Paweł Forczmański, Wojciech Maleika
Recognition of Images Degraded by Gaussian Blur

We introduce a new theory of invariants to Gaussian blur. The invariants are defined in Fourier spectral domain by means of projection operators and, equivalently, in the image domain by means of image moments. The application of these invariants is in blur-invariant image comparison and recognition. The behavior of the invariants is studied and compared with other methods in experiments on both artificial and real blurred and noisy images.

Jan Flusser, Tomáš Suk, Sajad Farokhi, Cyril Höschl IV
Rejecting False Positives in Video Object Segmentation

False-positive removal is a necessary step for robust video object segmentation because of the presence of visual noise introduced by unavoidable factors such as background movements, light changes, artifacts, etc. In this paper we present a set of generic visual cues that enable the discrimination between true positives and false positives detected by a video object segmentation approach. The devised object features encode real-world object properties, such as shape regularity, marked boundaries, color and texture uniformity and motion continuity and can be used in a post-processing layer to reject false positives.

A thorough performance evaluation of the employed features and classifiers is carried out in order to identify which visual cues/classifier allow for a better separability between true and false positives. The experimental results, obtained on three challenging datasets, showed that a post-processing layer exploiting the devised visual features is able a) to reduce the false alarm rate by about 10% to 20%, while keeping the number of true positives almost unaltered, and b) to generalize over different object classes and application domains.

Daniela Giordano, Isaak Kavasidis, Simone Palazzo, Concetto Spampinato
Ground Truth Correspondence Between Nodes to Learn Graph-Matching Edit-Costs

The Graph Edit Distance is the most widely used distance between attributed graphs, and it is composed of three main costs on nodes and arcs: insertion, deletion and substitution. We present a method to learn the insertion and deletion costs of nodes and edges defined in the Graph Edit Distance, whereas the substitution cost is defined to be data dependent and parameter-free (for instance, the Euclidean distance). In some applications, the ground truth of the correspondence between some pairs of graphs is available or can easily be deduced. The aim of the method we present is to make the learning process depend on these few available ground-truth correspondences rather than on a classification set, which in some applications is not available. To learn these costs, the optimisation algorithm tends to minimise the Hamming distance between the ground-truth correspondences and the automatically extracted node correspondences. We believe that minimising the Hamming distance makes the matching algorithm find a good correspondence and thus increases the recognition ratio of the classification algorithm in a pattern recognition application.

Xavier Cortés, Francesc Serratosa, Carlos Francisco Moreno-García
Recognising Familiar Facial Features in Paintings Belonging to Separate Domains

We present a system that detects faces in various paintings and subsequently recognises and points out any similarities that a certain face in one painting may have to another in a different artwork. The results are ranked according to similarity in a bid to produce an output that may assist art researchers in discovering new links between different works by the same or different artists. Through various tests, we show that our method successfully exposes new links of similarity in various scenarios, including cases where the human visual system failed to pinpoint any.

Wilbert Tabone, Dylan Seychell
Content Based Image Retrieval Based on Modelling Human Visual Attention

In this paper we propose to employ human visual attention models for content-based image retrieval. This approach is called query by saliency content retrieval (QSCR) and considers visual saliency at both local and global image levels. Each image, from a given database, is segmented, and specific features are evaluated locally for each of its regions. The global saliency is evaluated based on edge distribution and orientation. During the retrieval stage, the most similar images are retrieved by using an optimization approach such as the Earth Mover's Distance (EMD) algorithm. The proposed method ranks the similarity between the query image and a set of given images based on their similarity in the features associated with the salient regions.
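
For readers unfamiliar with EMD, the sketch below (an assumption-laden illustration, not the paper's implementation) computes the Earth Mover's Distance between two weighted region signatures as a balanced transportation linear program; the names a, b and D are hypothetical.

```python
# Minimal EMD between two region signatures with equal total weight.
import numpy as np
from scipy.optimize import linprog

def emd(a, b, D):
    """a, b: weight vectors (each summing to 1); D: ground-distance
    matrix of shape (len(a), len(b)). Returns the minimal transport cost."""
    m, n = D.shape
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0     # row sums of the flow equal a
    for j in range(n):
        A_eq[m + j, j::n] = 1.0              # column sums of the flow equal b
    res = linprog(D.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun
```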

Alex Papushoy, Adrian G. Bors
Tensor-Directed Spatial Patch Blending for Pattern-Based Inpainting Methods

Despite the tremendous advances made in recent years in the field of patch-based image inpainting, it is not uncommon to still get visible artifacts in the parts of an image that have been resynthesized with this kind of method. Mostly, these artifacts take the form of discontinuities between synthesized patches which have been copied and pasted into nearby regions, but from very different source locations. In this paper, we propose a generic patch blending formalism which aims at strongly reducing this kind of artifact. To achieve this, we define a tensor-directed anisotropic blending algorithm for neighboring patches, inspired by what anisotropic smoothing PDEs do for the classical image regularization problem. Our method has the advantage of blending or removing incoherent patch data while preserving the significant structures and textures as much as possible. It is very fast to compute, and adaptable to most patch-based inpainting algorithms in order to visually enhance the quality of the synthesized results.

Maxime Daisy, Pierre Buyssens, David Tschumperlé, Olivier Lézoray
A Novel Image Descriptor Based on Anisotropic Filtering

In this paper, we present a new image patch descriptor for object detection and image matching. The descriptor is based on the standard HoG pipeline and is generated in a novel way, by embedding the response of an oriented anisotropic derivative half-Gaussian kernel in the Histogram of Oriented Gradients (HoG) framework. By doing so, we are able to bin more curvature information. As a result, our descriptor performs better than state-of-the-art descriptors such as SIFT, GLOH and DAISY. In addition, we repeat the same procedure by replacing the anisotropic derivative half-Gaussian kernel with a computationally less complex anisotropic derivative half-exponential kernel and achieve similar results. The proposed image descriptors using both kernels are very robust and show promising results for variations in brightness, scale, rotation, viewpoint, blur and compression. We have extensively evaluated the effectiveness of the devised method on various challenging image pairs acquired under varying circumstances.

Darshan Venkatrayappa, Philippe Montesinos, Daniel Diep, Baptiste Magnier
A Novel Method for Simultaneous Acquisition of Visible and Near-Infrared Light Using a Coded Infrared-Cut Filter

This paper presents a novel image sensing method to enhance the sensitivity of a camera. Most image sensors used in commercial digital cameras are sensitive to both visible and infrared light. An IR-cut filter, which obstructs the infrared component of natural light, is used in such cameras to achieve a colour reproduction similar to that of the human visual system. However, recent studies have shown that near-infrared light contains useful information to further enhance the visible image. This paper introduces a new sensing method using a coded IR-cut filter to enable simultaneous capture of NIR and visible light on a single image sensor. The coded IR-cut filter lets a fraction of the near-infrared light pass and blocks out the rest. Both visible and near-infrared light images can be separated from the sensor output when taking the diffraction of the NIR light into account. Experiments, using a synthesized image sensor output, demonstrate the validity of the method.

Kimberly McGuire, Masato Tsukada, Boris Lenseigne, Wouter Caarls, Masato Toda, Pieter Jonker
Scale-Space Clustering on a Unit Hypersphere

We present an algorithm for the scale-space clustering of a point cloud on a hypersphere in a higher-dimensional Euclidean space. Our method achieves clustering by estimating the density distribution of the points in the linear scale space on the sphere. The algorithm regards the union of observed point sets as an image defined by delta functions located at the positions of the points on the sphere. As numerical examples, we illustrate clustering on the 3-sphere $\mathbb{S}^3$ in four-dimensional Euclidean space.

Yuta Hirano, Atsushi Imiya
Bokeh Effects Based on Stereo Vision

Bokeh, a sought-after photo rendering style of out-of-focus blur, typically aims at an aesthetic quality which is not available to low-end consumer-grade cameras due to their lens design. We present a bokeh simulation method using stereo-vision techniques. We refine a depth map, obtained by stereo matching, with a little user interaction. A depth-aware bokeh effect is then applied with user-adjustable aperture sizes or shapes. Our method mainly aims at the visual quality of the bokeh effect rather than time efficiency. Experiments show that our results are natural-looking, and can be similar to the bokeh effect of a real-world bokeh-capable camera system.

Dongwei Liu, Radu Nicolescu, Reinhard Klette
Confidence Based Rank Level Fusion for Multimodal Biometric Systems

Multimodal biometric systems have proven advantages over single-biometric systems, as they use multiple traits of their users. The intra-class variance provided by using more than one trait results in a high identification rate. Still, one missing part in multimodal systems is attention to the discriminability of each rank list for each specific user. This paper introduces a novel approach to selecting a combination of rank lists at rank level so as to provide the highest discrimination for any specific query. The rank-list selection is based on pseudo-score lists that are created from a combination of rank lists and the resemblance probability distribution of users. The experimental results on a multimodal biometric system based on frontal face, profile face, and ear indicated a higher identification rate when using the novel confidence-based rank-level fusion.
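
As background, the sketch below shows classic Borda-count rank-level fusion, a standard baseline that confidence-based methods such as this one improve on; it is not the paper's method, and the matcher names are illustrative.

```python
# Classic Borda-count fusion of per-matcher rank lists.
def borda_fusion(rank_lists):
    """rank_lists: list of rankings, each an ordered list of user ids
    (best match first). Returns the fused ranking, best first."""
    scores = {}
    for ranking in rank_lists:
        n = len(ranking)
        for position, user in enumerate(ranking):
            scores[user] = scores.get(user, 0) + (n - position)
    return sorted(scores, key=scores.get, reverse=True)

fused = borda_fusion([["u3", "u1", "u2"],   # frontal-face matcher
                      ["u1", "u3", "u2"],   # profile-face matcher
                      ["u3", "u2", "u1"]])  # ear matcher
```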

Hossein Talebi, Marina L. Gavrilova
Optical Flow Computation with Locally Quadratic Assumption

The purpose of this paper is twofold. First, we develop a quadratic tracker which computes a locally quadratic optical flow field by solving a model-fitting problem for each point in its local neighbourhood. This local method allows us to select a region of interest for the optical flow computation. Secondly, we propose a method to compute the transportation of a motion field in long-time image sequences using the Wasserstein distance for cyclic distributions. This measure evaluates the motion coherency in an image sequence and detects collapses of smoothness of the motion vector field in an image sequence.
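
The local model-fitting step can be pictured with the following hedged sketch: a quadratic model of one flow component is fitted by least squares to samples in a neighbourhood. The exact formulation in the paper may differ; this is a generic illustration.

```python
# Fit u(x, y) = c0 + c1*x + c2*y + c3*x^2 + c4*x*y + c5*y^2 to local samples.
import numpy as np

def fit_quadratic_flow(xs, ys, u_samples):
    """xs, ys: neighbourhood coordinates; u_samples: one flow component
    sampled at those coordinates. Returns the six model coefficients."""
    A = np.column_stack([np.ones_like(xs), xs, ys, xs**2, xs*ys, ys**2])
    coeffs, *_ = np.linalg.lstsq(A, u_samples, rcond=None)
    return coeffs        # evaluate with A @ coeffs to get the fitted flow
```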

Tomoya Kato, Hayato Itoh, Atsushi Imiya
Pose Normalisation for 3D Vehicles

This study investigates the various pose normalisation techniques that can be used for 3D vehicle models. A framework is built in which the pose normalisation performance of four PCA-based techniques is tested on a database of 335 3D vehicles. The evaluation is performed using two methods. In the first method, a silhouette view of each pose-normalised vehicle is rendered from a consistent point in 3D space. The pose consistency of each vehicle is then compared to the silhouettes of the vehicles in the same category. The second method compares the direct influence of the four techniques on the final precision and recall results of a search algorithm based on a simple scan-line feature descriptor. Results from both methods show that Center-of-Gravity PCA and Continuous-PCA performed noticeably better than PCA and Normal-PCA. The superiority of Continuous-PCA over Center-of-Gravity PCA was negligible.

Trevor Farrugia, Jonathan Barbarar
Multimodal Output Combination for Transcribing Historical Handwritten Documents

Transcription of digitised historical documents is an interesting task in the document analysis area. This transcription can be achieved by using Handwritten Text Recognition (HTR) on digitised pages or by using Automatic Speech Recognition (ASR) on a dictation of the contents. Another option is using both systems in a multimodal combination to obtain a draft transcription, given that combining the outputs of different recognition systems generally improves recognition accuracy. In this work, we present a new combination method based on Confusion Networks. We check its effectiveness for transcribing a Spanish historical book. Results on both unimodal combination, with different optical (for HTR) and acoustic (for ASR) models, and multimodal combination show relative reductions of Word and Character Error Rate of 14.3% and 16.6%, respectively, over the HTR baseline.
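
For reference, the Word Error Rate reported above is an edit-distance metric; a minimal sketch of its computation follows (Character Error Rate is the same computation over character lists).

```python
# Word Error Rate via Levenshtein distance between word sequences.
def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / max(len(r), 1)
```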

Emilio Granell, Carlos-D. Martínez-Hinarejos
Unsupervised Surface Reflectance Field Multi-segmenter

An unsupervised, illumination-invariant, multi-spectral, multi-resolution multiple-segmenter for textured images with an unknown number of classes is presented. The segmenter is based on a weighted combination of several unsupervised segmentation results, each at a different resolution, using the modified sum rule. Multi-spectral textured image mosaics are locally represented by eight causal directional multi-spectral random field models recursively evaluated for each pixel. The single-resolution segmentation part of the algorithm is based on the underlying Gaussian mixture model and starts with an over-segmented initial estimation which is adaptively modified until the optimal number of homogeneous texture segments is reached. The performance of the presented method is extensively tested on the Prague segmentation benchmark, both on the surface reflectance field textures and on the static colour textures, using the commonest segmentation criteria, and compares favourably with several leading alternative image segmentation methods.

Michal Haindl, Stanislav Mikeš, Mineichi Kudo
A Dynamic Approach and a New Dataset for Hand-detection in First Person Vision

Hand detection and segmentation methods stand as two of the most prominent objectives in First Person Vision. Their popularity is mainly explained by the importance of reliable detection and location of the hands for developing human-machine interfaces for emergent wearable cameras. Current developments have focused on hand segmentation problems, implicitly assuming that hands are always in the field of view of the user. Existing methods are commonly presented with new datasets. However, given their implicit assumption, none of them ensures a proper composition of frames with and without hands, as the hand-detection problem requires. This paper presents a new dataset for hand detection, carefully designed to guarantee a good balance between positive and negative frames, as well as challenging conditions such as illumination changes, hand occlusions and realistic locations. Additionally, this paper extends a state-of-the-art method using a dynamic filter to improve its detection rate. The improved performance is proposed as a baseline to be used with the dataset.

Alejandro Betancourt, Pietro Morerio, Emilia I. Barakova, Lucio Marcenaro, Matthias Rauterberg, Carlo S. Regazzoni
Segmentation and Labelling of EEG for Brain Computer Interfaces

Segmentation and labelling of time series is a common requirement for several applications. A brain computer interface (BCI) is achieved by classification of time intervals of the electroencephalographic (EEG) signal and thus requires EEG signal segmentation and labelling. This work investigates the use of an autoregressive model, extended to a switching multiple modelling framework, to automatically segment and label EEG data into distinct modes of operation that may switch abruptly and arbitrarily in time. The applicability of this approach to BCI systems is illustrated on an eye closure dependent BCI and on a motor imagery based BCI. Results show that the proposed autoregressive switching multiple model approach offers a unified framework of detecting multiple modes, even in the presence of limited training data.
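
A heavily simplified sketch of the switching-model idea follows: one autoregressive model is fitted per mode, and each EEG window is labelled with the mode whose model predicts it with the smallest residual. The windowed, residual-based labelling here is an assumption, not the paper's exact framework.

```python
# Label EEG windows by competing AR models, one per mode of operation.
import numpy as np

def fit_ar(x, p):
    """Least-squares AR(p) coefficients for a 1-D signal x."""
    X = np.column_stack([x[p - i - 1:len(x) - i - 1] for i in range(p)])
    coeffs, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return coeffs

def label_window(window, models, p=6):
    """models: dict mapping mode name -> AR(p) coefficients."""
    X = np.column_stack([window[p - i - 1:len(window) - i - 1] for i in range(p)])
    errors = {mode: np.mean((window[p:] - X @ a) ** 2)
              for mode, a in models.items()}
    return min(errors, key=errors.get)   # mode with lowest prediction error
```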

Tracey A. Camilleri, Kenneth P. Camilleri, Simon G. Fabri
Wood Veneer Species Recognition Using Markovian Textural Features

A mobile Android application that can automatically recognize wood species from a low-quality mobile phone photo under varying illumination conditions is presented. The wood recognition is based on Markovian, spectral, and illumination-invariant textural features. The method's performance was verified on a wood database containing veneers from sixty-six varied European and exotic wood species. The Markovian features improve the correct wood recognition rate by about 40% compared with the best alternative, the Local Binary Patterns features.

Michal Haindl, Pavel Vácha
Performance Analysis of Active Shape Reconstruction of Fractured, Incomplete Skulls

Reconstruction of normal skulls from deformed skulls is a very important but difficult task in practice. The active shape model (ASM) is among the most popular methods for reconstructing skulls. To apply ASM to skull reconstruction, it is necessary to establish shape correspondence among the training and testing samples, because wrong correspondence will introduce unwanted shape variations into the ASM reconstruction. Despite the popularity of ASM, the accuracy of ASM skull reconstruction has not been well investigated in the existing literature. In particular, it is unclear how to estimate the reconstruction error of skulls without ground truth. This paper aims to investigate the sources of error in ASM skull reconstruction. Comprehensive tests show that the error of an accurate correspondence algorithm is uncorrelated with, and small compared to, the reconstruction error. On the other hand, the ASM fitting error is highly correlated with the reconstruction error, which allows us to estimate the reconstruction error of real deformed skulls using the ASM fitting error. Moreover, the ASM fitting error is correlated with the severity of skull defects, which places a limit on the reconstruction accuracy that can be achieved by ASM.

Kun Zhang, Wee Kheng Leow, Yuan Cheng
Content Extraction from Marketing Flyers

The rise of online shopping has hurt physical retailers, who struggle to persuade customers to buy products in physical stores rather than online. Marketing flyers are a great means of increasing the visibility of physical retailers, but the unstructured offers appearing in those documents cannot easily be compared with similar online deals, making it hard for a customer to understand whether it is more convenient to order a product online or to buy it from the physical shop. In this work we tackle this problem, introducing a content extraction algorithm that automatically extracts structured data from flyers. Unlike competing approaches that mainly focus on textual content or simply analyze font type, color and text positioning, we propose novel and more advanced visual features that capture the properties of the graphic elements typically used in marketing materials to attract the attention of readers towards specific deals, obtaining excellent results and a high degree of language and genre independence.

Ignazio Gallo, Alessandro Zamberletti, Lucia Noce
Puzzle Approach to Pose Tracking of a Rigid Object in a Multi Camera System

Optical tracking is a large field of research with countless sophisticated methods for a multitude of applications. However, there always exist tasks with special requirements and constraints that are not covered by traditional methods. This work presents a puzzle-based approach to tackle the problem of tracking all 6 degrees of freedom of a rigid object with few trackable features using a multi-camera system. The presented algorithm capitalizes on non-sequential processing to assemble tracking information bit by bit. Validation shows that it achieves very high accuracy on real data.

Sönke Schmid, Xiaoyi Jiang, Klaus Schäfers
Adaptive Information Selection in Images: Efficient Naive Bayes Nearest Neighbor Classification

We propose different methods for adaptively selecting information in images during object recognition. In contrast to standard feature selection, we consider this problem in a Bayesian framework where features are sequentially selected based on the current belief distribution over object classes. We define three different selection criteria and provide efficient Monte Carlo algorithms for the selection. In particular, we extend the successful Naive Bayes Nearest Neighbor (NBNN) classification approach, which is very costly to compute in its original form. We show that the proposed information selection methods result in a significant speed-up because only a small number of features needs to be extracted for accurate classification. In addition to adaptive methods based on the current belief distribution, we also consider image-based selection methods and we evaluate the performance of the different methods on a standard object recognition data set.
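
The NBNN rule being sped up can be summarised with a short sketch (the original rule of Boiman et al., not the paper's adaptive extension): an image is assigned to the class minimising the summed squared distance of each local descriptor to its nearest neighbour in that class.

```python
# Plain (costly) NBNN classification over pooled class descriptors.
import numpy as np
from scipy.spatial import cKDTree

def nbnn_classify(image_descriptors, class_descriptors):
    """class_descriptors: dict mapping class name -> (N_c, d) array of
    training descriptors pooled over that class's images."""
    totals = {}
    for cls, train in class_descriptors.items():
        tree = cKDTree(train)
        dists, _ = tree.query(image_descriptors)  # NN distance per descriptor
        totals[cls] = np.sum(dists ** 2)
    # The paper's adaptive selection extracts only a few informative
    # features instead of querying every descriptor as done here.
    return min(totals, key=totals.get)
```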

Thomas Reineking, Tobias Kluth, David Nakath
The Brightness Clustering Transform and Locally Contrasting Keypoints

In recent years a new wave of feature descriptors has been presented to the computer vision community: ORB, BRISK and FREAK, amongst others. These new descriptors reduce time and memory consumption in the processing and storage stages of tasks such as image matching or visual odometry, enabling real-time applications. The problem is now the lack of fast interest-point detectors with good repeatability to use with these new descriptors. We present a new blob detector which can be implemented in real time and is faster than most of the currently used feature detectors. The detection is achieved with an innovative non-deterministic low-level operator called the Brightness Clustering Transform (BCT). The BCT can be thought of as a coarse-to-fine search through scale spaces for the true derivative of the image; it also mimics the trans-saccadic perception of human vision. We call the new algorithm the Locally Contrasting Keypoints detector, or LOCKY. Showing good repeatability and robustness to the image transformations included in the Oxford dataset, LOCKY is amongst the fastest affine-covariant feature detectors.

J. Lomeli-R, Mark S. Nixon
Feature Evaluation with High-Resolution Images

The extraction of scale-invariant image features is a fundamental task for many computer vision applications. Features are localized in the scale space of the image. A descriptor is built for each feature and used to determine the correspondence to a second feature, usually extracted from a second image. For the evaluation of detectors and descriptors, benchmark image sets are used. The benchmarks consist of image sequences and homographies which determine the ground truth for the mapping between the images. The repeatability criterion evaluates the detection accuracy of the detectors, while precision and recall measure the quality of the descriptors.

Current data sets provide images with resolutions of less than one megapixel. A recent data set provides challenging images and highly accurate homographies. It allows for evaluation at different image resolutions with the same scene content; thus, the scale-invariant properties of the extracted features can be examined. This paper presents a comprehensive evaluation of state-of-the-art detectors and descriptors on this data set. The results show significant differences compared to the standard benchmark. Furthermore, it is shown that some detectors perform differently at different resolutions. It follows that high-resolution images should be considered for future feature evaluations.
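
The repeatability criterion mentioned above can be illustrated with a hedged sketch: keypoints from image 1 are mapped into image 2 by the ground-truth homography and counted as repeated when a detection lies within a tolerance eps; the tolerance value here is an assumption.

```python
# Simplified repeatability: fraction of keypoints with a correspondence
# under the ground-truth homography H.
import numpy as np
from scipy.spatial import cKDTree

def repeatability(kp1, kp2, H, eps=2.5):
    """kp1, kp2: (N, 2) keypoint positions; H: 3x3 homography (1 -> 2)."""
    pts = np.hstack([kp1, np.ones((len(kp1), 1))]) @ H.T
    projected = pts[:, :2] / pts[:, 2:3]          # dehomogenise
    dists, _ = cKDTree(kp2).query(projected)      # nearest detection in image 2
    correspondences = np.sum(dists < eps)
    return correspondences / min(len(kp1), len(kp2))
```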

Kai Cordes, Lukas Grundmann, Jörn Ostermann
Fast Re-ranking of Visual Search Results by Example Selection

In this paper we present a simple, novel method that uses state-of-the-art image concept detectors and publicly available image search engines to retrieve images for semantically more complex queries from local databases, without re-indexing of the database. Our low-key, data-driven method for the associative recognition of unknown, or more elaborate, concepts in images allows the user to select visual examples to tailor query results to the typical preferences of the user. The method is compared with a baseline approach using ConceptNet-based semantic expansion of the query phrase to known concepts, as set by the concepts of the image concept detectors. Using the output of the image concept detector as an index for all images in the local image database, a quick nearest-neighbour matching scheme is presented that can match queries swiftly via concept output vectors. We show preliminary results for a number of query phrases, followed by a general discussion.
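
A minimal sketch of the matching step follows, under the assumption of a cosine-similarity ranking over concept output vectors; the function and variable names are illustrative, not from the paper.

```python
# Rank database images by cosine similarity of their concept vectors to
# the averaged concept vector of the user-selected examples.
import numpy as np

def rank_by_concept_vector(example_vectors, database_vectors):
    query = np.mean(example_vectors, axis=0)
    q = query / np.linalg.norm(query)
    db = database_vectors / np.linalg.norm(database_vectors,
                                           axis=1, keepdims=True)
    similarity = db @ q                     # cosine similarity per image
    return np.argsort(-similarity)          # indices of best matches first
```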

John Schavemaker, Martijn Spitters, Gijs Koot, Maaike de Boer
Egomotion Estimation and Reconstruction with Kalman Filters and GPS Integration

This paper presents an approach for egomotion estimation over stereo image sequences combined with extra GPS data. The accuracy of the estimated motion data is tested with 3D roadside reconstruction. Our proposed method follows the traditional flowchart of many visual odometry algorithms: it first establishes the correspondences between the keypoints of every two frames, then uses the depth information from the stereo matching algorithms, and finally computes the best description of the cameras' motion. However, instead of simply using keypoints from consecutive frames, we propose a novel technique that uses a set of augmented and selected keypoints, which are carefully tracked by Kalman filter fusion. We also propose to use the GPS data of each key frame in the input sequence in order to reduce the positioning errors of the estimates, so that drift errors can be corrected at each key frame. Finally, the overall growth of the accumulated errors can be bounded within a certain range. A least-squares process is used to minimise the reprojection error and to ensure a good pair of translation and rotation measures, frame by frame. Experiments are carried out for trajectory estimation, or combined trajectory and 3D scene reconstruction, using various stereo-image sequences.

Haokun Geng, Hsiang-Jen Chien, Radu Nicolescu, Reinhard Klette
Bundle Adjustment with Implicit Structure Modeling Using a Direct Linear Transform

Bundle adjustment (BA) has been considered the "gold standard" optimisation technique for multiple-view reconstruction over decades of research. The technique simultaneously tunes camera parameters and scene structure to fit a nonlinear function, such that the discrepancy between the observed scene points and their reprojections is minimised in a least-squares manner. Computational feasibility and numerical conditioning are two major concerns of today's BA implementations, and choosing a proper parametrization of structure in 3D space can dramatically improve numerical stability, convergence speed, and the cost of evaluating Jacobian matrices. In this paper we study several alternative representations of 3D structure and propose an implicit modeling approach based on a Direct Linear Transform (DLT) estimation. The performance of a variety of parametrization techniques is evaluated using simulated visual odometry scenarios. Experimental results show that computational cost and convergence speed are further improved, achieving similar accuracy without explicit adjustment of the structure parameters.
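
To ground the DLT terminology, here is a standard-textbook sketch of a DLT estimate of one scene point from two views via SVD; it illustrates the kind of linear estimation referred to above, not the paper's exact parametrization.

```python
# DLT triangulation of one 3D point from two calibrated views.
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """P1, P2: 3x4 projection matrices; x1, x2: (u, v) image observations."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)     # the null vector of A solves AX = 0
    X = Vt[-1]
    return X[:3] / X[3]             # inhomogeneous 3D point
```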

Hsiang-Jen Chien, Haokun Geng, Reinhard Klette
Efficient Extraction of Macromolecular Complexes from Electron Tomograms Based on Reduced Representation Templates

Electron tomography is the most widely applicable method for obtaining 3D information by electron microscopy. In the field of biology it has been realized that electron tomography is capable of providing a complete, molecular-resolution three-dimensional mapping of entire proteomes. However, to realize this goal, information needs to be extracted efficiently from these tomograms. Owing to extremely low signal-to-noise ratios, this task is mostly carried out manually. Standard template-matching approaches tend to generate large numbers of false positives. We developed an alternative method for feature extraction in biological electron tomography based on reduced representation templates, approximating the search model by a small number of anchor points used to calculate the scoring function. Using this approach, the false-positive rate drops from about 50% with matched-filter approaches to below 5%. At the same time, false negatives stay below 5%, essentially matching the performance one would expect from human operators.

Xiao-Ping Xu, Christopher Page, Niels Volkmann
Gradients and Active Contour Models for Localization of Cell Membrane in HER2/neu Images

The paper presents an application of the snake model to recognition of the cell membrane in HER2 breast and kidney cancer images. It applies a modified snake to build a system that recognizes the membrane and associates it with the neighboring cell. We study different forms of gradient estimation, the core point of the snake model. The particle swarm optimization algorithm is used to tune the parameters of the snake model. On the basis of the applied procedure, the membrane continuity of each cell is estimated. The experimental results, obtained on 100 cells in breast and 100 cells in kidney cancers, have shown high accuracy of the membrane localizations and acceptable agreement with the expert estimations.

Marek Wdowiak, Tomasz Markiewicz, Stanislaw Osowski, Janusz Patera, Wojciech Kozlowski
Combination Photometric Stereo Using Compactness of Albedo and Surface Normal in the Presence of Shadows and Specular Reflection

We present a novel combination photometric stereo method which can estimate surface normals precisely even for images including shadows and specular reflection. Photometric stereo can be used whenever there are at least three input images; therefore we can employ photometric stereo with $_nC_3$ combinations of $n$ input images. We build a 3D distribution of the albedos and surface normals estimated from the pixel intensities of the $_nC_3$ pixel combinations. In this distribution, we define a novel value, "compactness", to distinguish pixels which are included in neither shadows nor specular reflection from pixels which are included in shadows or specular reflection. Through experimental results, we demonstrate that the proposed method can estimate surface normals in the presence of shadows and specular reflection. Moreover, the proposed method achieves better accuracy than previous works.
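
The per-pixel enumeration can be sketched as follows, assuming the standard Lambertian three-light photometric stereo model; the "compactness" selection over the resulting distribution is the paper's contribution and is not reproduced here.

```python
# Plain three-image photometric stereo run over all nC3 light combinations
# for one pixel, yielding a distribution of albedo/normal estimates.
import numpy as np
from itertools import combinations

def normal_candidates(intensities, light_dirs):
    """intensities: (n,) pixel values; light_dirs: (n, 3) unit light vectors.
    Returns one (albedo, normal) estimate per 3-light combination."""
    estimates = []
    for idx in combinations(range(len(intensities)), 3):
        L = light_dirs[list(idx)]                        # 3x3 lighting matrix
        g = np.linalg.solve(L, intensities[list(idx)])   # g = albedo * normal
        albedo = np.linalg.norm(g)
        estimates.append((albedo, g / albedo))
    return estimates
```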

Naoto Ienaga, Hideo Saito, Kouichi Tezuka, Yasumasa Iwamura, Masayoshi Shimizu
Craniofacial Reconstruction Using Gaussian Process Latent Variable Models

Craniofacial reconstruction aims at estimating the facial appearance associated with a skull. It can be applied in victim identification, forensic medicine and archaeology. In this paper, we propose a craniofacial reconstruction method using Gaussian Process Latent Variable Models (GP-LVM). GP-LVM is used to represent the skull and face-skin data in low-dimensional latent spaces, respectively. The mapping from skull to face skin is built in the latent spaces by using a least squares support vector machine (LSSVM) regression model. Experimental results show that the GP-LVM latent space improves the representation of craniofacial data and boosts the reconstruction results compared with the methods in the literature.

Zedong Xiao, Junli Zhao, Xuejun Qiao, Fuqing Duan
A High-Order Depth-Based Graph Matching Method

We recently proposed a novel depth-based graph matching method that aligns the depth-based representations of vertices. One drawback of that method is that it only considers structural co-relations; the spatial co-relations of vertices are discarded. This drawback limits the performance of the method on graph-based image matching problems. To overcome this shortcoming, we develop a new high-order depth-based matching method by incorporating the spatial coordinate information of vertices (i.e., the pixel coordinates of vertices in the original images). The new matching method is based on a high-order dominant cluster analysis [1]. We use the new high-order matching method to identify the mismatches in the original first-order depth-based matching results, and remove the incorrect matches. Experiments on real-world image databases demonstrate the effectiveness of our new high-order DB matching method.

Lu Bai, Zhihong Zhang, Peng Ren, Edwin R. Hancock
On Different Colour Spaces for Medical Colour Image Classification

The analysis of cells and tissues allows the evaluation and diagnosis of a vast number of diseases. Nowadays this analysis is still performed manually, which involves numerous drawbacks; in particular, the accuracy of the results heavily depends on the operator's skills. In contrast, automated analysis by computer is performed quickly, requires only one image of the sample, and provides precise results. In this work we investigate different texture descriptors extracted from medical images in different colour spaces. We compare these features in order to identify the feature set able to properly classify medical images presenting different classification problems. Furthermore, we investigate different colour spaces to identify the most suitable for this purpose. The feature sets tested are based on a generalization of some existing grey-scale approaches for feature extraction to colour images. The generalization has been applied to the calculation of the Grey-Level Co-Occurrence Matrix, Grey-Level Difference Matrix and Grey-Level Run-Length Matrix. Furthermore, we calculate the Grey-Level Run-Length Matrix starting from the Grey-Level Difference Matrix. The performances of the resulting feature sets have been compared using the Support Vector Machine model. To validate our method we have used three different databases, HistologyDS, Pap-smear and Lymphoma, which present different medical problems and so represent different classification problems. The experimental results have shown that, in general, features extracted from the HSV colour space perform better than the others, and that the best feature subset has been obtained from the generalized Grey-Level Co-Occurrence Matrix, demonstrating excellent performance for this purpose.
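
One of the tested descriptor families can be sketched with scikit-image: grey-level co-occurrence features computed per channel of an HSV image. This is only a baseline illustration; the generalizations described in the paper go beyond it, and the chosen distances, angles and properties here are assumptions.

```python
# Per-channel GLCM (Haralick-style) features from an HSV image.
import numpy as np
from skimage.color import rgb2hsv
from skimage.feature import graycomatrix, graycoprops

def hsv_glcm_features(rgb_image):
    hsv = (rgb2hsv(rgb_image) * 255).astype(np.uint8)
    features = []
    for c in range(3):                            # H, S, V channels
        glcm = graycomatrix(hsv[..., c], distances=[1],
                            angles=[0, np.pi / 2], levels=256,
                            symmetric=True, normed=True)
        for prop in ("contrast", "homogeneity", "energy", "correlation"):
            features.extend(graycoprops(glcm, prop).ravel())
    return np.array(features)                     # input to an SVM classifier
```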

Cecilia Di Ruberto, Giuseppe Fodde, Lorenzo Putzu
SIFT Descriptor for Binary Shape Discrimination, Classification and Matching

In this work, we study the efficiency of the SIFT descriptor in the discrimination of binary shapes. We also analyze how the use of 2-tuples of SIFT keypoints can affect the discrimination of shapes. The study is divided into two parts; the first part serves as a primary analysis where we propose to compute the overlap of classes using SIFT and a majority vote of keypoints. In the second part, we analyze both classification and matching of binary shapes using SIFT and Bag of Features. Our empirical study shows that SIFT, although considered a texture feature, can be used to distinguish shapes in binary images and can be applied to the classification of foreground silhouettes.

Insaf Setitra, Slimane Larabi
Where is My Cup? - Fully Automatic Detection and Recognition of Textureless Objects in Real-World Images

In this work, we propose a new method for the fully automatic detection and recognition of textureless objects present in complex visual scenes. While most approaches only deal with shape matching, our approach considers objects both in terms of low-level features and high-level information, and represents objects' view-based templates as trees. Multi-level matching increases the algorithm's robustness, while the new tree structure of the template reduces its computational burden. We have evaluated our algorithm on the CMU dataset, consisting of objects under arbitrary viewpoints and in cluttered environments. Our proposed approach has shown excellent performance, outperforming state-of-the-art methods.

Joanna Isabelle Olszewska
Automatic Differentiation of u- and n-serrated Patterns in Direct Immunofluorescence Images

Epidermolysis bullosa acquisita (EBA) is a subepidermal autoimmune blistering disease of the skin. Manual analysis of u- and n-serrated patterns in direct immunofluorescence (DIF) images is used in medical practice to differentiate EBA from other forms of pemphigoid. The manual analysis of serration patterns in DIF images is very challenging, mainly due to noise and the lack of training of immunofluorescence (IF) microscopists. There are no automatic techniques to distinguish these two types of serration patterns. We propose an algorithm for their automatic recognition. We first locate a region where u- and n-serrated patterns are typically found. Then, we apply a bank of B-COSFIRE filters to the identified region of interest in the DIF image in order to detect ridge contours. This is followed by the construction of a normalized histogram of orientations. Finally, we classify an image by using the nearest-neighbour algorithm, which compares its normalized histogram of orientations with those of all the images in the dataset. The best result that we achieve on the publicly available UMCG data set is 84.6% correct classification, which is comparable to the results of medical experts.
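
The final classification step can be pictured with a minimal sketch: nearest neighbour over normalized orientation histograms. The L1 histogram distance used below is an assumption; the paper does not specify it here.

```python
# Nearest-neighbour classification of normalized orientation histograms.
import numpy as np

def classify_histogram(query_hist, train_hists, train_labels):
    q = query_hist / query_hist.sum()
    dists = [np.abs(q - h / h.sum()).sum() for h in train_hists]  # L1 distance
    return train_labels[int(np.argmin(dists))]
```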

Chenyu Shi, Jiapan Guo, George Azzopardi, Joost M. Meijer, Marcel F. Jonkman, Nicolai Petkov
Means of 2D and 3D Shapes and Their Application in Anatomical Atlas Building

This work deals with the concept of the mean when applied to 2D or 3D shapes, and with its applicability to the construction of digital atlases to be used in digital anatomy. Unlike for numerical data, there are several possible definitions of the mean of a shape distribution, and procedures for its estimation from a sample of shapes. The most popular definitions are based on the distance function or on the coverage function, each with its strengths and limitations. Closely related to the concept of a mean shape is the concept of an atlas, here understood as a probability or membership map that tells how likely it is that a point belongs to a shape drawn from the shape distribution at hand. We devise a procedure to build probabilistic atlases from a sample of similar segmented shapes using information from both functions simultaneously: the distance and the coverage. Applications of the method in digital anatomy are provided, as well as experiments showing the advantages of the proposed method with respect to state-of-the-art techniques based on the coverage function.
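
As an illustration of the distance-function definition mentioned above, the following sketch averages signed distance transforms of binary shapes and takes the zero sub-level set as the mean shape; it is one of the classical constructions, not the paper's combined procedure.

```python
# Distance-function mean of binary shapes via signed distance transforms.
import numpy as np
from scipy.ndimage import distance_transform_edt

def mean_shape(binary_shapes):
    """binary_shapes: list of equally sized boolean arrays (True inside)."""
    sdfs = []
    for s in binary_shapes:
        inside = distance_transform_edt(s)       # distance to the background
        outside = distance_transform_edt(~s)     # distance to the foreground
        sdfs.append(outside - inside)            # negative inside the shape
    return np.mean(sdfs, axis=0) <= 0            # mean shape as a binary mask
```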

Juan Domingo, Esther Dura, Guillermo Ayala, Silvia Ruiz-España
Optimized NURBS Curves Modelling Using Genetic Algorithm for Mobile Robot Navigation

This paper presents a new approach for solving one of the crucial robotic tasks: the global path planning problem. It consists in calculating an optimal path, for a non-point, non-holonomic robot, from a start to a goal position in terms of a Non-Uniform Rational B-Spline (NURBS) curve. With a priori knowledge of the environment and the robot characteristics (size and radius of curvature), the algorithm begins by selecting a set of control points derived from the shortest collision-free polyline path. Then, optimized NURBS curve modelling using a Genetic Algorithm (GA) is introduced to replace that polyline path by a smooth, curvature-constrained curve which avoids obstacles. Computer simulation studies demonstrate the effectiveness of the proposed method.

Sawssen Jalel, Philippe Marthon, Atef Hamouda
Robust Learning from Ortho-Diffusion Decompositions

This paper describes a new classification method based on modeling data by embedding diffusions into orthonormal decompositions of graph-based data representations. The training data is represented by an adjacency matrix calculated using either the correlation or the covariance of the training set. The modified Gram-Schmidt orthonormal decomposition, alternating with diffusion and data reduction stages, is applied recursively at each scale level. The diffusion process strengthens the representation pattern of representative features; meanwhile, noise is removed together with non-essential detail during the data reduction stage. The proposed methodology is shown to be robust when applied to face recognition with low image resolution and corruption by various types of noise.
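
For reference, the modified Gram-Schmidt decomposition named above is a standard numerical routine; a minimal sketch follows (the alternating diffusion and reduction stages of the paper are not shown).

```python
# Modified Gram-Schmidt orthonormalisation of the columns of A.
import numpy as np

def modified_gram_schmidt(A):
    """Returns Q (orthonormal columns) and R such that A = Q @ R.
    Numerically more stable than classical Gram-Schmidt."""
    Q = A.astype(float).copy()
    n = Q.shape[1]
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] /= R[k, k]
        for j in range(k + 1, n):            # remove the k-th direction
            R[k, j] = Q[:, k] @ Q[:, j]      # from all remaining columns
            Q[:, j] -= R[k, j] * Q[:, k]
    return Q, R
```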

Sravan Gudivada, Adrian G. Bors
Filter-Based Approach for Ornamentation Detection and Recognition in Singing Folk Music

Ornamentations in music play a significant role in the emotion which a performer or a composer aims to create. The automated identification of ornamentations enhances the understanding of music, and can be used as a feature for tasks such as performer identification or mood classification. Existing methods rely on a pre-processing step that performs note segmentation. We propose an alternative method by adapting the existing two-dimensional COSFIRE filter approach to one dimension (1D) for the automatic identification of ornamentations in monophonic folk songs. We construct a set of 1D COSFIRE filters that are selective for the 12 notes of Western music theory. The response of a 1D COSFIRE filter is computed as the geometric mean of the differences between the fundamental frequency values in a local neighbourhood and the preferred values at the corresponding positions. We apply the proposed 1D COSFIRE filters to the pitch tracks of a song at every position along the entire signal, which in turn gives response values in the range [0,1]. The 1D COSFIRE filters that we propose are effective in recognizing meaningful musical information which can be transformed into symbolic representations and used for further analysis. We demonstrate the effectiveness of the proposed methodology on a new data set that we introduce, which comprises five monophonic Cypriot folk tunes containing 428 ornamentations. The proposed method is effective for the detection and recognition of ornamentations in singing folk music.
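
A heavily hedged sketch of a 1D COSFIRE-style response follows. The abstract states the response is a geometric mean over frequency differences in [0,1]; the Gaussian mapping of each difference into that range is our assumption, and all names are illustrative.

```python
# 1D COSFIRE-style response at position t of a pitch track.
import numpy as np

def cosfire_1d_response(f0_track, t, offsets, preferred, sigma=1.0):
    """f0_track: fundamental-frequency values; offsets/preferred: relative
    positions and preferred frequency values defining the filter."""
    terms = []
    for off, pref in zip(offsets, preferred):
        diff = f0_track[t + off] - pref
        terms.append(np.exp(-diff ** 2 / (2 * sigma ** 2)))  # in (0, 1]
    return float(np.prod(terms) ** (1.0 / len(terms)))       # geometric mean
```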

Andreas Neocleous, George Azzopardi, Christos N. Schizas, Nicolai Petkov
Vision-Based System for Automatic Detection of Suspicious Objects on ATM

Most skimming devices attached to an automatic teller machine (ATM) are similar in color and shape to the host machine; vision-based detection of such devices is therefore difficult. A background subtraction method may be used to detect changes from a normal situation. However, without human detection, its background model is sometimes polluted by the ATM user, and the method cannot detect suspicious objects left in the scene. This paper proposes a real-time system which integrates (i) a simple image subtraction for detection of user arrival and departure, and (ii) an automatic detection of suspicious objects left on the ATM. The background model is updated only when no user is found, and used to detect suspicious objects based on a guided adaptive threshold. To avoid detection misses, nonlinear enhancement is applied to amplify the intensity differences between foreign objects and the host machine. Experimental results show that the proposed system increases the correctly detected area by 13.21% compared with the fixed-threshold method, and produces neither detection misses nor false alarms.
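
The change-detection core can be sketched as follows, simplifying the paper's guided adaptive threshold to a fixed one and omitting the nonlinear enhancement; the update rate alpha and threshold value are assumptions.

```python
# Difference against a background model updated only when no user is present.
import numpy as np

def detect_changes(frame, background, user_present, threshold=25, alpha=0.05):
    diff = np.abs(frame.astype(int) - background.astype(int))
    mask = diff > threshold                       # candidate suspicious pixels
    if not user_present:                          # update model only when idle
        background = ((1 - alpha) * background
                      + alpha * frame).astype(frame.dtype)
    return mask, background
```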

Wirat Rattanapitak, Somkiat Wangsiripitak
Towards Ubiquitous Autonomous Driving: The CCSAD Dataset

Several online real-world stereo datasets exist for the development and testing of algorithms in the fields of perception and navigation of autonomous vehicles. However, none of them was recorded in developing countries, and therefore they lack the particular challenges that can be found on their streets and roads, such as abundant potholes, irregular speed bumps, and peculiar flows of pedestrians. We introduce a novel dataset that possesses such characteristics. The stereo dataset was recorded in Mexico from a moving vehicle. It contains high-resolution stereo images which are complemented with direction and acceleration data obtained from an IMU, GPS data, and data from the car computer. This paper describes the structure and contents of our dataset files and presents reconstruction experiments that we performed on the data.

Roberto Guzmán, Jean-Bernard Hayet, Reinhard Klette
Discriminative Local Binary Pattern for Image Feature Extraction

Local binary pattern (LBP) is widely used to extract image features in various visual recognition tasks. LBP is formulated in quite a simple form and thus enables us to extract effective image features at a low computational cost. There are, however, some limitations, mainly regarding sensitivity to noise and loss of image contrast information. In this paper, we propose a novel LBP-based image feature to remedy those drawbacks without degrading the simplicity of the original LBP formulation. Encoding local pixel intensities into binary patterns can be regarded as separating them into two modes (clusters). We introduce the Fisher discriminant criterion to optimize the LBP coding, exploiting binary patterns stably and discriminatively with robustness to noise. Besides, image contrast information is incorporated in a unified way by leveraging the discriminant score as a weight on the corresponding binary pattern; thereby, the prominent patterns are emphasized. In experiments on pedestrian detection, the proposed method exhibits superior performance compared to the ordinary LBP and the other methods, especially in the case of lower-dimensional features.
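
For context, the ordinary 8-neighbour LBP code that the proposed discriminative variant builds on can be computed as follows (this sketch is the standard baseline, not the paper's method).

```python
# Ordinary LBP: each neighbour brighter than or equal to the centre pixel
# contributes one bit to an 8-bit pattern code.
import numpy as np

def lbp_8(image):
    c = image[1:-1, 1:-1]                     # centre pixels
    neighbours = [image[0:-2, 0:-2], image[0:-2, 1:-1], image[0:-2, 2:],
                  image[1:-1, 2:],   image[2:,   2:],   image[2:,   1:-1],
                  image[2:,   0:-2], image[1:-1, 0:-2]]
    codes = np.zeros(c.shape, dtype=int)
    for bit, n in enumerate(neighbours):
        codes |= (n >= c).astype(int) << bit
    return codes                              # histogram these for a feature
```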

Takumi Kobayashi
A Homologically Persistent Skeleton is a Fast and Robust Descriptor of Interest Points in 2D Images

2D images often contain irregular salient features and interest points with non-integer coordinates. Our skeletonization problem for such a noisy sparse cloud is to summarize the topology of a given 2D cloud across all scales in the form of a graph, which can be used for combining local features into a more powerful object-wide descriptor.

We extend a classical Minimum Spanning Tree of a cloud to a Homologically Persistent Skeleton, which is scale-and-rotation invariant and depends only on the cloud, without extra parameters. This graph (1) is computable in time $O(n\log n)$ for any $n$ points in the plane; (2) has the minimum total length among all graphs that span a 2D cloud at any scale and also have the most persistent 1-dimensional cycles; (3) is geometrically stable for noisy samples around planar graphs.
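
The classical starting point can be sketched briefly: a Minimum Spanning Tree of a 2D cloud, built here on Delaunay edges (which are known to contain the Euclidean MST) so the construction stays near O(n log n). This is only the baseline structure, not the Homologically Persistent Skeleton itself.

```python
# MST of a 2D point cloud restricted to Delaunay edges.
import numpy as np
from scipy.spatial import Delaunay
from scipy.sparse import lil_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def cloud_mst(points):
    """points: (n, 2) array. Returns the MST as a sparse edge-weight matrix."""
    tri = Delaunay(points)
    graph = lil_matrix((len(points), len(points)))
    for simplex in tri.simplices:
        for i in range(3):                    # the three edges of the triangle
            a, b = simplex[i], simplex[(i + 1) % 3]
            graph[a, b] = np.linalg.norm(points[a] - points[b])
    return minimum_spanning_tree(graph)
```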

Vitaliy Kurlin
A k-max Geodesic Distance and Its Application in Image Segmentation

The geodesic distance is commonly used when solving image processing problems. In noisy images, unfortunately, it often gives unsatisfactory results. In this paper, we propose a new k-max geodesic distance. The length of a path is defined as the sum of the k maximum edge weights along the path. The distance is defined as the length of the path that is shortest in this sense. With an appropriate choice of the value of k, the influence of noise can be reduced substantially. The positive properties are demonstrated on the problem of seeded image segmentation. The results are compared with the results of the geodesic distance and with the results of the random walker segmentation algorithm. The influence of the value of k is also discussed.
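
The path-length definition itself is simple enough to state in a few lines; finding the minimising path requires a modified shortest-path search, which is not shown here.

```python
# k-max length of one path: the sum of its k largest edge weights.
def k_max_length(edge_weights, k):
    """edge_weights: weights of the edges along one path between two pixels."""
    return sum(sorted(edge_weights, reverse=True)[:k])

# Example: for weights [1, 7, 2, 9, 3] and k = 2 the length is 9 + 7 = 16.
```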

Michael Holuša, Eduard Sojka
Ground Level Recovery from Terrestrial Laser Scanning Data with the Variably Randomized Iterated Hierarchical Hough Transform

A planar digital terrain model for use in the analysis of forest measurements made with terrestrial LIDAR scanning is proposed for regions dominated by plains. The structure of the data suggests that the iterated version of the Hough transform is a suitable method; this makes it possible to reduce the time and memory requirements of the method. Randomization, with the fraction of data used varying with distance to the scanner, is proposed to address the biasing of the result towards the measurements which are made with higher density in the central part of the stand. Using this method instead of weighted voting reduces the time of analysis. A hierarchical approach leads to a further reduction of time. The method can be extended to models formed from more than one plane.

Leszek J. Chmielewski, Arkadiusz Orłowski
U3PT: A New Dataset for Unconstrained 3D Pose Tracking Evaluation

3D pose tracking using monocular cameras is an important topic that has received great attention over recent decades. It is useful in many domains, such as video surveillance, human-computer interfaces, and biometrics. The problem becomes much more challenging in the presence of, for example, fast motion, out-of-plane rotation, illumination changes, expression, or occlusions. Several datasets for 3D pose tracking evaluation have been reported in the literature; however, all of them feature simple backgrounds, no expression, slow motion, frontal rotation, or no occlusion, which is not enough to test advances in in-the-wild tracking. Indeed, collecting accurate 3D pose ground-truth is difficult because special devices or sensors are required. In addition, the magnetic sensors usually used for 3D pose ground-truth are uncomfortable to wear and restrict movement because of their wires. In this paper, we propose a new recording system that allows people to move more comfortably, and we create a new challenging dataset named U3PT (Unconstrained 3D Pose Tracking). It can be considered a benchmark to evaluate and compare the robustness and precision of state-of-the-art methods that aim to work in the wild. This paper also presents the performance of two well-known state-of-the-art methods, compared to our own, on face tracking applied to this database. We have carried out several experiments and report their advantages and some limitations to be improved in the future.

Ngoc-Trung Tran, Fakhreddine Ababsa, Maurice Charbit
Characterization and Distinction Between Closely Related South Slavic Languages on the Example of Serbian and Croatian

The paper proposes a new method for the characterization and distinction of closely related languages, using Serbian and Croatian as an example. In the first step, the method transforms the text in the different languages into a uniformly coded text, according to the position of each sign of the script in the text line and its height. Then, the coded text, given as a 1-D image, is subjected to texture analysis, from which a feature vector of 28 elements is established. These 28 elements are extracted from co-occurrence texture and adjacent local binary pattern analysis. The feature vector is the starting point for classification by an extension of a state-of-the-art method called GA-ICDA. As a result, the distinction between the closely related languages is correctly accomplished. The method is tested on a database of documents in the Serbian and Croatian languages, and the experiments give promising results.

Darko Brodić, Alessia Amelio, Zoran N. Milivojević
Few-Views Image Reconstruction with SMART and an Allowance for Contrast Structure Shadows

The paper describes an original algorithm for reconstructing tomographic images from a few views. The algorithm is based on the known iterative Simultaneous Multiplicative Algebraic Reconstruction Technique (SMART). Its distinguishing feature is that corrections for different zones of the reconstruction area are calculated differently, with allowance for the distribution of shadows from contrast structures. The algorithm, which we call SMART-SA (SMART with Shadow Allowance), is implemented in 2D and tested on two numerical models with an air cavity and a material interface with 10% contrast. Reconstruction results are evaluated visually and quantitatively with characteristics such as the correlation coefficient and the deviation factor. It is shown that SMART-SA is capable of reconstructing images free of the artifacts typical of few-views tomography, and it performs especially well in combination with the MART-AP algorithm we published earlier.
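For reference, a minimal numpy sketch of the plain SMART iteration that SMART-SA builds on; the zone-dependent shadow allowance of the paper is not included, and the damping parameter is our assumption.

    import numpy as np

    def smart(A, b, iters=50, lam=1.0, x0=None):
        # Multiplicative updates driven by the log-ratio between measured
        # projections b and current projections A @ x.
        # A: (m, n) nonnegative system matrix, b: (m,) positive data.
        x = np.ones(A.shape[1]) if x0 is None else x0.copy()
        col_sum = np.maximum(A.sum(axis=0), 1e-12)
        for _ in range(iters):
            ratio = b / np.maximum(A @ x, 1e-12)
            x *= np.exp(lam * (A.T @ np.log(ratio)) / col_sum)
        return x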

Vitaly V. Vlasov, Alexander B. Konovalov, Alexander S. Uglov
Gaussian Mixture Model Selection Using Multiple Random Subsampling with Initialization

Selecting the optimal number of components in a Gaussian Mixture Model (GMM) has been of interest to many researchers over the last few decades. Most current approaches are based on an information criterion, introduced by Akaike (1974) and modified by many other researchers. The standard approach uses the EM algorithm, which fits model parameters to training data and determines log-likelihood functions for an increasing number of components. Penalized forms of the log-likelihood function are then used for selecting the number of components, and the search for new or modified penalty functions is the subject of ongoing efforts to improve these methods and make them robust for various types of data distributions. Our new technique for selecting the optimal number of GMM components is based on Multiple Random Subsampling of training data with Initialization of the EM algorithm (MuRSI). The results of many experiments demonstrate the advantages of this method.
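A rough sketch of the subsampling idea using scikit-learn, with BIC as the information criterion; the exact criterion, subsample sizes, and EM initialization prescribed by MuRSI are assumptions here.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def select_components(X, k_max=10, n_subsets=20, frac=0.7, seed=0):
        # Fit GMMs on many random subsamples and pick the component
        # count that wins most often under BIC.
        rng = np.random.default_rng(seed)
        wins = np.zeros(k_max + 1, dtype=int)
        for _ in range(n_subsets):
            idx = rng.choice(len(X), int(frac * len(X)), replace=False)
            bics = [GaussianMixture(k, n_init=3, random_state=0)
                    .fit(X[idx]).bic(X[idx]) for k in range(1, k_max + 1)]
            wins[int(np.argmin(bics)) + 1] += 1
        return int(np.argmax(wins))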

Josef V. Psutka
Vectorisation of Sketched Drawings Using Co-occurring Sample Circles

This paper presents a drawing vectorisation algorithm which uses multiple concentric families of circles placed in a dense grid on the image space. We show that any off-centered junction within the family of circles can be located, and hence show how these junction points may be linked to neighbouring junction points, thereby creating a vector representation of the drawing geometry. The proposed algorithm identified $$98\%$$ of the junctions in the drawings on which it was evaluated, each within a localisation error of $$4.7 \pm 2.3$$ pixels, resulting in straight line vectors which are well placed with respect to the drawn edges.

Alexandra Bonnici, Kenneth P. Camilleri
Robust Contact Lens Detection Using Local Phase Quantization and Binary Gabor Pattern

Due to its resistance to circumvention, the iris has been used as a prime biometric trait in border crossings and identity-related civil projects. However, sensor-level spoofing attacks such as the use of printed irises, plastic eyeballs and contact lenses pose a challenge by helping intruders sidestep security in iris-based biometric systems. Attacks through contact lenses are the most challenging to detect, as they obfuscate the iris only partially and part of the original iris remains visible through them. In this paper, we present a contact lens dataset containing 12823 images acquired from 50 subjects. Each subject has images pertaining to the no-lens, soft-lens and cosmetic-lens classes. Verification results with three different techniques on three datasets suggest an average degradation of 3.10% in EER when the subject is wearing a soft lens and 17.34% when the subject is wearing a cosmetic lens. Further, we propose a cosmetic lens detection approach based on Local Phase Quantization (LPQ) and Binary Gabor Pattern (BGP). Experiments conducted on the publicly available IIITD Vista, IIITD Cogent, ND_2010 and self-collected datasets indicate that our method outperforms previous lens detection techniques in terms of Correct Classification Rate and False Acceptance Rate. The results suggest that a comprehensive texture descriptor combining the blur tolerance of LPQ and the robustness of BGP is suitable for cosmetic lens detection.

Lovish, Aditya Nigam, Balender Kumar, Phalguni Gupta
Low-Dimensional Tensor Principal Component Analysis

We clarify the equivalence between second-order tensor principal component analysis and two-dimensional singular value decomposition. Furthermore, we show that the two-dimensional discrete cosine transform is a good approximation to two-dimensional singular value decomposition and classical principal component analysis. Moreover, for practical computation of the two-dimensional singular value decomposition, we introduce the marginal eigenvector method, originally proposed for image compression. To evaluate the performance of the marginal eigenvector method and the two-dimensional discrete cosine transform for dimension reduction, we compute recognition rates for image patterns. The results show that the two methods achieve almost the same recognition rates on images in six datasets.
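A minimal scipy sketch of the approximation being evaluated: a separable 2D DCT with the low-frequency block kept as the reduced representation; the block size k is our choice.

    import numpy as np
    from scipy.fftpack import dct

    def dct2_features(img, k=8):
        # Separable 2D DCT; the k x k low-frequency block serves as the
        # reduced representation approximating the 2D-SVD subspace.
        coeffs = dct(dct(img, axis=0, norm='ortho'), axis=1, norm='ortho')
        return coeffs[:k, :k].ravel()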

Hayato Itoh, Atsushi Imiya, Tomoya Sakai
Empirical Study of Audio-Visual Features Fusion for Gait Recognition

The goal of this paper is to evaluate how the fusion of audio and visual features can help in the challenging task of identifying people based on the way they walk, i.e. gait recognition. Most previous research on gait recognition has focused on designing visual descriptors, mainly on binary silhouettes, or on building sophisticated machine learning frameworks. However, little attention has been paid to the audio patterns associated with the action of walking. We therefore propose and evaluate a multimodal system for gait recognition. The proposed approach is evaluated on the challenging ‘TUM GAID’ dataset, which contains audio recordings in addition to image sequences. The experimental results show that using late fusion to combine two kinds of tracklet-based visual features with audio features improves the state-of-the-art results on the standard experiments defined on the dataset.
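A minimal sketch of score-level late fusion, assuming each modality already produces per-class score matrices on a common scale; the weights are our assumption.

    import numpy as np

    def late_fusion(score_matrices, weights=None):
        # Each matrix holds per-class scores from one modality, already
        # normalized to a common range; predict by weighted-sum argmax.
        weights = weights or [1.0] * len(score_matrices)
        fused = sum(w * s for w, s in zip(weights, score_matrices))
        return np.argmax(fused, axis=1)

    # pred = late_fusion([visual_scores_a, visual_scores_b, audio_scores])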

Francisco M. Castro, Manuel J. Marín-Jiménez, Nicolás Guil
Web User Interact Task Recognition Based on Conditional Random Fields

Recognizing the activity of web users from their navigational behavior during the interaction process is an important topic in Human-Computer Interaction. To improve the interaction process and interface usability, many studies have investigated how users interact with a web interface in order to perform a given activity. In this paper we apply the Conditional Random Fields approach to model human navigational behavior based on mouse movements, in order to recognize web user tasks. Experimental results show the efficiency of the proposed model and confirm the superiority of the Conditional Random Fields approach over the Hidden Markov Models approach in human activity recognition.
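A hedged sketch using the sklearn-crfsuite package (an assumption; the paper does not name its implementation), with illustrative mouse-movement features of our own devising.

    import sklearn_crfsuite  # pip install sklearn-crfsuite

    def event_features(seq):
        # One feature dict per mouse event; the feature names are ours.
        return [{'dx': e['dx'], 'dy': e['dy'], 'click': e['click'],
                 'speed': (e['dx'] ** 2 + e['dy'] ** 2) ** 0.5}
                for e in seq]

    # X_train: list of event sequences, y_train: per-event task labels
    # crf = sklearn_crfsuite.CRF(algorithm='lbfgs', c1=0.1, c2=0.1,
    #                            max_iterations=100)
    # crf.fit([event_features(s) for s in X_train], y_train)
    # tasks = crf.predict([event_features(s) for s in X_test])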

Anis Elbahi, Mohamed Nazih Omri
Tree Log Identification Based on Digital Cross-Section Images of Log Ends Using Fingerprint and Iris Recognition Methods

Tree log biometrics is an approach to establish log traceability from the forest to further processing companies. This work assesses whether algorithms developed in the context of fingerprint and iris recognition can be transferred to log identification by means of cross-section images of log ends. On a test set built from 155 tree logs, the identification performance for a set of configurations and, in addition, the impact of two enhancement procedures are assessed.

Results show that fingerprint and iris recognition based approaches are suited for log identification, achieving a 100% detection rate for the best configurations. By assessing the performance on a large set of tree logs, this work provides substantial conclusions for the further development of log biometrics.
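As a sketch of how an iris-style pipeline can transfer, assuming OpenCV: the annual-ring pattern around the pith is unwrapped into a rectangular strip, the analogue of iris normalization; pith detection and the actual feature extraction are omitted.

    import cv2

    def unwrap_log_end(gray_u8, center, radius, dsize=(512, 64)):
        # Unwrap the ring pattern around the estimated pith (center) into
        # a rectangular 'polar' strip, then equalize the 8-bit image.
        polar = cv2.warpPolar(gray_u8, dsize, center, radius,
                              cv2.WARP_POLAR_LINEAR)
        return cv2.equalizeHist(polar)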

Rudolf Schraml, Heinz Hofbauer, Alexander Petutschnigg, Andreas Uhl
Detecting Human Falls: A Vision-FSM Approach

In this paper, we present a computer vision based system able to detect human falls. We show all the stages of our system in detail, along with the considerations behind the reported results. We propose a simple scheme for detection and tracking, followed by a Finite State Machine (FSM). The proposed system performs well under different environmental conditions.
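A minimal sketch of a detection-plus-FSM scheme of this kind; the states, the aspect-ratio test, and the thresholds are illustrative assumptions, not the paper's values.

    UPRIGHT, FALLING, FALLEN = 'upright', 'falling', 'fallen'

    def fsm_step(state, frames_lying, box_w, box_h):
        # A wide bounding box from the tracker suggests a lying posture.
        lying = box_w > 1.3 * box_h
        if not lying:
            return UPRIGHT, 0             # person is (back) upright
        frames_lying += 1
        if state == UPRIGHT:
            return FALLING, frames_lying  # posture just changed
        if state == FALLING and frames_lying > 15:
            return FALLEN, frames_lying   # lying persisted: raise the alarm
        return state, frames_lying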

Roger Trullo, Duber Martinez
Trademark Image Retrieval Using Inverse Total Feature Frequency and Multiple Detectors

Conventional similar-trademark search methods have mainly handled only binary images and measured similarities globally between trademark images. Recent image retrieval methods using the bag-of-visual-words strategy can detect the same object under various conditions, such as image size variation, but cannot handle the vague similarity of simple shapes well. The real task of screening trademark images, however, demands several image retrieval functions, such as the simultaneous validation of global and local similarities. In this paper we describe more effective methods for trademark image screening. Our method is twofold: one part is a combination of multiple detectors for describing more varied shapes, and the other is an inverse total feature frequency that reflects the number of extracted features, weighting each visual word more effectively in the bag-of-visual-words strategy. Experiments with real trademark images show that our proposed method achieves higher accuracy than conventional methods.
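A hedged sketch of an inverse-frequency weighting of this flavour; the paper's exact ITFF formula may differ.

    import numpy as np

    def itff_weights(counts):
        # counts: (num_images, vocab_size) visual-word occurrences pooled
        # over all detectors. Words extracted very often overall receive
        # smaller weights, analogously to inverse document frequency.
        total = counts.sum(axis=0) + 1.0
        return np.log(counts.sum() / total)

    # weighted_histograms = counts * itff_weights(counts)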

Minoru Mori, Xiaomeng Wu, Kunio Kashino
Adaptive Graph Learning for Unsupervised Feature Selection

Most existing feature selection methods select features by evaluating a criterion which measures their ability to preserve the similarity structure of a data graph. However, these methods dichotomise the process of constructing or learning the underlying data graph and the subsequent feature ranking. Once the graph is determined so as to characterize the structure of the similarity data, it is left fixed in the following ranking or regression steps. As a result, the performance of feature selection is largely determined by the effectiveness of the graph construction step, and the key to constructing an effective similarity graph is determining a data similarity matrix. In this paper we treat the estimation or learning of the data similarity matrix and data regression as simultaneous tasks, in order to perform unsupervised spectral feature selection. Our new method learns the data similarity matrix by optimally re-assigning the neighbors of each data point based on local distances or dissimilarities. Meanwhile, the $$\ell _{2,1}$$-norm is imposed on the transformation matrix to achieve row sparsity, which leads to the selection of relevant features. We derive an efficient optimization method to solve the simultaneous similarity-graph learning and feature selection problems. Extensive experimental results on real-world benchmark data sets show that our method consistently outperforms alternative feature selection methods.
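For concreteness, the row-sparsity-inducing norm in two lines of numpy; the example matrix is ours.

    import numpy as np

    def l21_norm(W):
        # Sum of the Euclidean norms of the rows of W; penalizing it
        # drives whole rows to zero, discarding the matching features.
        return np.linalg.norm(W, axis=1).sum()

    W = np.array([[0.9, 0.1], [0.0, 0.0], [0.4, -0.7]])
    print(l21_norm(W))  # ~0.906 + 0.0 + 0.806: the middle row is dropped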

Zhihong Zhang, Lu Bai, Yuanheng Liang, Edwin R. Hancock
Shot and Scene Detection via Hierarchical Clustering for Re-using Broadcast Video

Video decomposition techniques are fundamental tools for effective video browsing and re-use. In this work, we consider the problem of segmenting broadcast videos into coherent scenes, and propose a scene detection algorithm based on hierarchical clustering, along with a very fast state-of-the-art shot segmentation approach. Experiments demonstrate the effectiveness of our algorithms by comparing them against recent proposals for automatic shot and scene segmentation.
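A minimal scipy sketch of clustering shots into scenes; the feature choice, linkage, and cut threshold are assumptions, and any temporal-coherence constraint used by the authors is omitted.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    def shots_to_scenes(shot_features, threshold):
        # shot_features: one descriptor per temporally ordered shot;
        # cutting the dendrogram at 'threshold' yields scene labels.
        Z = linkage(shot_features, method='ward')
        return fcluster(Z, t=threshold, criterion='distance')

    # scene_labels = shots_to_scenes(np.random.rand(30, 64), threshold=2.5)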

Lorenzo Baraldi, Costantino Grana, Rita Cucchiara
Locally Adapted Gain Control for Reliable Foreground Detection

One of the first steps in video analysis systems is the detection of objects moving in the scene, namely foreground detection; the accuracy and precision obtained in this phase therefore have a strong impact on the performance of the whole system. Many camera manufacturers include internal mechanisms, such as automatic gain control (AGC), to improve image quality. Although some of these options enhance human perception, they may also introduce sudden changes in the overall image intensity, which risk being wrongly interpreted as moving objects by traditional foreground detection algorithms. In this paper we propose a method able to detect the changes introduced by the AGC and to manage them properly, so as to minimize their impact on foreground detection algorithms. The experimentation has been carried out on a wide, publicly available dataset by adopting a well-known background subtraction technique, and the obtained results confirm the effectiveness of the proposed approach.
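A simplified sketch of the idea (the published method adapts the gain locally; this global version only conveys the principle): a large frame-wide median intensity shift points to a camera-side gain change rather than scene motion, and can be compensated in the background model.

    import numpy as np

    def agc_offset(prev_gray, curr_gray, trigger=8.0):
        # Median per-pixel difference over the whole frame; large values
        # indicate a gain change to compensate, not foreground to segment.
        d = np.median(curr_gray.astype(np.float32) -
                      prev_gray.astype(np.float32))
        return float(d) if abs(d) > trigger else 0.0

    # background_model += agc_offset(prev, curr)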

Duber Martinez, Alessia Saggese, Mario Vento, Humberto Loaiza, Eduardo Caicedo
Fourier Features for Person Detection in Depth Data

Robust and reliable person detection is crucial for many applications. In the domain of service robots that we focus on, knowing the location of a person is an essential requirement for any meaningful human-robot interaction. In this work we present a people detection algorithm exploiting RGB-D data from Kinect-like cameras. Two features representing the geometrical properties of a person are obtained from the data, transformed into the frequency domain using the Discrete Fourier Transform (DFT), and used to train a Support Vector Machine (SVM) for classification. Additionally, we present a hand detection algorithm based on the extracted silhouette of a person. We evaluate the proposed method on real-world data from the Cornell Activity Dataset and on a dataset created in our laboratory.
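A minimal sketch of such a feature pipeline, assuming a 1-D geometric profile per person extracted from the depth data; the paper's actual two features are not reproduced.

    import numpy as np
    from sklearn.svm import SVC

    def fourier_descriptor(profile, n_coeffs=16):
        # profile: 1-D geometric signal, e.g. silhouette width per image
        # row; keeping low-frequency DFT magnitudes discards phase and
        # gives some tolerance to vertical shifts.
        return np.abs(np.fft.rfft(profile))[:n_coeffs]

    # X = np.stack([fourier_descriptor(p) for p in train_profiles])
    # clf = SVC(kernel='rbf').fit(X, train_labels)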

Viktor Seib, Guido Schmidt, Michael Kusenbach, Dietrich Paulus
Backmatter
Metadata
Title
Computer Analysis of Images and Patterns
Edited by
George Azzopardi
Nicolai Petkov
Copyright Year
2015
Electronic ISBN
978-3-319-23192-1
Print ISBN
978-3-319-23191-4
DOI
https://doi.org/10.1007/978-3-319-23192-1
