
2014 | Book

Advances in Visual Computing

10th International Symposium, ISVC 2014, Las Vegas, NV, USA, December 8-10, 2014, Proceedings, Part II

Edited by: George Bebis, Richard Boyle, Bahram Parvin, Darko Koracin, Ryan McMahan, Jason Jerald, Hui Zhang, Steven M. Drucker, Chandra Kambhamettu, Maha El Choubassi, Zhigang Deng, Mark Carlson

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science


About this book

The two-volume set LNCS 8887 and 8888 constitutes the refereed proceedings of the 10th International Symposium on Visual Computing, ISVC 2014, held in Las Vegas, NV, USA. The 74 revised full papers and 55 poster papers presented together with 39 special track papers were carefully reviewed and selected from more than 280 submissions. The papers are organized in topical sections. Part I (LNCS 8887) comprises computational bioimaging; computer graphics; motion, tracking, feature extraction and matching; segmentation; visualization, mapping, modeling and surface reconstruction; unmanned autonomous systems; medical imaging; tracking for human activity monitoring; intelligent transportation systems; and visual perception and robotic systems. Part II (LNCS 8888) comprises computational bioimaging, recognition, computer vision, applications, face processing and recognition, virtual reality, and the poster sessions.

Table of Contents

Frontmatter

ST: Computational Bioimaging II

Fast Mesh-Based Medical Image Registration

In this paper, a fast triangular-mesh-based registration method is proposed. Given Template and Reference images as inputs, the template image is triangulated using a content-adaptive mesh generation algorithm. Considering the pixel values at the mesh nodes, interpolated using spline interpolation for both images, the energy functional needed for image registration is minimized. The minimization uses a mesh-based discretization of the distance measure and regularization term, resulting in a sparse system of linear equations that, being smaller than in pixel-wise registration, can be solved directly. Mean Squared Difference (MSD) is used as the metric for evaluating the results. The mesh-based technique achieves higher speed than the pixel-based curvature registration technique with a fast DCT solver. The implementation was done in MATLAB without any specific optimization; higher speeds can be achieved with C/C++ implementations.

Ahmadreza Baghaie, Zeyun Yu, Roshan M. D’souza
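The MSD metric used for evaluation in the paper above is simply the pixel-wise mean of squared intensity differences between the registered and reference images; a minimal NumPy sketch (an illustration, not taken from the paper):

```python
import numpy as np

def msd(template, reference):
    """Mean Squared Difference between two equally sized images."""
    t = np.asarray(template, dtype=float)
    r = np.asarray(reference, dtype=float)
    return np.mean((t - r) ** 2)
```

A perfectly registered pair yields an MSD of zero; larger values indicate residual misalignment.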
Multimodal Non-Rigid Registration Methods Based on Demons Models and Local Uncertainty Quantification Used in 3D Brain Images

In this work, we propose a novel fully automated method to solve the 3D multimodal non-rigid image registration problem. The proposed strategy overcomes the monomodal intensity restriction of fluid-like registration (FLR) models, such as Demons-based registration algorithms, by applying a mapping that relies on an intensity uncertainty quantification in a local neighbourhood, bringing the target and source images into a common domain where they are comparable, no matter their image modalities or mismatched intensities between them. The proposed methodology was tested with T1, T2 and PD weighted brain magnetic resonance (MR) images with synthetic deformations, and CT-MR brain images from a radiotherapy clinical case. The performance of the proposed approach was evaluated quantitatively by standard indices that assess the correct alignment of anatomical structures of interest. The results obtained in this work show that the addition of the local uncertainty mapping properly resolve the monomodal restriction of FLR algorithms when same anatomic counterparts exists in the images to register, and suggest that the proposed strategy can be an option to achieve multimodal 3D registrations.

Isnardo Reducindo, Aldo R. Mejía-Rodríguez, Edgar Arce-Santana, Daniel U. Campos-Delgado, Elisa Scalco, Giovanni M. Cattaneo, Giovanna Rizzo
Principal Axes-Based Asymmetry Assessment Methodology for Skin Lesion Image Analysis

Skin cancer is the most common of all cancer types and Malignant Melanoma is its most dangerous form, so prevention is vital. Risk assessment of skin lesions is usually done through the ABCD rule (asymmetry, border, color and differential structures), which classifies the lesion as benign, suspicious or highly suspicious of Malignant Melanoma. A methodology to assess the asymmetry of a skin lesion image in relation to each axis of inertia, for both dermoscopic and mobile-acquired images, is presented. It starts by extracting a set of 310 asymmetry features, followed by testing several feature selection and machine learning classification methods in order to minimize the classification error. For dermoscopic images, the developed methodology achieves an asymmetry classification accuracy of 87%, while for mobile-acquired images the accuracy reaches 73.1%.

Maria João M. Vasconcelos, Luís Rosado, Márcia Ferreira
Roles of Various Brain Structures on Non-Invasive Lateralization of Temporal Lobe Epilepsy

In this paper, we evaluate the roles of different brain structures in lateralization of the epileptogenic focus in temporal lobe epilepsy (TLE) patients based on imaging features. To this end, we extract volumes of multiple brain structures from preoperative images of a retrospective cohort of seventy-five TLE patients with surgical outcome of Engel class I. Then, we apply data mining techniques such as feature extraction, feature selection, and machine learning classifiers. Exploiting volumes of various structures and two machine learning classifiers, we examine contributions of brain structures and classifiers to the lateralization of TLE patients.

Our experiments, using volumes of hippocampus and amygdala, show correct lateralization rates of 86.7% to 93.3% for decision tree and support vector machine (SVM) classifiers. This reflects a 6.7% to 10.6% improvement in accuracy relative to using hippocampus volume alone. Also, using volumes of hippocampus, amygdala, and thalamus, we reach a correct lateralization rate of 96.0% for SVM. Rules extracted from the decision tree indicate that for intermediate hippocampus volumes, amygdala enlargement may determine the side of the epileptogenic focus. In conclusion, classification of the selected brain structures using the proposed classifiers improves decision-making for surgical resection in TLE and may reduce the need for implantation of intracranial monitoring electrodes.

Fariborz Mahmoudi, Mohammad-Reza Nazem-Zadeh, Hassan Bagher-Ebadian, Jason M. Schwalb, Hamid Soltanian-Zadeh
Spatio-temporal Level-Set Based Cell Segmentation in Time-Lapse Image Sequences

Automated segmentation and tracking of cells in time-lapse imaging is a process of fundamental significance in several biomedical applications. In this work our interest is focused on cell segmentation over a set of fluorescence microscopy images with varying levels of difficulty with respect to cell density, resolution, contrast, and signal-to-noise ratio. We utilize a region-based approach to curve evolution based on the level-set formulation. We introduce and test the use of temporal linking for level-set initialization to improve the robustness and computational time of level-set convergence. We validate our segmentation approach against manually segmented images provided by the Cell Tracking Challenge consortium. Our method produces encouraging segmentation results with an average DICE score of 0.78 over a variety of simulated and real sequences and speeds up the convergence rate by an average factor of 10.2.

Fatima Boukari, Sokratis Makrogiannis
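The DICE score reported above measures the overlap between a computed and a manually traced segmentation mask, 2|A∩B| / (|A|+|B|); a minimal sketch for binary masks (an illustration, not the authors' code):

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary segmentation masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())
```

A score of 1.0 indicates identical masks; 0 indicates no overlap, so the reported average of 0.78 corresponds to substantial agreement with the manual segmentations.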

Recognition

A Nonstationary Hidden Markov Model with Approximately Infinitely-Long Time-Dependencies

Hidden Markov models (HMMs) are a popular approach for modeling sequential data, typically based on the assumption of a first-order Markov chain. In other words, only one-step-back dependencies are modeled, which is a rather unrealistic assumption in most applications. In this paper, we propose a method for postulating HMMs with approximately infinitely-long time-dependencies. Our approach considers the whole history of model states in the postulated dependencies, by making use of a recently proposed nonparametric Bayesian method for modeling label sequences with infinitely-long time dependencies, namely the sequence memoizer. We manage to derive training and inference algorithms for our model with computational costs identical to simple first-order HMMs, despite its entailed infinitely-long time-dependencies, by employing a mean-field-like approximation. The efficacy of our proposed model is experimentally demonstrated.

Sotirios P. Chatzis, Dimitrios I. Kosmopoulos, George M. Papadourakis
Proximity Clustering for Revealing a Semantically Dominant Class

We propose the proximity clustering (PxC) algorithm, which finds spatially diverse but semantically similar data-point colonies in the form of a dominant class cluster. It searches through different proximity metric spaces to reveal the hidden porous data pattern. For each data point, it provides the degree of belongingness to the dominant cluster, thereby performing soft clustering. Performance evaluation on an artificial dataset, for finding a dominant class, shows that PxC outperforms other clustering methods. We experimentally validate its applications to image classification, segmentation, and abnormal event detection from video using the proposed motion features. Experimental results show that PxC improves the performance of image classification and unsupervised image segmentation.

Tushar Sandhan, Kimin Yun, Jin Young Choi
One-Shot Learning of Sketch Categories with Co-regularized Sparse Coding

Categorizing free-hand human sketches has profound implications in applications such as human computer interaction and image retrieval. The task is non-trivial due to the iconic nature of sketches, signified by large variances in both appearance and structure when compared with photographs. Prior works often utilize off-the-shelf low-level features and assume the availability of a large training set, rendering them sensitive to abstraction and less scalable to new categories. To overcome this limitation, we propose a transfer learning framework which enables one-shot learning of sketch categories. The framework is based on a novel co-regularized sparse coding model which exploits common/shareable parts among human sketches of seen categories and transfers them to unseen categories. We contribute a new dataset consisting of 7,760 human segmented sketches from 97 object categories. Extensive experiments reveal that the proposed method can classify unseen sketch categories given just one training sample with 33.04% accuracy, offering a two-fold improvement over baselines.

Yonggang Qi, Wei-Shi Zheng, Tao Xiang, Yi-Zhe Song, Honggang Zhang, Jun Guo
Hierarchical Spanning Tree-Structured Approximation for Conditional Random Fields: An Empirical Study

We present a learning algorithm to construct a discriminative Conditional Random Fields cascade model. We decompose the original grid-structured graph model using a set of spanning trees, which are learned and added to the cascade architecture iteratively, one after another. A spanning tree at each cascade layer takes as input both the outputs of the previous layer's nodes and the observed variables, which are processed by all layers. The structure of the spanning trees is generated uniformly at random among all spanning trees of the original graph. The result of learning is the number of cascade layers, the structure of the generated spanning trees, and the set of optimized parameters corresponding to the spanning trees. We performed experimental validation on synthetic and real-world imagery datasets and demonstrated better performance of the cascaded tree-based model over the original grid-structured CRF model with loopy belief propagation inference.

Alexei N. Skurikhin
Thresholding a Random Forest Classifier

The original Random Forest derives its final result from the number of leaf nodes that voted for each class. Each leaf node is treated equally, and the class with the most votes wins. However, certain leaf nodes in the topology have better classification accuracies while others often lead to a wrong decision. Also, the forest's performance differs across classes due to uneven class proportions. In this work, a novel voting mechanism is introduced in which each leaf node has an individual weight. The final decision is determined not by majority voting but by a linear combination of individual weights, leading to a better and more robust decision. This method is inspired by the construction of a strong classifier from a linear combination of small rules of thumb (AdaBoost). Small fluctuations caused by the use of binary decision trees are better balanced. Experimental results on several datasets for object recognition and action recognition demonstrate that our method successfully improves the classification accuracy of the original Random Forest algorithm.

Florian Baumann, Fangda Li, Arne Ehlers, Bodo Rosenhahn
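The weighted voting scheme described above can be sketched as follows. The leaf weights here are hypothetical stand-ins for the learned per-leaf weights; this is an illustration of the idea, not the authors' implementation:

```python
import numpy as np

def weighted_forest_vote(leaf_predictions, leaf_weights, n_classes):
    """Combine per-leaf class predictions using individual leaf weights.

    leaf_predictions: predicted class index of each leaf reached by a sample.
    leaf_weights: learned weight of each reached leaf (hypothetical values).
    Returns the class with the highest weighted score rather than the
    plain majority vote.
    """
    scores = np.zeros(n_classes)
    for cls, w in zip(leaf_predictions, leaf_weights):
        scores[cls] += w
    return int(np.argmax(scores))
```

With equal weights this reduces to ordinary majority voting; unequal weights let a minority of highly reliable leaves overturn the majority.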
Texture Retrieval Using Cauchy-Schwarz Divergence and Generalized Gaussian Mixtures

In this paper, we introduce the Cauchy-Schwarz divergence (CSD) in the context of texture retrieval. First, we model wavelet coefficient histograms using the existing mixture of generalized Gaussians (MoGG) distribution. Then, we propose the CSD as a similarity measure between two MoGGs. As the CSD has no closed form, we compute this measure by a Monte-Carlo sampling method. Thanks to its tractable mathematical expression, the CSD is computationally less expensive than the Kullback-Leibler divergence (KLD). The latter often needs further approximations with good sampling strategies, or bounding methods, to avoid the heavy sampling process. In experiments on two popular databases, VisTeX and Brodatz, a retrieval rate of 98% is achieved.

Hassan Rami, Ahmed Drissi El Maliani, Mohammed El Hassouni, Driss Aboutajdine
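The Monte-Carlo estimation of the CSD, -log(∫pq / sqrt(∫p² ∫q²)), can be illustrated as follows. For simplicity this sketch substitutes ordinary 1D Gaussian mixtures for the MoGG model and is not the authors' implementation; each integral is estimated as an expectation over samples drawn from one of the two densities:

```python
import numpy as np

rng = np.random.default_rng(0)

def mixture_pdf(x, weights, means, sigmas):
    """Density of a 1D Gaussian mixture (stand-in for the MoGG model)."""
    x = np.atleast_1d(x)[:, None]
    comps = np.exp(-0.5 * ((x - means) / sigmas) ** 2) / (sigmas * np.sqrt(2 * np.pi))
    return comps @ weights

def mixture_sample(n, weights, means, sigmas):
    """Draw n samples from the mixture."""
    idx = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[idx], sigmas[idx])

def csd(p, q, n=20000):
    """Monte-Carlo estimate of the Cauchy-Schwarz divergence
    -log( int(pq) / sqrt(int(p^2) int(q^2)) )."""
    xp = mixture_sample(n, *p)
    xq = mixture_sample(n, *q)
    int_pq = mixture_pdf(xp, *q).mean()  # E_p[q(x)] estimates int(pq)
    int_pp = mixture_pdf(xp, *p).mean()  # E_p[p(x)] estimates int(p^2)
    int_qq = mixture_pdf(xq, *q).mean()  # E_q[q(x)] estimates int(q^2)
    return -np.log(int_pq / np.sqrt(int_pp * int_qq))
```

The divergence is near zero for identical mixtures and grows as the mixtures separate, which is the behavior exploited for retrieval ranking.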

3D Computer Vision

Spatial Uncertainty Model of a Three-View RGB-D Camera System

Multi-view stereo cameras and RGB-D cameras are widely used in robotic vision for 3D map reconstruction in navigation tasks. RGB-D cameras provide accurate depth measurements even in textureless areas, but are sensitive to distortion of their actively projected patterns. Stereo cameras are reliable only if there are sufficient features in the visible region. The two kinds of sensors are complementary in performance, so we combine them into a three-view RGB-D system and propose a fusion method for reliable 3D point cloud reconstruction. Furthermore, the reliability of the reconstructed map is vital for robotic navigation, so we build a spatial uncertainty model for the system, which can be easily specialized to either subsystem. The fusion method is shown to improve performance from a spatial-uncertainty perspective.

Chen Zhu, Simon Bilgeri, Christoph Günther
3D Estimation of Isometric Surfaces Using a ToF-Based Approach

This paper addresses the 3D reconstruction of non-rigid surfaces deforming isometrically; the reconstruction process aims at the estimation of an isometric 3D surface. To this end, a ToF camera is used with a conventional monocular camera. The goal is to use the high-resolution images from the RGB camera in combination with the low-resolution depth map to enhance the estimation of non-rigid shapes. We describe how to model an isometric surface by means of a triangular mesh. The ToF sensor provides the depth of feature points, which is subsequently used to obtain the depth of the mesh vertices by means of a linear-programming-based approach. Given the mesh depth data, a second-order cone programming method is then developed to determine the image points of the mesh vertices. Bundle adjustment is then used to isometrically reconstruct the surfaces. Experimental results show that the proposed approach is robust against noise, generating accurate 3D reconstructions despite the low resolution of the depth images.

S. Jafar Hosseini, Helder Araujo
3D Microscopy Vision Using Multiple View Geometry and Differential Evolutionary Approaches

The Scanning Electron Microscope (SEM), a 2D imaging device, has been widely used in biology and material sciences to determine the surface attributes of a microscopic object. Having 3D surfaces from SEM images would provide true anatomic shapes of micro samples, allowing quantitative measurements and informative visualization of the systems being investigated. In this contribution, we present a Differential Evolutionary (DE) approach for both SEM extrinsic calibration and 3D surface reconstruction. We show that the SEM extrinsic calibration and its 3D shape model can be accurately estimated in a global optimization framework. Several experiments from various perspectives are performed on real and synthetic data to validate the speed, reliability and accuracy of the proposed system. The present work is expected to stimulate more interest and draw attention from the computer vision community to the fast-growing SEM application area.

A. Pahlavan Tafti, A. B. Kirkpatrick, H. A. Owen, Z. Yu
Shape from Refocus

We present a method exploiting the computational refocusing capabilities of a light-field camera in order to obtain 3D shape information. We consider a light-field constructed from the relative motion between a camera and observed objects, i.e. points on the object surface are imaged under different angles along the direction of the motion trajectory. Computationally refocused images are handled by a shape-from-focus algorithm. A linear sharpness measure is shown to be computationally advantageous, as computational refocusing to a specific depth and sharpness assessment of each refocused image can be reordered. We also present a view matching method which further stabilizes the suggested procedure when fused with sharpness assessment. Results for real-world objects from an inspection task are presented. Comparison to ground-truth data showed average depth errors on the order of magnitude of 1 mm for a depth range of 1 cm.

R. Huber-Mörk, S. Štolc, D. Soukup, B. Holländer
Ultrasound Surface Extraction Using Radial Basis Functions

Data acquired from ultrasound examinations is of interest not only for the physician, but also for the patient. While the physician uses the ultrasound data for diagnostic purposes, the patient might be more interested in beautiful images in the case of prenatal imaging. Ultrasound data is noisy by nature, and visually compelling 3D renderings are not always trivial to produce. This paper presents a technique which enables extraction of a smooth surface mesh from the ultrasound data by combining previous research in ultrasound processing with research in point cloud surface reconstruction. After filtering the ultrasound data using Variational Classification, we extract a set of surface points. This set of points is then used to train an Adaptive Compactly Supported Radial Basis Functions system, a technique for surface reconstruction of noisy laser scan data. The resulting technique can be used to extract surfaces with adjustable smoothness and resolution and has been tested on various ultrasound datasets.

Rickard Englund, Timo Ropinski
Sphere Packing Aided Surface Reconstruction for Multi-view Data

Surface reconstruction has long been targeted at scan data. With the rise of multi-view acquisition, existing surface reconstruction techniques often turn out to be ill adapted to the highly irregular sampling and multilayered aspect of such data. In this paper, a novel surface reconstruction technique is developed to address these new challenges by means of an advancing front guided by a sphere packing methodology. The method is fairly simple and can efficiently triangulate point clouds into high quality meshes. The substantiated experimental results demonstrate the robustness and the generality of the proposed method.

Kun Liu, Patricio A. Galindo, Rhaleb Zayer

Applications

A New Coin Segmentation and Graph-Based Identification Method for Numismatic Application

The automatic identification of coins from photos helps coin experts to accelerate their study of coins and to reduce the associated expenses. To address this challenging problem for numismatic applications, we propose a novel coin identification system that consists of two stages. In the first stage, an active-model-based segmentation approach precisely extracts the coin and its shape features from the photo; in the second stage, the coin is assigned to a monetary class represented by a template coin. The similarity score of two coins is computed from graphs constructed from feature points. Validation on the USA Grading dataset demonstrates that the proposed method obtains promising results with an identification accuracy of 94.4% on 2450 coins of 148 classes.

Xingyu Pan, Kitti Puritat, Laure Tougne
Evaluating Depth-Based Computer Vision Methods for Fall Detection under Occlusions

Falls are one of the major risks for seniors living alone at home. Fall detection has been widely studied in the computer vision community, especially since the advent of affordable depth sensing technology like the Kinect. Most existing methods assume that the whole fall process is visible to the camera. This is not always the case, however, since the end of the fall can be completely occluded by a certain object, like a bed. For a system to be usable in real life, the occlusion problem must be addressed. To quantify the challenges and assess performance in this topic, we present an occluded fall detection benchmark dataset containing 60 occluded falls for which the end of the fall is completely occluded. We also evaluate four existing fall detection methods using a single depth camera [1–4] on this benchmark dataset.

Zhong Zhang, Christopher Conly, Vassilis Athitsos
A Novel Modeling for Video Summarization Using Constraint Satisfaction Programming

This paper focuses on automatic video summarization. We propose a novel modeling for summary creation using constraint satisfaction programming (CSP). The proposed modeling aims to give the summarization method more flexibility. It allows users to easily modify the expected summary depending on their preferences or the video type. Using this new modeling, constraints become easier to formulate. Moreover, the CSP solver explores the search space more efficiently and finds better solutions more quickly. Our model is evaluated and compared with an existing modeling on tennis videos.

Haykel Boukadida, Sid-Ahmed Berrani, Patrick Gros
Automated Bird Plumage Coloration Quantification in Digital Images

Quantitative measurements of bird plumage color and patch size provide valuable insights into the impact of environmental conditions on the habitat and breeding of birds. This paper presents a novel perceptual-based framework for the automated extraction and quantification of bird plumage coloration from digital images with slowly varying background colors. The image is first coarsely segmented into a few classes using the dominant colors of the image in a perceptually uniform color space. The required foreground class is then identified by eliminating the dominant background color based on the color histogram of the image. The determined foreground is segmented further using a Bayesian classifier and an edge-enhanced model-based classification for eliminating regions of human skin and is refined by using a perceptual-based Saturation-Brightness quantization to only preserve the perceptually relevant colors. Results are presented to illustrate the performance of the proposed method.

Tejas S. Borkar, Lina J. Karam
Which Phoneme-to-Viseme Maps Best Improve Visual-Only Computer Lip-Reading?

A critical assumption of all current visual speech recognition systems is that there are visual speech units called visemes which can be mapped to units of acoustic speech, the phonemes. Despite there being a number of published maps it is infrequent to see the effectiveness of these tested, particularly on visual-only lip-reading (many works use audio-visual speech). Here we examine 120 mappings and consider if any are stable across talkers. We show a method for devising maps based on phoneme confusions from an automated lip-reading system, and we present new mappings that show improvements for individual talkers.

Helen L. Bear, Richard W. Harvey, Barry-John Theobald, Yuxuan Lan
Iris and Pupil Measurement on Low Resolution Images for Driver Observation

Given that the eyes are one of the most important cues to the driver's attention in cases of traffic accidents, we propose a robust method for accurate iris and pupil detection and parameter estimation. The latter can be used to infer the eye gaze. The presented video-based system is calibration-free and does not require any training procedure. We implemented and adapted an eye feature extraction technique using computer-vision methods and estimated pupil and iris parameters with the help of a polar representation of the eye image. In order to detect the glints (reflections on the eye cornea) caused by the infrared illumination mounted on the camera, we apply a corner detection algorithm. The experimental results show the high applicability of the presented eye feature extraction to low resolution images.

Emin Tarayan, Matthias Höffken, Andra Stefania Herta, Ulrich Kressel

Face Processing and Recognition

A Bayesian Framework for Accurate Eye Center Localization

Accurate localization of eye centers is very important in many computer vision applications. In this paper, we present a novel hybrid method for accurate eye center localization, in which the global appearance, the local features and the temporal information through eye tracking are fused under the Bayesian framework. Specifically, we first construct the position prior to incorporate the global appearance information, which makes our approach robust for images or videos with low resolutions. Then, the likelihood function is built based on local features in the eye region. Finally, after fusing the temporal information provided by eye tracking, we obtain the posterior distribution, and the mean shift method is used to find the locations of the eye centers. Our extensive experimental results on public datasets demonstrate that our system is robust to the variations of illumination and head pose, and outperforms several state-of-the-art methods.

Zhou Liu, Heng Yang, Ming Dong, Jing Hua
Facial Point Localization Using Combination Method under Occlusion

Under the occlusion of the face in an image, the existing discriminative or generative methods often fail to localize facial points because of the limitations of local facial point detectors and appearance modeling in the discriminative and generative methods, respectively. To solve this problem, we propose a new facial point localization method that combines the discriminative and generative methods. The proposed method consists of an initialization stage and optimization stage. The initialization stage detects the face, estimates the facial pose, and obtains the initial parameter set by locating the pose-specific mean shape on the detected face. The optimization stage obtains the facial points by updating the parameter set using the combined Hessian matrix and gradient vector of shape and appearance errors obtained from two methods. In experiments, the proposed method yields more accurate facial point localization under heavy occlusions and pose variations than the existing methods.

Jongju Shin, Jieun Kim, Daijin Kim
Personalized Modeling of Facial Action Unit Intensity

Facial expressions depend greatly on facial morphology and expressiveness of the observed person. Recent studies have shown great improvement of the personalized over non-personalized models in variety of facial expression related tasks, such as face and emotion recognition. However, in the context of facial action unit (AU) intensity estimation, personalized modeling has been scarcely investigated. In this paper, we propose a two-step approach for personalized modeling of facial AU intensity from spontaneously displayed facial expressions. In the first step, we perform facial feature decomposition using the proposed matrix decomposition algorithm that separates the person’s identity from facial expression. These two are then jointly modeled using the framework of Conditional Ordinal Random Fields, resulting in a personalized model for intensity estimation of AUs. Our experimental results show that the proposed personalized model largely outperforms non-personalized models for intensity estimation of AUs.

Shuang Yang, Ognjen Rudovic, Vladimir Pavlovic, Maja Pantic
Automatic Facial Expression Recognition Using Evolution-Constructed Features

One of the common ways humans show emotion is through changes in facial expression. In this paper, we propose a new method for emotion detection by analyzing facial expression images. Facial expression information is analyzed using a new feature construction method called Evolution-COnstructed (ECO) Features. The proposed algorithm is able to automatically recognize seven basic emotions: Anger, Contempt, Disgust, Fear, Happiness, Sadness and Surprise. The test results on the Cohn-Kanade dataset show that the proposed algorithm has a very high classification accuracy.

Meng Zhang, Dah-Jye Lee, Alok Desai, Kirt D. Lillywhite, Beau J. Tippetts
View-Constrained Latent Variable Model for Multi-view Facial Expression Classification

We propose a view-constrained latent variable model for multi-view facial expression classification. In this model, we first learn a discriminative manifold shared by multiple views of facial expressions, followed by expression classification in the shared manifold. For learning, we use expression data from multiple views; however, inference is performed using data from a single view. Our experiments on data of posed and spontaneously displayed facial expressions show that the proposed approach outperforms the state-of-the-art methods for multi-view facial expression classification, and several state-of-the-art methods for multi-view learning.

Stefanos Eleftheriadis, Ognjen Rudovic, Maja Pantic
Pyramid Mean Representation of Image Sequences for Fast Face Retrieval in Unconstrained Video Data

This paper addresses the problem of face retrieval on large datasets by proposing an efficient representation for face videos. In comparison to the classical face verification problem, face retrieval poses additional challenges originating from database size. First, a different characteristic of recognition performance is required because retrieval scenarios have only very few correct face samples embedded in a large amount of imposters. In addition, the large number of samples in the database requires fast matching techniques. In this contribution, we present a face retrieval system which addresses these challenges. The first step consists of a set of measures to reduce frame descriptor dimension which saves processing time while keeping recognition performance. Afterwards, a novel Pyramid Mean Representation (PMR) of face videos is presented which allows for fast and accurate queries on large databases. The key concept is a hierarchical data representation with increasing sparsity which is used for an iterative query evaluation in a coarse to fine manner. The effectiveness of the proposed system is evaluated on the currently largest and most challenging public dataset of unconstrained videos, the YouTube Faces Database. In addition to the official verification test protocol, we define a protocol for face retrieval using a leave-one-out strategy. The proposed system achieves the best performance in this protocol with less processing time than baseline methods.

Christian Herrmann, Jürgen Beyerer

Virtual Reality

Evaluation of Image Feature Descriptors for Marker-Less AR Applications

For employing marker-less augmented reality (AR), image-based geometric alignment is one of the fundamental functions. Image feature descriptors are widely used for this purpose. In this paper, we evaluate various image feature descriptors for vision-based marker-less AR applications. To evaluate descriptors under occlusion, we use not only 2D image test data but also images generated from 3D computer graphics models. We conducted experiments evaluating detection, matching, and tracking performance, and compared the descriptors.

Hiroshi Koyasu, Kotaro Nozaki, Hitoshi Maekawa
Study of 2D Vibration Summing for Improved Intensity Control in Vibrotactile Array Rendering

2D tactile arrays may be integrated into handheld devices or VR controllers to enhance user experience, for example, with touch communication for collaborative tasks. Multiple tactors (tactile elements) may be activated in combination to approximate a vibration point (virtual tactor) having arbitrary position and intensity. We studied the combination of intensities from multiple tactors to guide virtual tactor rendering approaches. Subjects matched perceived loudness of multi-tactor vibrations to a reference tactor. The multi-tactor vibrations corresponded to overall perceived positions halfway between tactor pairs and in the center of a 2D 4-tactor group. Results inform the relationship between tactor signal level and perceived loudness at these critical positions. The relationship leads us to propose a nonlinear 2D rendering approach, provides a basis for assessment of existing rendering techniques, and lays a foundation for further study of 2D array rendering.

Nicholas G. Lipari, Christoph W. Borst
AR-Based Hologram Detection on Security Documents Using a Mobile Phone

Holograms are used frequently in creating fraud resistant security documents, such as passports, ID cards or banknotes. The key contribution of this paper is a real time method to automatically detect holograms in images acquired with a standard smartphone. Following a robust algorithm for creating a tracking target, our approach evaluates an evolving stack of observations of the document to automatically determine the location and size of holograms. We demonstrate the plausibility of our method using a variety of security documents, which are common in everyday use. Finally, we show how suitable data can be captured in a novel mobile gaming experience and draw the link between serious applications and entertainment.

Andreas Hartl, Clemens Arth, Dieter Schmalstieg
Markerless Planar Tracking in Augmented Reality Using Geometric Structure

This paper presents a novel tracking method based on Kanade-Lucas-Tomasi (KLT) tracking for markerless augmented reality systems. Two main contributions are as follows: 1) feature points are tracked in a multi-level image pyramid model, and the points tracked on different levels are blended together; 2) tracked points are refined using the geometric structure between them, including discarding fictitious points and recovering missing points. In addition, the method adapts to acute scale variation and to tracking in poor-quality images. Experimental results show that the proposed algorithm is robust to scale variation and that the re-projection error of feature points on each frame is smaller than one pixel, i.e., kept at the sub-pixel level.
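The coarse-to-fine idea behind multi-level KLT tracking can be sketched with a minimal image pyramid (a generic pure-Python illustration, not the authors' implementation; all names are ours):

```python
def downsample(img):
    """Halve a 2-D image (list of lists) by averaging 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2*y][2*x] + img[2*y][2*x+1] +
              img[2*y+1][2*x] + img[2*y+1][2*x+1]) / 4.0
             for x in range(w)] for y in range(h)]

def build_pyramid(img, levels):
    """Level 0 is the full-resolution image; each level halves both axes."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    return pyr

base = [[float(x + y) for x in range(8)] for y in range(8)]
pyr = build_pyramid(base, 3)
# Coarse-to-fine tracking would start at pyr[-1] and refine toward pyr[0].
```

A tracker first estimates motion on the coarsest level, where displacements are small, then propagates and refines the estimate down the pyramid.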

Chunling Fan, Yonggang Zhao, Liangbing Feng
A Haptic-Based Application for Active Exploration of Facial Expressions by the Visually Impaired

In this paper, a haptic-based interpretation of video images directly acquired by a webcam is introduced to help individuals who are visually impaired dynamically explore their own faces in an immersive haptic environment. Haptic interaction involves perception of the movement of different facial features through geometric cues in a 3D environment. Spatio-temporal variation of features due to facial movement assists those who are visually impaired to understand different facial expressions of emotions through active exploration of their own facial features. A dynamic haptic interface appropriate for this kind of interaction has been discussed. Lastly, application of this kind of training has been proposed for communication among individuals who are deafblind.

Shamima Yasmin, Troy McDaniel, Sethuraman Panchanathan

Poster Session

Affine Invariant Harris-Bessel Interest Point Detector

An affine invariant interest point detector, named here the Harris-Bessel detector, employing Bessel filters is proposed in this paper. The Harris-Bessel detector is applied to the images of a well-known database in the literature. Our numerical results indicate that this detector is competitive and has better repeatability and localization measures than those of the affine invariant Harris-Laplace interest point detector. We also numerically demonstrate that the Harris-Bessel detector is affine invariant and is on average 450 times faster than the affine invariant Harris-Laplace detector for the images of the database.
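For context, the classic Harris response that both the Harris-Bessel and Harris-Laplace detectors build on can be computed as follows (a textbook sketch, not the paper's Bessel-filter variant; window size and k are conventional choices):

```python
def harris_response(img, k=0.04, win=1):
    """Harris corner response R = det(M) - k*trace(M)^2 per pixel, with the
    structure tensor M summed over a (2*win+1)^2 window."""
    h, w = len(img), len(img[0])
    # Central-difference gradients (clamped at the border).
    Ix = [[(img[y][min(x+1, w-1)] - img[y][max(x-1, 0)]) / 2.0
           for x in range(w)] for y in range(h)]
    Iy = [[(img[min(y+1, h-1)][x] - img[max(y-1, 0)][x]) / 2.0
           for x in range(w)] for y in range(h)]
    R = [[0.0] * w for _ in range(h)]
    for y in range(win, h - win):
        for x in range(win, w - win):
            a = b = c = 0.0
            for dy in range(-win, win + 1):
                for dx in range(-win, win + 1):
                    gx, gy = Ix[y+dy][x+dx], Iy[y+dy][x+dx]
                    a += gx * gx; b += gx * gy; c += gy * gy
            R[y][x] = a * c - b * b - k * (a + c) ** 2
    return R

# A white square on black: corners score higher than edges or flat areas.
img = [[1.0 if 2 <= x <= 5 and 2 <= y <= 5 else 0.0 for x in range(8)]
       for y in range(8)]
R = harris_response(img)
```

The Harris-Bessel and Harris-Laplace variants differ in the smoothing filters applied before forming the structure tensor, which is where the reported speed difference arises.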

Sasan Mahmoodi, Nasim Saba
Layered Depth Image Based HEVC Multi-view Codec

Multi-view video has gained widespread popularity in recent years. 3DTV, surveillance, immersive teleconferencing, and free-viewpoint television are a few notable applications of multi-view video. Excessive storage and transmission bandwidth requirements are the major challenges faced by the industry in facilitating multi-view video applications. This paper presents efficient tools for coding multi-view video based on the state-of-the-art single-view video coding standard H.265/HEVC (High Efficiency Video Coding). Our approach employs the LDI (Layered Depth Image) representation technique, which is capable of compactly representing 3D scene content. We propose techniques and algorithms for LDI construction, view synthesis, and efficient coding of LDI layers and associated auxiliary information. Subjective assessments indicate that our approach offers more than 50% reduction in bitrate compared to HEVC simulcast for the same subjective quality under practical operating bitrates.

S. Kirshanthan, L. Lajanugen, P. N. D. Panagoda, L. P. Wijesinghe, D. V. S. X. De Silva, A. A. Pasqual
On Detectability of Moroccan Coastal Upwelling in Sea Surface Temperature Satellite Images

This work aims at automatically identifying the upwelling areas in the coastal ocean of Morocco using Sea Surface Temperature (SST) satellite images. This is done using a fuzzy clustering technique. The proposed approach starts with the application of the Gustafson-Kessel clustering algorithm in order to detect groups with homogeneous, non-overlapping temperatures in each SST image, resulting in a c-partitioned labeled image. Cluster validity indices are used to select the c-partition that best reproduces the shape of the upwelling areas. An area opening technique is developed to filter out residual noise and fine structures in offshore waters not belonging to the upwelling regions. The developed algorithm is applied and adjusted over a database of 70 SST images from the years 2007 and 2008, covering the southern part of the Moroccan Atlantic coast. The system was evaluated by an oceanographer and provided acceptable results for a wide variety of oceanographic conditions.

Ayoub Tamim, Khalid Minaoui, Khalid Daoudi, Abderrahman Atillah, Driss Aboutajdine
High-Order Diffusion Tensor Connectivity Mapping on the GPU

We present an efficient approach to computing white matter fiber connectivity on the graphics processing unit (GPU). We utilize a high-order tensor model of fiber orientation computed from high angular resolution diffusion imaging (HARDI) and a stochastic model of white matter fibers to compute and display global white matter connectivity in real time. The high-order tensor model overcomes limitations of the 2nd-order tensor model in regions of crossing or fanning fibers. By utilizing modern GPU features exposed in recent versions of the OpenGL API we can perform processing and visualization without costly GPU-CPU data transfers.

Tim McGraw, Donald Herring
A Sequential 3D Curve-Thinning Algorithm Based on Isthmuses

Curve-thinning is a frequently applied technique to obtain centerlines from volumetric binary objects. Conventional curve-thinning algorithms preserve endpoints to provide important geometric information relative to the objects. An alternative strategy is also proposed that accumulates isthmuses (i.e., generalization of curve interior points as elements of the centerlines). This paper presents a computationally efficient sequential isthmus-based 3D curve-thinning algorithm.

Kálmán Palágyi
Automatic Identification of CAPTCHA Schemes

Text-based CAPTCHAs are ubiquitous on the Internet since they are easily generated by machines, easily solvable by humans, and yet not easily defeated by state-of-the-art computer algorithms. Over the years, several attacks have been designed by researchers to solve different types of CAPTCHAs. These attacks always assume that the type of CAPTCHA is known. However, in order to devise a common framework comprising different attacks that can be launched automatically, the first step is to recognize the CAPTCHA scheme. In this paper we present a method based on geometric features to automatically identify text-based CAPTCHA schemes. The proposed method is verified on a data set comprising 25 different types of CAPTCHA (1,000 samples per type). We achieve an identification/classification accuracy of up to approximately 99%.

M. A. Asim K. Jalwana, Muhammad Murtaza Khan, Muhammad U. Ilyas
Object Detection Based on Multiresolution CoHOG

Recently, co-occurrence histograms of oriented gradients (CoHOG), a feature description method that calculates the co-occurrence of pixel-level gradient orientations, has attracted attention as an effective method for object detection. However, for feature descriptions that focus on individual pixels, the calculation cost and the number of dimensions tend to increase exponentially with the number of pixels. This paper proposes multiresolution co-occurrence histograms of oriented gradients (MRCoHOG) as a feature description method that is able to reduce these exponential increases to linear ones without reducing classification accuracy. MRCoHOG reduces the number of feature dimensions by calculating the co-occurrence only between adjacent pixels, and it maintains accuracy by extracting features from multiple low-resolution images. We performed classification experiments using a vehicle data set cropped from surveillance images of a parking area and the INRIA Person Data Set, and the results showed that the performance of MRCoHOG is equivalent to that of CoHOG.
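The co-occurrence idea underlying CoHOG can be sketched for a single neighbour offset (an illustration only; the bin count and offset are our assumptions, and real CoHOG aggregates many offsets over sub-regions):

```python
import math

def cohog(img, bins=4, offset=(0, 1)):
    """Co-occurrence histogram of quantized gradient orientations between
    each pixel and its neighbour at `offset` = (dy, dx)."""
    h, w = len(img), len(img[0])

    def orient(y, x):
        # Clamped central differences, then unsigned orientation in [0, pi).
        gx = img[y][min(x+1, w-1)] - img[y][max(x-1, 0)]
        gy = img[min(y+1, h-1)][x] - img[max(y-1, 0)][x]
        ang = math.atan2(gy, gx) % math.pi
        return min(int(ang / math.pi * bins), bins - 1)

    hist = [[0] * bins for _ in range(bins)]
    dy, dx = offset
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                hist[orient(y, x)][orient(ny, nx)] += 1
    return hist

# A horizontal intensity ramp: every gradient points the same way, so all
# co-occurrences fall into one cell of the histogram.
img = [[float(x) for x in range(6)] for _ in range(6)]
hist = cohog(img)
```

A full descriptor concatenates such bins-by-bins matrices over several offsets; restricting the offsets to adjacent pixels is exactly the dimensionality reduction MRCoHOG exploits.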

Sohei Iwata, Shuichi Enokida
Who Shot the Picture and When?

Consider a set of images corresponding to a dynamic scene captured using multiple hand-held cameras. Assuming that we do not have any other information about the camera settings and the dynamic scene, we would like to identify the cameras which captured each of these images. Further, we would like to estimate the order in which these images were captured by each of the cameras. We address this challenging problem using principles derived from multiple view geometry and unsupervised learning techniques. We show that the camera identification problem can be modelled as clustering of the affine camera matrices estimated from the images. We show that homography estimation from the static regions of the scene enables us to order the images captured by each camera individually. Apart from discussing the advantages of the proposed approach, we conclude the paper providing the limitations of the approach and future directions.

Gagan Kanojia, Sri Raghu Malireddi, Sai Chowdary Gullapally, Shanmuganathan Raman
Matching Affine Features with the SYBA Feature Descriptor

Many vision-based applications require a robust feature descriptor that works well with image deformations such as compression, illumination, and blurring. It remains a challenge for a feature descriptor to work well with image deformation caused by viewpoint change. This paper introduces, first, a new binary feature descriptor called SYnthetic BAsis (SYBA) for feature point description and matching, and second, a method for removing non-affine features from the initial feature list to further improve the feature matching accuracy. This new approach has been tested on the Oxford dataset and a newly created dataset by comparing the feature matching accuracy using only affine features with the accuracy of using both affine and non-affine features. A statistical t-test was performed on the newly created dataset to demonstrate the advantages of using only affine feature points for matching. SYBA is less computationally complex than other feature descriptors and gives better feature matching results using affine features.

Alok Desai, Dah-Jye Lee, Dan Ventura
Boosted Fractal Integral Paths for Object Detection

In boosting-based object detectors, weak classifiers are often built on Haar-like features using conventional integral images. That approach leads to the utilization of simple rectangle-shaped structures, which are only partially suitable for curved structures, as present in natural object classes such as faces. In this paper, we propose a new class of fractal features based on space-filling curves, a special type of fractal also known as Peano curves. Our method incorporates the new feature class by computing integral images along these curves. Space-filling curves therefore allow our proposed features to describe a wider variety of shapes, including self-similar structures. By introducing two subtypes of fractal features, three-point and four-point features, we get a richer representation of curved and topologically separated but correlated structures. We compare AdaBoost using conventional Haar-like features and our proposed fractal feature class in several experiments on the well-known MIT+CMU upright face test set and a microscopy cell test set.
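The notion of an integral image computed along a curve can be illustrated with a serpentine scan as a simple stand-in for a Peano curve (our simplification, not the paper's fractal curves): prefix sums along the curve make the sum of any contiguous run of curve pixels an O(1) lookup.

```python
def serpentine_order(h, w):
    """Pixel coordinates along a serpentine (boustrophedon) scan, used here
    as a simple stand-in for a Peano/Hilbert space-filling curve."""
    path = []
    for y in range(h):
        xs = range(w) if y % 2 == 0 else range(w - 1, -1, -1)
        path.extend((y, x) for x in xs)
    return path

def curve_integral(img):
    """Prefix sums of pixel values along the curve; the sum of any
    contiguous run [i, j) along the curve is S[j] - S[i]."""
    path = serpentine_order(len(img), len(img[0]))
    S = [0.0]
    for (y, x) in path:
        S.append(S[-1] + img[y][x])
    return path, S

img = [[1.0] * 4 for _ in range(4)]
path, S = curve_integral(img)
run_sum = S[10] - S[3]          # sum of 7 curve-consecutive pixels
```

The paper's three-point and four-point features would then be differences of such curve-run sums, just as Haar-like features are differences of rectangle sums.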

Arne Ehlers, Florian Baumann, Bodo Rosenhahn
Depth Estimation within a Multi-Line-Scan Light-Field Framework

We present algorithms for depth estimation from light-field data acquired by a multi-line-scan image acquisition system. During image acquisition a 3-D light field is generated over time, which consists of multiple views of the object observed from different viewing angles. This allows for the construction of so-called epipolar plane images (EPIs) and subsequent EPI-based depth estimation. We compare several approaches based on testing various slope hypotheses in the EPI domain, which can directly be related to depth. The considered methods used in hypothesis assessment, which belong to a broader class of block-matching algorithms, are modified sum of absolute differences (MSAD), normalized cross correlation (NCC), census transform (CT) and modified census transform (MCT). The methods are compared w.r.t. their qualitative results for depth estimation and are presented for artificial and real-world data.

D. Soukup, R. Huber-Mörk, S. Štolc, B. Holländer
A Weighted Regional Voting Based Ensemble of Multiple Classifiers for Face Recognition

Face recognition has become a heavily studied field of AI. Competing techniques have been proposed, both holistic and local, each with its own advantages and disadvantages. Recently, a unified methodology using a Regional Voting framework has improved the accuracy of all holistic algorithms significantly and is currently regarded as one of the best approaches. In this work, building on the success of regional voting, we developed a two-layer voting system called Weighted Regional Voting Based Ensemble of Multiple Classifiers (WREC), which can embed all available face recognition algorithms. The first layer embeds a holistic algorithm into a Regional Voting framework. The second layer gathers the classification results of different algorithms from the first layer and then makes the final decision. Extensive experiments carried out on benchmark face databases show the proposed system is faster and more accurate than several other leading algorithms/approaches in every case.

Jing Cheng, Liang Chen
Depth Data-Driven Real-Time Articulated Hand Pose Recognition

This paper presents a fast but robust method to recognize articulated hand pose from single depth images in real-time. We tackle the main challenges in the hand pose recognition, which include the high degree of freedom and self-occlusion of articulated hand motion, using efficient retrieval of a large set of hand pose templates. The normalized orientation templates are used for encoding the depth images containing hand poses, and the locality sensitive hashing is used for finding the nearest neighbors in real time. Our approach does not suffer from the common problems in the conventional tracking approaches such as model initialization and tracking drift, and qualitatively outperforms the existing hand pose estimation techniques.
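Locality sensitive hashing for nearest-neighbour retrieval, as used here for template lookup, can be sketched with sign-random-projection hashes (a generic illustration; the dimensions, bit count, and bucket scheme are our assumptions, not the paper's):

```python
import random

def lsh_hash(vec, planes):
    """Sign-random-projection hash: one bit per hyperplane."""
    return tuple(1 if sum(v * p for v, p in zip(vec, plane)) >= 0 else 0
                 for plane in planes)

random.seed(0)
dim, nbits = 8, 6
planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(nbits)]

# Bucket a small "template database" by hash code; similar vectors tend to
# share a code, so a query only compares against one bucket.
db = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(50)]
buckets = {}
for i, v in enumerate(db):
    buckets.setdefault(lsh_hash(v, planes), []).append(i)

query = [x + 0.01 for x in db[0]]        # a near-duplicate of template 0
candidates = buckets.get(lsh_hash(query, planes), [])
```

In practice several hash tables with independent planes are used so that near neighbours missed by one table are caught by another.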

Young-Woon Cha, Hwasup Lim, Min-Hyuk Sung, Sang Chul Ahn
3-D Model Alignment for Retrieval from Part of Model Considering the Rotation, Scaling and Translation with Projections around an Axis

Three-dimensional (3-D) models are used in a wide range of fields such as 3-D printing, industry, and medicine. Efficient retrieval of objects is attractive for managing and reusing 3-D models. Moreover, when we can select a registration model part of which is similar to the query, it becomes possible to create 3-D models applicable to various fields by using the part obtained by the retrieval as a part of the model we want to create. It is necessary to consider the rotation, scaling, and translation of the 3-D models during retrieval. Although the generally used principal axis method estimates the geometric transform parameters, a large error may result when the query is a part of the corresponding registration model. This paper proposes a method using projections around an axis. The projections are generated after selecting a projection axis for every input and registration model. After estimating the parameters of rotation, scaling, and translation using the phase correlation between these projections, the similarity of the two models is calculated. Evaluation experiments show that the proposed method is effective in cases where the query corresponds to a part of the registration model.
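Phase correlation, which the method uses to estimate alignment between projections, can be illustrated in 1-D with a naive DFT (an illustration only; the paper applies it to projections around an axis, and a real implementation would use an FFT):

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N)
                for k in range(N)) / N for n in range(N)]

def phase_correlation_shift(a, b):
    """Estimate the circular shift d with b[n] = a[(n - d) % N] as the peak
    of the inverse DFT of the normalized cross-power spectrum."""
    A, B = dft(a), dft(b)
    R = []
    for k in range(len(a)):
        num = B[k] * A[k].conjugate()
        R.append(num / (abs(num) or 1.0))   # guard against zero spectrum
    r = idft(R)
    return max(range(len(r)), key=lambda n: r[n].real)

a = [0.0, 0.0, 1.0, 2.0, 3.0, 0.0, 0.0, 0.0]
b = [a[(n - 3) % 8] for n in range(8)]      # a circularly shifted by 3
shift = phase_correlation_shift(a, b)
```

Because the cross-power spectrum is normalized, the inverse transform is (ideally) a single impulse at the shift, which makes the estimate robust to uniform intensity changes.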

Yohei Kayanuma, Fumiko Umeda, Akira Kawanaka
Strokes Detection for Skeletonisation of Characters Shapes

Skeletonisation is a key process in character recognition in natural images. Under the assumption that a character is made of a stroke of uniform colour with small variation in thickness, the process of recognising characters can be decomposed into three steps. First the image is segmented, then each segment is transformed into a set of connected strokes (skeletonisation), which are then abstracted into a descriptor that can be used to recognise the character. The main issue with skeletonisation is its sensitivity to noise and, especially, the presence of holes in the masks. In this article, a new method for the extraction of strokes is presented, which addresses the problem of holes in the mask and does not use any parameters.

Cyrille Berger
Face Detection and Tracking for Intent Recognition

An algorithm for visual intent recognition based on adaptive boosting and principal component analysis is presented, with two different motions of the human face, namely rotation and vertical motion, as intent indicators. The intended context is that of wheelchair-bound individuals who want to indicate a direction of travel by rotating their face, and to go forward or stop by moving their face vertically. The approach is based on the work of Jia and Hu [1], [2]; instead of inferring intention through head pose estimation on a single frame as they proposed, the face in motion is represented using an intention curve that is subsequently classified for intent recognition through a decision rule. This work intends to contribute to the realization of an enabled environment allowing people with severe disabilities and the elderly to be more independent and active in society.

K. T. Luhandjula, B. J. van Wyk, K. Djouani, Y. Amirat
Embedded Image Processing System for Automatic Page Segmentation of Open Book Images

In this paper the image processing stage of an automatic book scanning system is presented. The scanning system is composed of a camera, an image processing platform, an illumination subsystem, and a hardware platform which holds an open book at an adequate distance from the camera. The image processing platform communicates with the camera to obtain the input image of the open book. Once the open book image is acquired, the image processing stage applies a set of operators to the image in order to segment and store images of the individual book pages. Segmentation analysis and performance evaluation of the proposed image processing platform are performed. Results of the segmentation process show that at least 75% of the pages are correctly segmented. Execution time results show that up to 18 pages per minute can be automatically digitized. The proposed system was implemented on two different processing platforms: a laptop computer with an Intel processor and a Raspberry Pi minicomputer.

Victor Rodríguez-Osoria, Marco Aurelio Nuño-Maganda, Yahir Hernández-Mier, Cesar Torres-Huitzil
Towards an Embedded Real-Time High Resolution Vision System

This paper proposes an approach to image processing for high performance vision systems. Focus is on achieving a scalable method for real-time disparity estimation which can support high resolution images and large disparity ranges. The presented implementation is a non-local matching approach building on the innate qualities of the processing platform which, through utilization of a heterogeneous system, combines low-complexity approaches into performing a high-complexity task. The complementary platform composition allows for the FPGA to reduce the amount of data to the CPU while at the same time promoting the available informational content, thus both reducing the workload as well as raising the level of abstraction. Together with the low resource utilization, this allows for the approach to be designed to support advanced functionality in order to qualify as part of unified image processing in an embedded system.

Fredrik Ekstrand, Carl Ahlberg, Mikael Ekström, Giacomo Spampinato
Violence Detection in Video by Using 3D Convolutional Neural Networks

Whereas most research addresses the action recognition problem, the detection of fights has received comparatively little attention, even though such a capability may be of great importance. Typical methods mostly rely on domain knowledge to construct complex handcrafted features from inputs. In contrast, deep models can act directly on the raw inputs and automatically extract features. We therefore develop in this paper a novel 3D ConvNets model for violence detection in video that does not use any prior knowledge. To evaluate our method, experimental validation was conducted on the Hockey dataset. The results show that the method achieves superior performance without relying on handcrafted features.

Chunhui Ding, Shouke Fan, Ming Zhu, Weiguo Feng, Baozhi Jia
Modified Adaptive Extended Bilateral Motion Estimation with Scene Change Detection for Motion Compensated Frame Rate Up-Conversion

In this paper, a novel frame rate up-conversion (FRUC) algorithm using modified adaptive extended bilateral motion estimation (MAEBME) with scene change detection is proposed. Conventionally, extended bilateral motion estimation (EBME) carries out bilateral motion estimation (BME) twice on the same region and therefore involves high complexity. Adaptive extended bilateral motion estimation (AEBME) is proposed to reduce complexity and increase visual quality by using a block type matching process and considering frame motion activity. In the MAEBME algorithm, calculated edge information is used to detect a global scene cut change, and then in the block type matching process to decide whether to use EBME. Finally, overlapped block motion compensation (OBMC) and motion compensated frame interpolation (MCFI) are adopted to interpolate the intermediate frame, in which OBMC is employed adaptively by considering frame motion activity. Experimental results show that the proposed algorithm has outstanding performance and fast computation compared with the anchor algorithms.

Daejun Park, Jechang Jeong
Concealed Target Detection with Fusion of Visible and Infrared

Concealed or buried improvised explosive devices (IEDs) are a major cause of fatalities for both civilians and soldiers. For detecting hidden targets, many technologies have been considered, such as ground penetrating radar (GPR), infrared cameras, and even visible wavelength cameras. In this work, we propose fusing visible and infrared sensors for automatic detection of shallowly buried (< 10 cm) or above-ground targets. We use Gaussian Mixture Models (GMMs) to create a base model of the temperature and color variation of the background scene and dynamically update our models for new scenes. Anomalous temperatures and colors are identified using the GMM components. Fusion is performed at the pixel level, confidence map level, and decision level for comparison. Data was collected with a Xenics Gobi 480 long wave infrared camera and a Canon Powershot A1200 visible wavelength camera, with metal targets placed in various concealed configurations. The observed results show that infrared can effectively detect shallowly buried targets and targets above ground "out in the open", but cannot detect metal targets near bushes. Visible cameras, on the other hand, can detect the metal targets in the bushes effectively. Confidence map and decision level fusion led to the best results when there was a mix of buried targets and targets hidden in bushes.
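A one-component special case of the GMM background idea, flagging anomalous pixel values against a running mean and variance, might look like the sketch below (our simplification with made-up parameters; the paper uses full mixtures and multi-sensor fusion):

```python
class GaussianBackground:
    """One-component stand-in for a GMM background model: keep a running
    mean/variance per pixel and flag values far from the mean as anomalous."""
    def __init__(self, alpha=0.05, thresh=2.5):
        self.alpha, self.thresh = alpha, thresh
        self.mean, self.var = None, None

    def update(self, frame):
        if self.mean is None:                       # first frame initializes
            self.mean = [row[:] for row in frame]
            self.var = [[1.0] * len(frame[0]) for _ in frame]
        anomalies = []
        a = self.alpha
        for y, row in enumerate(frame):
            for x, v in enumerate(row):
                d = v - self.mean[y][x]
                if d * d > (self.thresh ** 2) * self.var[y][x]:
                    anomalies.append((y, x))        # outside 2.5 sigma
                self.mean[y][x] += a * d            # exponential update
                self.var[y][x] = (1 - a) * self.var[y][x] + a * d * d
        return anomalies

bg = GaussianBackground()
for _ in range(20):                       # learn a flat 20-degree background
    bg.update([[20.0] * 5 for _ in range(5)])
hot = [[20.0] * 5 for _ in range(5)]
hot[2][3] = 45.0                          # one warm "target" pixel
detections = bg.update(hot)
```

A real GMM keeps several weighted mean/variance components per pixel, which lets it absorb multimodal backgrounds (e.g. waving vegetation) without flagging them.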

Philip Saponaro, Kelly Sherbondy, Chandra Kambhamettu
Enhancement of Hazy Color Images Using a Self-Tunable Transformation Function

Vision-based outdoor mobile systems are very sensitive to adverse weather circumstances like hazy and foggy conditions. The acquisition of image frames in such an environment deteriorates the scene contrast and biases the color information. In order to recover the scene details, we propose a new method which takes a nonlinear approach, where the haze pixel intensity is manipulated effectively with a specially designed sine nonlinear function. This function is integrated with the optics-based haze model to approximate the enhanced inverse transmission of the scene. The transformation function is composed with a variable parameter, which tunes automatically, to produce the desired nonlinear mapping for each pixel while maintaining the local contrast. Unlike other state-of-the-art haze removal techniques, which operate on local regions, the proposed method operates on each pixel to eliminate blocking artifacts and minimize processing complexity. Our experimental results with quantitative measures demonstrate that the proposed technique yields state-of-the-art performance on hazy images and is suitable for processing dynamic video scenes captured in adverse weather conditions.

Saibabu Arigela, Vijayan K Asari
Determine Absolute Soccer Ball Location in Broadcast Video Using SYBA Descriptor

This paper presents research on the detection, tracking, and localization of the soccer ball in broadcast soccer video, mapping the ball locations to the global coordinate system of the soccer field. Because of the lack of reference points in these frames, the calculation of the global coordinates of the ball remains a very challenging task. This paper proposes to use an object-based algorithm and a Kalman filter to detect and track the ball in such videos. Once the ball is located, frames are registered to a static soccer field, and the absolute ball location is found in the field. Existing feature matching algorithms do not work well for frame registration, especially when lighting variations and large camera pan-tilt-zoom changes are involved. To overcome this challenge, a new feature descriptor and matching algorithm that is robust to these deformations is developed and presented in this paper. Experimental results show the proposed algorithm is very effective and accurate.
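The Kalman-filter component can be sketched for one coordinate of the ball with a constant-velocity model (a generic textbook sketch; the noise parameters are illustrative, not the paper's):

```python
def kalman_track_1d(zs, q=1e-3, r=0.5):
    """Constant-velocity Kalman filter for one coordinate of the ball
    position; returns the filtered positions."""
    x, v = zs[0], 0.0                    # state: position, velocity
    p00, p01, p11 = 1.0, 0.0, 1.0        # symmetric covariance entries
    out = []
    for z in zs:
        # Predict step (dt = 1): x' = x + v, v' = v.
        x = x + v
        p00, p01, p11 = p00 + 2 * p01 + p11 + q, p01 + p11, p11 + q
        # Update step with position measurement z (H = [1, 0]).
        s = p00 + r                      # innovation variance
        k0, k1 = p00 / s, p01 / s        # Kalman gain
        innov = z - x
        x, v = x + k0 * innov, v + k1 * innov
        p00, p01, p11 = (1 - k0) * p00, (1 - k0) * p01, p11 - k1 * p01
        out.append(x)
    return out

# Noiseless constant-velocity track: the filter should lock on quickly.
zs = [float(t) for t in range(10)]
track = kalman_track_1d(zs)
```

In a 2-D tracker the same recursion runs on a four-dimensional state (x, y, vx, vy), and the prediction step bridges frames where the ball detection fails.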

Alok Desai, Dah-Jye Lee, Craig Wilson
Colour Perception Graph for Characters Segmentation

Character recognition in natural images is a challenging problem, as it involves segmenting characters of various colours on various backgrounds. In this article, we present a method for segmenting images that uses a colour perception graph. Our algorithm is inspired by graph-cut segmentation techniques; it uses an edge detection technique to filter the graph before the graph cut, and merges segments as a final step. We also present both qualitative and quantitative results, which show that our algorithm performs slightly better and faster than a state-of-the-art algorithm.

Cyrille Berger
Initial Closed-Form Solution to Mapping from Unknown Planar Motion of an Omni-directional Vision Sensor

This paper proposes a novel method to construct a stationary environment map and estimate the ego-motion of a sensor system undergoing unknown planar motion by using an omni-directional vision sensor. Most environments where a sensor moves to obtain maps are limited to two-dimensional space. However, conventional "Structure from Motion" (SFM) algorithms cannot be applied to planar motion and one-dimensional measurements because they use the epipolar geometry. We propose an algorithm that can be applied to two-dimensional space. Since the number of parameters to be estimated is reduced, computational advantages are obtained for large map reconstruction. The proposed algorithm exploits the azimuths of features obtained from an omni-directional vision sensor and gives robust results against image noise by taking advantage of the large field of view. The relation between the observed azimuths and the motion parameters of the vision sensor is constrained by a nonlinear equation. The proposed method obtains closed-form solutions to all the motion parameters and an environment map through a two-step procedure. These estimation results can be used as a good initial seed for the incremental reconstruction of a large map.

Jae-Hean Kim, Jin Sung Choi
Multimodal Approach for Natural Biomedical Multi-scale Exploration

Pathologies which simultaneously involve different spatial scales are often difficult to understand. Biomedical data from different modalities and spatio-temporal scales need to be combined to obtain an understandable representation for examination. Despite requests to improve the exploration of multi-scale biomedical data, no major progress has been made in terms of a common strategy combining the state of the art in visualization and interaction. This work presents a multimodal approach for natural biomedical multi-scale exploration. The synergy of a multi-layered visualization environment based on spatial scales with hand gestures and haptic interfaces opens new perspectives for natural data manipulation.

Jan Rzepecki, Ricardo Manuel Millán Vaquero, Alexander Vais, Karl-Ingo Friese, Franz-Erich Wolter
Generating Super-Resolved Depth Maps Using Low-Cost Sensors and RGB Images

There are many applications of three-dimensional reconstruction of real scenes. The rise of low-cost sensors, like the Microsoft Kinect, suggests the development of systems cheaper than the existing ones. Nevertheless, data provided by this device are worse than those provided by more sophisticated sensors. In the academic and commercial worlds, some initiatives try to solve that problem. Studying these attempts, this work suggests a modification of the super-resolution algorithm described by Mitzel et al. [1] in order to consider in its calculations the colored images provided by the Kinect. This change improved the super-resolved depth maps, mitigating interference caused by sudden changes in the captured scenes. The tests showed the improvement of the generated maps and analysed the impact of CPU and GPU implementations of the super-resolution step.

Leandro Tavares Aragão dos Santos, Manuel Eduardo Loaiza Fernandez, Alberto Barbosa Raposo
Learning and Association of Features for Action Recognition in Streaming Video

We propose a novel framework which learns and associates local motion pattern manifolds in streaming videos using generalized regression neural networks (GRNN) to facilitate real-time human action recognition. The motivation is to determine an individual’s action even when the action cycle has not yet been completed. The GRNNs are trained to model the regression function of patterns in latent action space on the input local motion-shape patterns. This manifold learning makes the framework invariant to different sequence lengths and varying action states. The latent action basis is computed using EOF analysis, and the association of local temporal patterns to an action class at runtime follows a probabilistic formulation: it corresponds to finding the GRNN estimate closest to the corresponding action basis. Experimental results on two datasets, KTH and UCF Sports, show accuracies above 90% obtained from only 15 to 25 frames.
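A GRNN of the kind used here is essentially a Nadaraya-Watson kernel-regression estimator: the prediction for a query pattern is a Gaussian-kernel-weighted average of the training targets. A minimal sketch (the function name and the single bandwidth `sigma` are illustrative, not taken from the paper):

```python
import math

def grnn_predict(train_x, train_y, query, sigma=1.0):
    """GRNN / Nadaraya-Watson regression: weight each training target by
    a Gaussian kernel of its input's distance to the query, then average."""
    weights = []
    for x in train_x:
        d2 = sum((a - b) ** 2 for a, b in zip(x, query))
        weights.append(math.exp(-d2 / (2 * sigma ** 2)))
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, train_y)) / total
```

Because there is no iterative training, such a network adapts immediately as new pattern/target pairs stream in, which suits the streaming-video setting.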

Binu M. Nair, Vijayan K. Asari
Cell Classification in 3D Phase-Contrast Microscopy Images via Self-Organizing Maps

Cancer cell morphology can be used as an indicator of metastasizing behavior. To analyze cancer cell morphology, we used 3D phase-contrast microscopy, one of the most common imaging modalities for long-term observation of multi-cellular processes in living cells, free of the phototoxicity and photobleaching common to fluorescent labeling techniques. However, it also has certain drawbacks at the image level, such as non-uniform illumination and phase-contrast interference rings. Our first step compensates for row-contrast artifacts via single-cell detection using intensity-based global segmentation. We extracted cross-sections using principal component analysis, owing to the interference's non-symmetric diffusion pattern appearing around each individual cell. Then, we analyzed cell morphology by intensity gradient, considering local peaks as bright ring regions. Finally, we applied a self-organizing map method that shows potential for classifying cancer cells into active and inactive categories.

Mi-Sun Kang, Hye-Ryun Kim, Myoung-Hee Kim
Pose-Aware Smoothing Filter for Depth Images

We propose a novel smoothing algorithm for depth images. Inspired by the bilateral filter, we design a pose-aware smoothing filter for sequentially captured depth images. First, our method solves the frame-to-frame tracking problem by using the point-to-plane iterative closest point (ICP) algorithm between a keyframe depth image and multiple nearby depth images, which gives us the relative transforms between camera poses. Then, we re-project those depth images onto the keyframe depth image using the previously computed relative transforms. Finally, we merge those depth images into the keyframe depth image by applying the proposed smoothing filter. Since our kernel function uses pixel distance and information similarities not in one depth image but across several temporally nearby depth images, it is more robust to noise while preserving geometric details than a conventional bilateral filter. Our smoothing algorithm can be combined with the traditional bilateral filter to take advantage of both algorithms. Additionally, our algorithm can be applied to dynamic scenes by detecting dynamic pixels and removing their values. We present experiments with depth images captured by a depth camera.
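The temporal merging step can be sketched as a per-pixel weighted average over already re-projected frames, where frames whose depth differs strongly from the keyframe value get low weight, so noise averages out while depth discontinuities survive. This is an illustrative simplification (the ICP-based re-projection is omitted, and `sigma_d` is an assumed parameter):

```python
import math

def merge_depths(depth_stack, x, y, sigma_d=0.05):
    """Merge temporally nearby, already re-projected depth images at one
    pixel. Frame 0 plays the role of the keyframe; depth values far from
    the keyframe value receive near-zero weight."""
    key = depth_stack[0][y][x]
    num = den = 0.0
    for frame in depth_stack:
        d = frame[y][x]
        w = math.exp(-((d - key) ** 2) / (2 * sigma_d ** 2))
        num += w * d
        den += w
    return num / den
```

An outlier frame (e.g. one containing a moving object at this pixel) contributes almost nothing, which mirrors the paper's dynamic-pixel robustness at a much cruder level.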

Seungpyo Hong, Jinwook Kim
Scene Understanding for Auto-Calibration of Surveillance Cameras

In the last decade, several research results have presented formulations of the auto-calibration problem. Most of these have relied on the evaluation of vanishing points to extract the camera parameters. Normally, vanishing points are evaluated using pedestrians or the Manhattan World assumption, i.e., the assumption that the scene is composed of orthogonal planar surfaces. In this work, we present a robust framework for auto-calibration, with improved results and generalisability for real-life situations. This framework is capable of handling problems such as occlusions and the presence of unexpected objects in the scene. In our tests, we compare our formulation with the state of the art in auto-calibration using pedestrian- and Manhattan-World-based assumptions. This paper reports on experiments conducted using publicly available datasets; the results show that our formulation represents an improvement over the state of the art.

Lucas Teixeira, Fabiola Maffra, Atta Badii
A Multi-view Profilometry System Using RGB Channel Separated Fringe Patterns and Unscented Kalman Filter

In this paper a one-shot method is presented to determine the shape of an object from overlapping cosine fringes projected by multiple projectors. This overcomes the limitation of single-projector systems, which cannot image the entire object in a single shot. The proposed method projects orthogonal fringe patterns of different colours from different projectors and uses colour-channel isolation and Fourier-domain filtering to separate the fringes. An Unscented Kalman Filter and smoother are used to demodulate the fringe pattern; good results do not depend on a strictly sinusoidal fringe pattern. Sources of error and their effects on the resulting parameter estimation are discussed, as well as methods to reduce their impact. The proposed method is tested on simulations and real-world objects and, unlike Fourier Transform Profilometry, is shown to be effective at isolating interfering fringes and determining the shape of an object from non-sinusoidal fringe input.

Stuart Woolford, Ian. S. Burnett
A 3D Tracker for Ground-Moving Objects

Multi-object tracking is a major area of research because of its wide application scope. In this paper we describe a set of improvements, aimed at the video surveillance context, to the multi-object tracker proposed in [1]. First, we generalize the tracking by removing the specialization for pedestrians. Then, we integrate easily available scene knowledge to allow three-dimensional reasoning and better handle occlusions. Additionally, we improve the group creation and destruction mechanism by adding an association pass and an overlap similarity criterion. We evaluate the proposed method on several synthetic and real-world videos.

M. Rogez, L. Robinault, L. Tougne
Counting the Crowd at a Carnival

The focus of this paper is to count the number of people participating in a specific carnival, namely Aalborg Carnival in Denmark, which is believed to be the biggest in Northern Europe. A carnival poses significant challenges from a computer vision viewpoint due to high density, occlusion and non-human objects in the scene. To this end we apply a passive stereo vision approach to create a depth image where the heads of people are segmented, tracked, and counted in real-time. The results from the parade demonstrated that the system is able to count the people passing by with an uncertainty of 5.8 %.
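The depth image in such a passive-stereo pipeline comes from the standard rectified-stereo relation Z = f·B/d, where f is the focal length in pixels, B the baseline, and d the disparity between the left and right image coordinates of a point. A minimal sketch (function and parameter names are illustrative, not from the paper):

```python
def stereo_depth(f_px, baseline_m, x_left, x_right):
    """Depth of a point from a rectified stereo pair: Z = f * B / disparity.

    f_px: focal length in pixels; baseline_m: camera separation in metres;
    x_left, x_right: horizontal image coordinates of the same point.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("non-positive disparity: point at infinity "
                         "or mismatched correspondence")
    return f_px * baseline_m / disparity
```

For example, a 500 px focal length, 10 cm baseline, and 10 px disparity give a depth of 5 m; heads close to the camera produce large disparities and so stand out in the depth image for segmentation.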

J. B. Pedersen, J. B. Markussen, M. P. Philipsen, M. B. Jensen, T. B. Moeslund
Image Retrieval Based on Statistical and Geometry Features

The most popular approach to large-scale image retrieval is based on the bag-of-words (BoW) representation of images. The key question is how to use the statistical and geometric features of the BoW representation efficiently. We present a two-step approach for image retrieval in which statistical features and spatial geometry information are considered in separate steps. In the first step, the statistical features of the images' BoW representations capture the underlying image topic and are used to screen images. In the second step, images from the same topic are ranked using co-occurrence features (a type of geometric feature). The computational cost of retrieval is reduced because the first step avoids computing the expensive spatial geometry information and the second step uses only significant features to rank images. Experiments on the Oxford 5K benchmark show that the proposed technique stably achieves nearly the same result as a state-of-the-art retrieval method [1] while spending only about a tenth of the time.
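The first, statistical screening step can be illustrated with a simple cosine similarity over BoW histograms; the expensive geometric re-ranking would then run only on the survivors. This is a hypothetical sketch (the function names and the `keep` cutoff are assumptions, not from the paper):

```python
import math

def bow_cosine(hist_a, hist_b):
    """Cosine similarity between two bag-of-words histograms."""
    dot = sum(a * b for a, b in zip(hist_a, hist_b))
    na = math.sqrt(sum(a * a for a in hist_a))
    nb = math.sqrt(sum(b * b for b in hist_b))
    return dot / (na * nb) if na and nb else 0.0

def screen_topic(query_hist, db_hists, keep=10):
    """Step 1: rank database images by histogram similarity and keep the
    top candidates; only these would reach geometric re-ranking (step 2)."""
    scored = sorted(enumerate(db_hists),
                    key=lambda iv: bow_cosine(query_hist, iv[1]),
                    reverse=True)
    return [i for i, _ in scored[:keep]]
```

The screening is linear in the vocabulary size per image, which is why deferring the geometric verification to a short candidate list saves most of the time.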

Yu Liu, Liangbing Feng, Xing Wang, Ning Guo
PixSearcher: Searching Similar Images in Large Image Collections through Pixel Descriptors

Searching and mining huge image databases has become a daunting task for many application areas such as astronomy, medicine, geology, oceanography, and crime prevention. We introduce a new image indexing and retrieval system, PixSearcher, that computes image descriptors from pixels obtained by thresholding images at different levels. In contrast to raster techniques (which use the entire pixel raster for distance computations), PixSearcher uses a small set of descriptors and consequently can handle large image collections within reasonable time. We use benchmark image databases from different domains to evaluate PixSearcher’s performance versus well-known image retrieval techniques.

Tuan Nhon Dang, Leland Wilkinson
Shortest Enclosing Walks with a Non-zero Winding Number in Directed Weighted Planar Graphs: A Technique for Image Segmentation

This paper presents an efficient graph-based image segmentation algorithm based on finding the shortest closed directed walks surrounding a given point in the image. Our work is motivated by the Intelligent Scissors algorithm, which finds open contours using the shortest-path algorithm, and the Corridor Scissors algorithm, which is able to find closed contours. Both of these algorithms focus on undirected, nonnegatively weighted graphs. We generalize these results to directed planar graphs (not necessarily with nonnegative weights), which allows our approach to utilize knowledge of the object's appearance. The running time of our algorithm is approximately the same as that of a standard shortest-path algorithm.
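The standard shortest-path building block referred to above is Dijkstra's algorithm on a nonnegatively weighted directed graph; the paper's contribution generalizes beyond that setting. A minimal sketch of the nonnegative baseline:

```python
import heapq

def dijkstra(adj, src):
    """Single-source shortest paths on a directed graph with nonnegative
    weights. adj maps each node to a list of (neighbor, weight) pairs.
    Returns a dict of shortest distances from src."""
    dist = {src: 0}
    pq = [(0, src)]                       # (distance, node) min-heap
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist
```

In the segmentation setting, pixels (or edge elements) form the graph nodes and edge weights encode boundary cost, so a cheap closed walk traces a likely object contour.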

Alexey Malistov
Intuitive Alignment of Point-Clouds with Painting-Based Feature Correspondence

Over the course of several years, significant progress has been made in the accuracy and performance of pair-wise alignment techniques; however, for low-resolution scans with minimal pair-wise overlap, and scans with high levels of symmetry, successfully performing sequential alignments during object reconstruction remains a challenging task. Even with improvements in surface point sampling and surface-feature correspondence estimation, existing techniques do not guarantee an alignment between arbitrary point-cloud pairs because their estimation models are statistically driven. In this paper we define a robust and intuitive painting-based feature-correspondence selection methodology that can refine the input sets for these existing techniques to ensure alignment convergence. Additionally, we consolidate this painting process into a semi-automated alignment compilation technique that can be used to ensure the proper reconstruction of scanned models.

Shane Transue, Min-Hyung Choi
Precise 3D Measurements for Tracked Objects from Synchronized Stereo-Video Sequences

This paper presents a system that performs precise and fast 3D measurements from synchronized stereo-video sequences and provides the target's georeference in a known reference system. To this end, we combine a robust tracker with photogrammetric techniques into a fast and reliable system. For tracking objects and people, we adopt and modify a stable human tracker able to cope efficiently with the trade-off between model stability and adaptability. To achieve accurate and precise 3D measurements, camera calibration was implemented to recover the intrinsic parameters of the cameras in the configuration. Finally, for precise and reliable calculation of the 3D trajectory of the moving person, we apply bundle adjustment over all frames. Bundle adjustment is a very accurate algorithm and has the advantage of tolerating missing data while providing a true Maximum Likelihood estimate. The results have been tested and evaluated in real-life conditions, proving the robustness and accuracy of the system.

Panagiotis Agrafiotis, Andreas Georgopoulos, Anastasios D. Doulamis, Nikolaos D. Doulamis
Artificial Intelligence Gaming Assistant for Google Glass

The interaction between humans and computers has traditionally been through the medium of desktop computing. However, in recent years, an alternative computing concept known as ubiquitous computing is ushering in various wearable computing technologies, such as Google Glass. These technologies enable even more immediate ways to share and access information. Our research seeks to explore novel methods in which these wearable technologies can be combined with more powerful computing techniques to compute useful context-specific information. The scope of this research is in utilizing Google Glass to act as an artificially intelligent game assistant. The approach of this work is to make use of Google’s Mirror API to build a web-based service to interact with Glass. The Mirror API is used to share an image from Glass to a web-based service where the image is processed and key features are extracted. An appropriate algorithm is then used to compute a near-optimal game move depending on the game being played. The results are promising, and the Glassware that was implemented suggests appropriate moves while playing a game of Connect Four. Our results foreshadow what is possible when wearable technology is combined with artificially intelligent computation in the cloud.

Scott Bouloutian, Edward Kim
Adding Color Sensitivity to the Shape Adaptive Image Ray Transform

The Image Ray Transform (IRT) is a recent approach to extracting structures at a low level by mimicking the attributes of light rays. We suggest an extension of the IRT to detect shapes of chosen colors. The new approach uses the CIEL*a*b* color model within the IRT's light-ray analogy. The capability of the extended IRT using color information is evaluated for correct shape location by conventional methods such as the Hough Transform. We show that the new approach can indeed be used to detect shapes of specified colors. Further work will aim to expand the descriptive capability of the new extraction.

Ah-Reum Oh, Mark S. Nixon
A Fast Algorithm for Reconstructing hv-Convex Binary Images from Their Horizontal Projection

The reconstruction of certain types of binary images from their projections is a frequently studied problem in combinatorial image processing. hv-convex images with fixed projections play an important role in discrete tomography. In this paper, we provide a fast polynomial-time algorithm for reconstructing canonical hv-convex images with a given number of 4-connected components and a minimal number of columns satisfying a prescribed horizontal projection. We show that the method gives a solution that is always 8-connected. We also explain how the algorithm can be modified to obtain solutions with any given number of columns, and with non-connected components.

Norbert Hantos, Péter Balázs
Gaussian Process Dynamical Models for Emotion Recognition

We describe a method for dynamic emotion recognition from facial expression sequences. Our model is based on learning a latent space using the Gaussian Process Latent Variable Model (GP-LVM), encapsulating the facial landmark shapes that describe a given facial expression. We incorporate the dynamic model by learning a latent representation that respects the data's dynamics (facial shapes should maintain their correspondence over time). A Gaussian process classifier is then used to evaluate the relevance of the latent-space features in the emotion recognition task. The results show that the proposed method can efficiently model a dynamic facial emotion and recognize a facial emotion sequence with high accuracy.

Hernán F. García, Mauricio A. Álvarez, Álvaro Orozco
Evaluation of Perceptual Biases in Facial Expression Recognition by Humans and Machines

In this paper, we applied a reverse correlation approach to study the features that humans use to categorize facial expressions. The well-known portrait of the Mona Lisa was used as the base image to investigate the features differentiating happy and sad expressions. The base image was blended with sinusoidal noise masks to create the stimuli. Observers were required to view each image and categorize it as happy or sad. Analysis of the responses using superimposed classification images revealed both the locations and the identity of the information required to represent each expression. To further investigate the results, a neural-network-based classifier was developed to identify the expression of the superimposed images from a machine-learning perspective; it reveals that the pattern by which humans perceive expressions is also recognized by machines.

Xing Zhang, Lijun Yin, Daniel Hipp, Peter Gerhardstein
Handwritten Signature Verification Based on Enhanced Direction and Grid Features

Signatures continue to be an important biometric trait because they remain widely used to authenticate the identity of human beings. This paper presents an efficient text-based directional signature recognition algorithm that verifies signatures even when they are composed of special unconstrained cursive characters which are superimposed and embellished. The algorithm extends the character-based signature verification technique. Feature extraction is optimized by implementing the proposed thinning technique, which repeatedly scans the signature image and removes non-skeleton pixels until all remaining pixels are skeletal. Feature extraction is further enhanced by extending a new approach for fitting a bounding rectangle to closed regions, computing a minimum bounding rectangle of any signature image. The computed minimum bounding rectangle is not necessarily horizontal and is used to compute accurate geometric signature features such as the true aspect ratio. Experiments carried out on the GPDS signature database, and on an additional database of signatures captured using the ePadInk tablet, show that the approach is effective and efficient, with a positive verification rate of 96.56%.

Serestina Viriri
Improving Human Gait Recognition Using Feature Selection

Human gait, a biometric that aims to recognize individuals by the way they walk, has recently come to play an increasingly important role in visual surveillance applications. Most existing approaches in this area, however, have been evaluated without explicitly considering the most relevant gait features, which may have compromised their performance. In this paper, we investigate the effect of discarding irrelevant or redundant gait features on the performance of a gait recognition system, employing Genetic Algorithms (GAs) to select an optimal subset of features. Experimental results on the CASIA dataset demonstrate that the proposed system achieves considerable gait recognition performance.
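GA-based feature selection typically encodes each feature subset as a bitmask and evolves a population under a fitness function such as cross-validated recognition accuracy. The following toy sketch takes a caller-supplied fitness, since the actual gait classifier is outside its scope; all names, rates, and sizes are illustrative assumptions, not the paper's settings:

```python
import random

def ga_select(num_features, fitness, pop_size=20, gens=30, seed=0):
    """Tiny genetic algorithm over feature-subset bitmasks.

    fitness: scores a 0/1 mask (higher is better), e.g. the accuracy of a
    recognizer trained on the selected features.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(num_features)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:pop_size // 2]       # keep the best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, num_features)
            child = a[:cut] + b[cut:]     # one-point crossover
            if rng.random() < 0.1:        # occasional bit-flip mutation
                child[rng.randrange(num_features)] ^= 1
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)
```

In the gait setting, each bit would gate one gait feature, so the GA searches the exponential space of feature subsets without enumerating it.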

Faezeh Tafazzoli, George Bebis, Sushil Louis, Muhammad Hussain
Automatic Recognition of Microcalcifications in Mammography Images through Fractal Texture Analysis

Mammography images are widely used for the detection of microcalcifications (MCs), which constitute the early stage of breast cancer. Moreover, these images allow the medical specialist to perform a timely diagnosis and to prevent complications in patients. Automatic identification of MCs in mammography images may be useful as decision support for a specialist. In this paper, we construct a mammography image database with medical validation and expert labeling; the test subjects are from a local population located in the Eje Cafetero, Colombia. We also present a methodology for automatic recognition of microcalcifications based on segmentation with fractal texture analysis (SFTA) and a support vector machine (SVM). As a comparison with the state of the art, we compare our methodology with the local binary patterns (LBP) method, which is widely applied in digital image processing. Results show that the SFTA methodology for recognition of MCs achieves an accuracy above 92.5%, a significant improvement over LBP. Our database also satisfies the epidemiological parameters needed to represent a local population.

Hernán Darío Vargas Cardona, Álvaro Orozco, Mauricio A. Álvarez
Bayesian Shape Models with Shape Priors for MRI Brain Segmentation

Planning deep brain stimulation surgery for Parkinson's disease is a critical task because the medical team needs to accurately locate the basal ganglia area (i.e., the sub-thalamus) in a magnetic resonance image study. This paper proposes a new method that incorporates shape prior information, based on the Chan-Vese model and Bayesian shape models, for brain structure segmentation in magnetic resonance images. The method efficiently initializes a given shape by fitting an active contour (Chan-Vese model) and then robustly fits a brain structure by performing Bayesian shape fitting. The experimental results show that the proposed model can effectively segment a brain structure and provides a fast segmentation, improving the computational cost compared with common segmentation techniques such as active shape models.

Hernán F. García, Mauricio A. Álvarez, Álvaro Orozco
Disocclusion Mitigation for Image Based Point Cloud Imposters

Image-based imposters suffer from common errors called disocclusion artifacts, where portions of the scene that should be occluded by real geometry become visible. These artifacts are the result of parallax error created by camera motion: regions of a mesh that were not visible at the time of imposter generation have become visible. This paper presents a computationally inexpensive online technique that resolves these disocclusions by stretching existing imposter texture information over new geometry bridging the gap between imposters. The paper also analyses six automatic metrics of similarity between output frames of a flight path using this novel technique and the impersonated data, to quantify the quality of the technique compared with the original scene and other level-of-detail techniques.

Chad Mourning, Scott Nykl, David Chelberg
Adaptive Visualization of Linked-Data

Adaptive visualizations reduce the cognitive effort required to comprehend interactive visual pictures and amplify cognition. Although research on adaptive visualization has grown in recent years, existing approaches do not consider the transformation pipeline from data to visual representation for a more efficient and effective adaptation. Further, today's systems commonly require initial training by experts in the field and are limited to adaptation based either on user behavior or on data characteristics; to our knowledge, a combination of both has not been proposed. This paper introduces an enhanced instantiation of our previously proposed model that combines both: it involves different influencing factors and adapts various levels of visual peculiarities in content, visual layout, visual presentation, and the visual interface. Based on data type and user behavior, our system adapts a set of applicable visualization types. Moreover, the retinal variables of each visualization type are adapted to meet individual or canonical requirements on both data types and user behavior. Our system does not require initial expert modeling.

Kawa Nazemi, Dirk Burkhardt, Reimond Retz, Arjan Kuijper, Jörn Kohlhammer
Parkinson Data Analysis and Interpretation with Data Visualization Methods

Information visualization is used in many areas, and its applications grow each day. Modelling and visualizing psychological processes based on the analysis of psychological data can be a real necessity for experts in this field. In this article we use PCA and person-fit statistics to analyze the data, and several visualization tools to model and present the information of Parkinson's patients. The results show meaningful relations between BMI and age on one side and the prevalence of certain Parkinson's patient features in men and women on the other.

Mehdi Ghayoumi, Ye Zhao
IntelliViz- A Tool for Visualizing Social Networks with Hashtags

Visualizing a real-time social network, such as Twitter, can reveal patterns and insights into actors' interconnectedness and interactions according to the links between actors and activities. This paper presents a novel system for intelligent and interactive visualization of social networks with hashtags. We provide a flexible, animated and simple, yet powerful, visualization to represent the activities, relations and networks of the people involved with hashtags. The system consists of multiple phases, including data collection and processing, and intelligent interactive visualization. Early experimental results indicate its effectiveness for real-time analysis of the properties of dynamic networks based on hashtags.

Jesse Tran, Quang Vinh Nguyen, Simeon Simoff
3D Previsualization Using a Computational Photography Camera

During movie production, movie directors use previsualization tools to convey the movie visuals as they see them in their mind's eye. Traditional methods of previsualization include hand-drawn sketches, storyboards and still photographs. Recently, video game engines have been used for previsualization so that once the movie set is modeled, scene lighting, geometry, textures and various scene elements can be changed interactively and the effects of many potential changes can be previewed quickly. However, using game engines for previsualization requires artists to manually model the movie set to create a digital version, which is expensive. We envision that a computational photography camera can capture images of a physical set from which a model of the scene can be generated automatically; a wide range of possible changes, including scene geometry and textures, could then be explored interactively and previewed on-set. Since this vision is broad, we focus here on an initial prototype (a computational photography camera and previsualization algorithms) that enables scene lighting to be captured, inferred, and manipulated, and new lights to be applied (relighting). Evaluations of our light previsualization prototype show low photometric error rates and encouraging feedback from experts.

Clifford Lindsay, Emmanuel Agu
Formation Control of Multiple Rectangular Agents with Limited Communication Ranges

Formation control of multiple agents has recently attracted many robotics and control researchers because of its potential applications in various fields. This paper presents a novel approach to the formation control of multiple rectangular agents with limited communication ranges. The proposed distributed control algorithm is designed using an artificial potential function and guarantees fast formation performance with no collisions among agents. As a result, the rectangular agents can move together and quickly form a pre-defined formation shape such as a straight line or circle. Simulation results demonstrate the effectiveness of the proposed algorithm.
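A distributed artificial-potential controller generally combines an attractive term pulling each agent toward its slot in the formation with a repulsive term pushing it away from agents closer than a safety distance. The sketch below is a generic illustration of that idea, not the paper's control law; the gains and the safety radius are assumed values:

```python
def potential_step(pos, goal, others, gain=0.1, rep=0.5, safe=1.0):
    """One discrete step down an attractive-repulsive potential field.

    pos, goal: (x, y) of the agent and its slot in the formation;
    others: positions of neighbouring agents within communication range.
    """
    # attractive force toward the formation slot
    fx = gain * (goal[0] - pos[0])
    fy = gain * (goal[1] - pos[1])
    # repulsive force from agents inside the safety radius
    for ox, oy in others:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = (dx * dx + dy * dy) ** 0.5
        if 0 < d < safe:
            fx += rep * (1.0 / d - 1.0 / safe) * dx / (d * d)
            fy += rep * (1.0 / d - 1.0 / safe) * dy / (d * d)
    return (pos[0] + fx, pos[1] + fy)
```

The repulsive term vanishes smoothly at the safety radius and grows without bound as agents approach, which is the standard mechanism behind collision-free convergence claims for such controllers.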

Thang Nguyen, Hung Manh La
Extrinsic Calibration between 2D Laser Range Finder and Fisheye Camera

This paper presents an approach for extrinsic calibration between a camera with a fisheye lens and an invisible single-planar laser range finder (LRF). The proposed approach requires the LRF and camera to observe a chessboard moved within their field of view. By checking the change in the LRF measurements, a set of points located in the laser-beam plane is detected. These detected points are then used to estimate the equation of the laser-beam plane in the camera coordinate system. Finally, two geometric constraints based on the plane equation and the set of points are constructed to estimate the extrinsic parameters between the fisheye camera and the LRF. Simulation results show that the proposed approach improves on the approach proposed in [1]. Finally, real-data experiments are carried out and their results presented.

Yong Fang, Cindy Cappelle, Yassine Ruichek
Hardware/Software Co-design of Embedded Real-Time KD-Tree Based Feature Matching Systems

Feature matching is an important step in many computational photography applications such as image stitching, 3D reconstruction and object recognition. KD-tree-based Best Bin First (KD-BBF) search is one of the most widely used feature matching schemes employed with SIFT and SURF. The real-time requirements of such computer vision applications on embedded systems put tight compute bounds on the processor. In this paper we propose a soft-core and hardware-accelerator-based architecture that enables real-time matching of SIFT feature descriptors for HD-resolution images at 30 FPS. The proposed accelerator provides a speedup of more than 8x over the pure software implementation.
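KD-tree matching of descriptors builds a binary tree that splits on alternating dimensions and answers nearest-neighbour queries with backtracking; BBF replaces the exact backtracking with a priority queue and a bin limit to bound the work. The exact-search baseline can be sketched as follows (an illustrative low-dimensional version, not the accelerated 128-D SIFT pipeline):

```python
def build_kdtree(points, depth=0):
    """Recursively build a KD-tree: split on axis = depth mod k at the
    median point. Node = (point, left_subtree, right_subtree)."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, query, depth=0, best=None):
    """Exact nearest neighbour with backtracking. Returns (point, dist2)."""
    if node is None:
        return best
    point, left, right = node
    d2 = sum((a - b) ** 2 for a, b in zip(point, query))
    if best is None or d2 < best[1]:
        best = (point, d2)
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    if diff * diff < best[1]:        # hypersphere crosses the split plane
        best = nearest(far, query, depth + 1, best)
    return best
```

The backtracking test on the split plane is exactly the step BBF approximates: it visits bins in order of plane distance and stops after a fixed budget, trading exactness for the bounded latency an embedded accelerator needs.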

Saad Shoaib, Rehan Hafiz, Muhammad Shafique
Backmatter
Metadata
Title
Advances in Visual Computing
Edited by
George Bebis
Richard Boyle
Bahram Parvin
Darko Koracin
Ryan McMahan
Jason Jerald
Hui Zhang
Steven M. Drucker
Chandra Kambhamettu
Maha El Choubassi
Zhigang Deng
Mark Carlson
Copyright year
2014
Publisher
Springer International Publishing
Electronic ISBN
978-3-319-14364-4
Print ISBN
978-3-319-14363-7
DOI
https://doi.org/10.1007/978-3-319-14364-4
