
About this Book

This book constitutes the thoroughly refereed post-conference proceedings of the 6th Pacific Rim Symposium on Image and Video Technology, PSIVT 2013, held in Guanajuato, México in October/November 2013. The total of 43 revised papers was carefully reviewed and selected from 90 submissions. The papers are organized in topical sections on image/video processing and analysis, image/video retrieval and scene understanding, applications of image and video technology, biomedical image processing and analysis, biometrics and image forensics, computational photography and arts, computer and robot vision, pattern recognition and video surveillance.



Gamut Mapping through Perceptually-Based Contrast Reduction

In this paper we present a spatial gamut mapping algorithm that relies on a perceptually-based variational framework. Our method adapts a well-known image energy functional whose minimization leads to image enhancement and contrast modification. We show how by varying the importance of the contrast term in the image functional we are able to perform gamut reduction. We propose an iterative scheme that allows our algorithm to successfully map the colors from the gamut of the original image to a given destination gamut while preserving the colors’ perception and texture close to the original image. Both subjective and objective evaluation validate the promising results achieved via our proposed framework.

Syed Waqas Zamir, Javier Vazquez-Corral, Marcelo Bertalmío

Precise Correction of Lateral Chromatic Aberration in Images

This paper addresses the problem of lateral chromatic aberration correction in images through color-plane warping. We aim at high-precision (largely sub-pixel) realignment of color channels. This is achieved thanks to two ingredients: high-precision keypoint detection, in our case of disk centers, and a more general correction model than the radial polynomial commonly used in the literature. Our setup is quite easy to implement, requiring only a pattern of black disks on white paper and a single snapshot. We measure the errors in terms of geometry and of color and compare our method to three different software programs. Quantitative results on real images show that our method achieves color-channel alignment with an average error of 0.05 pixel and reduces the residual color error by a factor of 3 to 6.

Victoria Rudakova, Pascal Monasse

High Accuracy Optical Flow for 3D Medical Image Registration Using the Census Cost Function

In 2004, Brox et al. described how to minimize an energy functional for dense 2D optical flow estimation that enforces both intensity and gradient constancy.

This paper presents a novel variant of their method, in which the census cost function is utilized in the data term instead of absolute intensity differences. The algorithm is applied to the task of pulmonary motion estimation in 3D computed tomography (CT) image sequences. The performance evaluation is based on DIR-lab benchmark data for lung CT registration. Results show that the presented algorithm can compete with current state-of-the-art methods with regard to both registration accuracy and run-time.

Simon Hermann, René Werner
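The census cost function referenced in the abstract above can be illustrated with a minimal NumPy sketch (an independent illustration, not the authors’ implementation): each pixel is encoded as a binary signature by comparing it to its neighbours, and the data cost between two images is the Hamming distance between signatures, which makes the cost invariant to monotonic intensity changes.

```python
import numpy as np

def census_transform(img, radius=1):
    """Binary census signature per pixel: 1 where a neighbour is
    brighter than the centre pixel, 0 otherwise (borders excluded)."""
    h, w = img.shape
    bits = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = img[radius + dy:h - radius + dy,
                          radius + dx:w - radius + dx]
            centre = img[radius:h - radius, radius:w - radius]
            bits.append(shifted > centre)
    return np.stack(bits, axis=-1)  # shape (h-2r, w-2r, 8) for radius=1

def census_cost(sig_a, sig_b):
    """Hamming distance between census signatures (the data cost)."""
    return np.count_nonzero(sig_a != sig_b, axis=-1)

img = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]], dtype=float)
sig = census_transform(img)
# centre pixel 5: neighbours 1, 2, 3, 4 are darker; 6, 7, 8, 9 brighter
cost_self = census_cost(sig, sig)
```

Because only the signs of intensity differences are encoded, the signature is unchanged under any monotonic intensity rescaling of the image, which is what makes the cost robust for multi-modal or illumination-varying data.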

Non-rigid Multimodal Image Registration Based on the Expectation-Maximization Algorithm

In this paper, we present a novel methodology for multimodal non-rigid image registration. The proposed approach is formulated by using the Expectation-Maximization (EM) technique in order to estimate a displacement vector field that aligns the images to register. In this approach, the image alignment relies on hidden stochastic random variables which allow comparing the intensity values between images of different modality. The methodology is basically composed of two steps: first, we provide an initial estimation of the global deformation vector field by using a rigid registration technique based on particle filtering, obtaining, at the same time, an initial estimation of the joint conditional intensity distribution of the registered images; second, we approximate the remaining deformations by applying an iterative EM-technique approach, where at each step a new estimation of the joint conditional intensity distribution and the displacement vector field is computed. The proposed algorithm was tested with different kinds of medical images; preliminary results show that the methodology is a good alternative for non-rigid multimodal registration.

Edgar Arce-Santana, Daniel U. Campos-Delgado, Flavio Vigueras-Gómez, Isnardo Reducindo, Aldo R. Mejía-Rodríguez

Wide-Baseline Dense Feature Matching for Endoscopic Images

Providing a feature-matching strategy that accurately recovers tracked features after a fast and large endoscopic-camera motion or a strong organ deformation is key in many endoscopic-imaging applications, such as augmented reality or soft-tissue shape recovery. Despite recent advances, existing feature-matching algorithms are characterized by limiting assumptions, and have not yet met the necessary levels of accuracy, especially when used to recover features in distorted or poorly-textured tissue areas. In this paper, we present a novel feature-matching algorithm that accurately recovers the position of image features over the entire organ’s surface. Our method is fully automatic, does not require any explicit assumption about the organ’s 3-D surface, and leverages Gaussian Process Regression to incorporate noisy matches in a probabilistically sound way. We have conducted extensive tests with a large database of more than 100 endoscopic-image pairs, which show the improved accuracy and robustness of our approach when compared to current state-of-the-art methods.

Gustavo A. Puerto-Souza, Gian-Luca Mariottini

Vehicle Detection Based on Multi-feature Clues and Dempster-Shafer Fusion Theory

On-road vehicle detection and rear-end crash prevention are demanding subjects in both academia and the automotive industry. This paper focuses on monocular vision-based vehicle detection under challenging lighting conditions, which is still an open topic in the area of driver assistance systems. The paper proposes an effective vehicle detection method based on multiple-feature analysis and Dempster-Shafer fusion theory. We also utilize a new idea of Adaptive Global Haar-like (AGHaar) features as a promising method for feature classification and vehicle detection in both daylight and night conditions. Validation tests and experimental results show superior detection results for day, night, rainy, and challenging conditions compared to state-of-the-art solutions.

Mahdi Rezaei, Mutsuhiro Terauchi
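Dempster-Shafer fusion, as used in the abstract above, combines evidence from independent cues via Dempster’s rule of combination. The sketch below is a generic illustration of that rule only; the hypothesis names and mass values are invented for the example and are not taken from the paper.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination for two basic probability
    assignments (dicts mapping frozenset hypotheses to masses)."""
    fused, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            fused[inter] = fused.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb          # mass assigned to the empty set
    norm = 1.0 - conflict                # renormalize away the conflict
    return {k: v / norm for k, v in fused.items()}

V, N = frozenset({'vehicle'}), frozenset({'non-vehicle'})
TH = V | N                               # the frame of discernment (ignorance)
m_cue1 = {V: 0.6, TH: 0.4}               # evidence from one feature cue
m_cue2 = {V: 0.5, N: 0.2, TH: 0.3}       # evidence from a second cue
fused = dempster_combine(m_cue1, m_cue2)
```

Two weak, agreeing cues reinforce each other: the fused belief in “vehicle” exceeds either input mass, while the residual ignorance shrinks.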

UHDB11 Database for 3D-2D Face Recognition

Performance boosts in face recognition have been facilitated by the formation of facial databases, with collection protocols customized to address challenges such as light variability, expressions, pose, sensor/modality differences, and, more recently, uncontrolled acquisition conditions. In this paper, we present the UHDB11 database, which facilitates 3D-2D face recognition evaluations, where the gallery has been acquired using 3D sensors (3D mesh and texture) and the probes using 2D sensors (images). The database consists of samples from 23 individuals, in the form of 2D high-resolution images spanning six illumination conditions and 12 head-pose variations, and 3D facial mesh and texture. It addresses limitations regarding resolution, variability and type of 3D/2D data, and has been demonstrated to be statistically more challenging, diverse and information-rich than existing cohorts with 10 times as many subjects. We propose a set of 3D-2D experimental configurations with frontal 3D galleries and pose- and illumination-varying probes, and provide baseline performance for identification and verification (available at


George Toderici, Georgios Evangelopoulos, Tianhong Fang, Theoharis Theoharis, Ioannis A. Kakadiaris

Joint Dictionary and Classifier Learning for Categorization of Images Using a Max-margin Framework

The Bag-of-Visual-Words (BoVW) model is a popular approach for visual recognition. Used successfully in many different tasks, simplicity and good performance are the main reasons for its popularity. The central aspect of this model, the visual dictionary, is used to build mid-level representations based on low-level image descriptors. Classifiers are then trained using these mid-level representations to perform categorization. While most works based on BoVW models have focused on learning a suitable dictionary or on proposing a suitable pooling strategy, little effort has been devoted to exploring and improving the coupling between the dictionary and the top-level classifiers in order to generate more discriminative models. This problem can be highly complex due to the large dictionary size usually needed by these methods. Also, most BoVW-based systems usually perform multiclass categorization using a one-vs-all strategy, ignoring relevant correlations among classes. To tackle these issues, we propose a novel approach that jointly learns dictionary words and a proper top-level multiclass classifier. We use a max-margin learning framework to minimize a regularized energy formulation, allowing us to propagate labeled information to guide the commonly unsupervised dictionary learning process. As a result, we produce a dictionary that is more compact and discriminative. We test our method on several popular datasets, where we demonstrate that our joint optimization strategy induces a word-sharing behavior among the target classes, achieving state-of-the-art performance using far fewer visual words than previous approaches.

Hans Lobel, René Vidal, Domingo Mery, Alvaro Soto

Multibit Embedding Algorithm for Steganography of Palette-Based Images

In this paper, we propose a high-capacity data hiding scheme for palette-based images that does not seriously degrade image quality. The proposed scheme can embed a multiple-bit message within the unit of a pixel matrix by using Euclidean distance, while some conventional schemes can embed only a one-bit message per pixel. The stego-images created by our scheme offer better quality than those produced by the conventional scheme. Moreover, we have obtained these results at a low implementation cost. The experimental results show that the proposed scheme is efficient.

Shoko Imaizumi, Kei Ozawa

A Statistical Method for Peak Localization in Hough Space by Analysing Butterflies

The Hough transform is an efficient method for extracting lines from images. Detection precision relies on accurately finding and locating the peak in Hough space after the voting process. In this paper, a statistical method is proposed to improve peak localization by considering quantization error and image noise, as well as the choice of the coordinate origin. The proposed peak localization is based on butterfly analysis: statistical standard variances and statistical means are computed and used as parameters of fitting and interpolation processes. We show that accurate peak parameters are achieved. Experiments compare our results with those provided by other peak detection methods. In summary, we show that the proposed peak localization method for the Hough transform is both accurate and robust in the presence of quantization error and image noise.

Zezhong Xu, Bok-Suk Shin
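For context, the voting stage of the Hough transform that precedes the peak-localization step described above can be sketched as follows; this is a generic illustration, and the grid resolutions and the test line are assumptions chosen for the example, not values from the paper.

```python
import numpy as np

def hough_vote(points, n_theta=180, n_rho=100, rho_max=None):
    """Accumulate line votes in (theta, rho) space for a set of edge
    points, using the normal form x*cos(theta) + y*sin(theta) = rho."""
    pts = np.asarray(points, dtype=float)
    if rho_max is None:
        rho_max = np.hypot(*pts.max(axis=0)) + 1.0
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_theta, n_rho), dtype=int)
    for x, y in pts:
        rho = x * np.cos(thetas) + y * np.sin(thetas)   # signed distance
        cols = np.round((rho + rho_max) / (2 * rho_max)
                        * (n_rho - 1)).astype(int)
        acc[np.arange(n_theta), cols] += 1
    return acc, thetas

# points on the vertical line x = 10  ->  peak near theta = 0, rho = 10
pts = [(10.0, y) for y in range(20)]
acc, thetas = hough_vote(pts)
t_idx, r_idx = np.unravel_index(np.argmax(acc), acc.shape)
```

The peak cell collects one vote per collinear point; quantization of rho into bins is exactly the source of the peak-spreading (“butterfly”) effect the paper analyzes.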

Efficiency Analysis of POC-Derived Bases for Combinatorial Motion Estimation

Motion estimation is a fundamental problem in many computer vision applications. One solution consists in defining a large enough set of candidate motion vectors and using a combinatorial optimization algorithm to find, for each point of interest, the candidate that best represents the motion at that point. The choice of the candidate set has a direct impact on the accuracy and computational complexity of the optimization method. In this work, we show that a set containing the most representative maxima of the phase-correlation function between the two input images, computed for different overlapping regions, provides better accuracy and contains fewer spurious candidates than other choices in the literature. Moreover, a pre-selection stage, based on a local motion estimation algorithm, can be used to further reduce the cardinality of the candidate set without affecting the accuracy of the results.

Alejandro Reyes, Alfonso Alba, Edgar Arce-Santana
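The phase-correlation (POC) function at the core of the abstract above can be sketched in a few lines of NumPy: the peak of the phase-only correlation surface recovers the dominant translation between two images. The image size and shift values here are illustrative, and this is an independent sketch, not the authors’ code.

```python
import numpy as np

def phase_correlation(a, b):
    """Phase-only correlation surface; its peak location gives the
    circular shift that maps image a onto image b."""
    A, B = np.fft.fft2(a), np.fft.fft2(b)
    R = np.conj(A) * B
    R /= np.maximum(np.abs(R), 1e-12)   # keep phase only, drop magnitude
    return np.real(np.fft.ifft2(R))

rng = np.random.default_rng(0)
a = rng.random((32, 32))
b = np.roll(a, shift=(3, 5), axis=(0, 1))   # translate a by (3, 5)
poc = phase_correlation(a, b)
dy, dx = np.unravel_index(np.argmax(poc), poc.shape)
```

For a pure circular shift the surface is a near-ideal delta function; in the paper’s setting, several local maxima of such surfaces, computed over overlapping regions, serve as the candidate motion vectors.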

Stereo and Motion Based 3D High Density Object Tracking

In order to understand the behavior of adult Drosophila melanogaster (fruit flies), vision-based 3D trajectory reconstruction methods are adopted. To improve the statistical strength of subsequent analysis, high-throughput measurements are necessary. However, ambiguities in both stereo matching and temporal tracking appear more frequently in high-density situations, aggravating the complexity of the 3D tracking problem. In this paper we propose a high-density object tracking algorithm. Instead of approximating trajectories for all frames in a direct manner, tracking is terminated in ambiguous situations to generate robust tracklets based on the modified tracking-by-matching method. The terminated tracklets are linked to ongoing (unterminated) tracklets with minimum linking cost in an on-line fashion. Furthermore, we introduce a set of new evaluation metrics to analyze the tracking results. These metrics are used to analyze the effect of detection noise and to compare our tracking algorithm with two state-of-the-art 3D tracking methods on simulated data with hundreds of flies. The results indicate that our proposed algorithm outperforms both the tracking-by-matching algorithm and a global correspondence selection approach.

Junli Tao, Benjamin Risse, Xiaoyi Jiang

TV-L1-Based 3D Medical Image Registration with the Census Cost Function

A recent trend in computer vision is to combine the census cost function with a TV-L1 energy minimization scheme. Although this combination is known for its robust performance in computer vision applications, it has not been introduced to 3D medical image registration yet. Addressing pulmonary motion estimation in 4D (3D+t) CT images, we propose incorporating the census cost function into a 3D implementation of the ‘duality-based approach for realtime TV-L1 optical flow’ for the task of lung CT registration. The performance of the proposed algorithm is evaluated on the DIR-lab benchmark and compared to state-of-the-art approaches in this field. Results highlight the potential of the census cost function for accurate pulmonary motion estimation in particular, and 3D medical image registration in general.

Simon Hermann, René Werner

An Automatic Timestamp Replanting Algorithm for Panorama Video Surveillance

Timestamp replanting is required when we want to remove the timestamps in individual videos and plant a timestamp into their merged panorama video. This paper presents a preliminary automatic timestamp replanting algorithm for producing panorama surveillance video. Timestamp replanting is a challenging problem because localization, removal, and recognition of the timestamp are three difficult tasks. This paper develops methods to address these difficulties. First, it presents a novel localization procedure that localizes the seconds digit by using a pixel periodicity method, and then localizes the timestamp by extracting all of its digits. Second, it adopts a homography-based method to conduct timestamp removal. Third, it presents a digit-sequence recognition method to recognize the seconds digit and online template matching to recognize the other digits. Experimental results show that the algorithm can accurately localize timestamps at a very low computing cost and that the replanting results are visually acceptable.

Xinguo Yu, Wu Song, Jun Cheng, Bo Qiu, Bin He

Virtual View Synthesis Based on DIBR and Image Inpainting

In 3DTV research, virtual view synthesis is a key component of the technology. Depth-image-based rendering (DIBR) is an important method to realize virtual view synthesis. However, DIBR always results in hole problems where the depth and colour values are not known. Hole-filling methods often cause other problems, such as edge-ghosting and cracks. This paper proposes an algorithm that uses the depth and colour images to address the holes. It exploits the assumption of a virtual view between two laterally aligned reference cameras. The hole-filling method is performed on the blended depth image by morphological operations, and inpainting of the holes is guided by the position information provided by the filtered depth maps. A new interpolation method to eliminate edge-ghosting is also presented, which additionally uses a post-processing technique to improve image quality. The main novelty of this paper is the unique image blending, which is more efficient than pre-processing depth maps. It is also the first method to use morphological closing in the depth-map de-noising process. The method proposed in this paper can effectively remove holes and edge-ghosting. Experimental quantitative and qualitative results show that the proposed algorithm improves quality dramatically over traditional methods.

Yuhan Gao, Hui Chen, Weisong Gao, Tobi Vaudrey

Implementation Strategy of NDVI Algorithm with Nvidia Thrust

The calculation of the Normalized Difference Vegetation Index (NDVI) has been studied by multiple authors in the remote sensing and image processing fields; however, its application to large image files such as satellite images restricts its use or requires preprocessing phases to compensate for the large amount of resources needed or the processing time. This paper presents an implementation strategy to calculate NDVI for satellite images in RAW format, exploiting the economical supercomputing capabilities offered by video cards or Graphics Processing Units (GPUs). Our algorithm outperforms other works developed in NVIDIA CUDA. The images used were provided by NASA and taken by Landsat 7 over the Mexican coast at Ciudad del Carmen, Campeche.

Jesús Alvarez-Cedillo, Juan Herrera-Lozada, Israel Rivera-Zarate
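The NDVI itself is a simple per-pixel formula, NDVI = (NIR − RED) / (NIR + RED), mapping each pixel into [−1, 1]; the paper’s contribution is mapping this onto the GPU with Nvidia Thrust, but a CPU sketch in NumPy (an illustration with made-up reflectance values, not the authors’ code) shows the arithmetic:

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Normalized Difference Vegetation Index, computed pixel-wise.
    eps guards against division by zero on dark pixels."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)

# tiny illustrative reflectance patches (red and near-infrared bands)
red_band = np.array([[0.1, 0.4],
                     [0.3, 0.2]])
nir_band = np.array([[0.5, 0.4],
                     [0.6, 0.1]])
index = ndvi(nir_band, red_band)
```

On a GPU this is an embarrassingly parallel element-wise transform, which is why it maps naturally onto a single Thrust `transform` over the zipped band arrays.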

Generation of a Super-Resolved Stereo Video Using Two Synchronized Videos with Different Magnifications

In this paper, we address the problem of changing the optical zoom magnification of stereo video captured with a stereo camera system that uses 4K or 8K digital cameras. We propose a solution for generating a zoomed stereo video from a pair of zoomed and non-zoomed videos. To achieve this, part of the non-zoomed video image is isolated and super-resolved so that its resolution becomes the same as that of the optically-zoomed image. The non-zoomed video is super-resolved by energy minimization using the optically-zoomed image as an example. The effectiveness of this method is validated through experiments.

Yusuke Hayashi, Norihiko Kawai, Tomokazu Sato, Miyuki Okumoto, Naokazu Yokoya

Video Saliency Modulation in the HSI Color Space for Drawing Gaze

We propose a method for drawing gaze to a given target in videos, by modulating the value of pixels based on the saliency map. The change of pixel values is described by enhancement maps, which are weighted combinations of center-surround difference maps of the intensity channel and two color-opponency channels. Enhancement maps are applied to each video frame in the HSI color space to increase saliency in the target region and to decrease it in the background. The TLD tracker is employed for tracking the target over frames. The saliency map is used to control the strength of modulation. Moreover, a […] step is introduced for accelerating computation, and a post-processing module helps to eliminate flicker. Experimental results show that this method is effective in drawing the attention of subjects, but the problem of flicker may arise in minor cases.

Tao Shi, Akihiro Sugimoto

Posture Based Detection of Attention in Human Computer Interaction

Unacted posture conveys cues about people’s attentional disposition. We aim to identify robust markers of attention from posture while people carry out their duties seated in front of their computers at work. Body postures were randomly captured from 6 subjects at work using a Kinect, and self-assessed as attentive or not attentive. Robust postural features exhibiting higher discriminative power across classification exercises with 4 well-known classifiers were identified. Average classification of attention from posture reached 76.47%±4.58% (F-measure). A total of 40 postural features were tested, and those serving as proxies for head tilt were found to be the most stable markers of attention in seated conditions based upon 3 class-separability criteria. Unobtrusively monitoring the posture of users working in front of a computer can reliably be used to infer the user’s attentional disposition. Human-computer interaction systems can benefit from this knowledge to customize the experience to the user’s changing attentional state.

Patrick Heyer, Javier Herrera-Vega, Dan-El N. Vila Rosado, Luis Enrique Sucar, Felipe Orihuela-Espina

Evaluation of AFIS-Ranked Latent Fingerprint Matched Templates

The methodology currently practiced in latent print examination (known as ACE-V) yields only a decision as its result, namely individualization, exclusion or inconclusive. From such a decision, it is not possible to quantitatively express the strength of opinion of a forensic examiner with a scientific basis to the criminal justice system. In this paper, we propose a framework to generate a score from the matched template produced by the forensic examiner. Such a score can be viewed as a quantitative measure of the confidence of a forensic examiner, which in turn can be used in a statistics-based evidence evaluation framework, e.g., the likelihood ratio. Together with the description and evaluation of the new realistic forensic-case-driven score computation, we also exploit the developed experimental framework to learn more about matched templates in forensic fingerprint databases.

Ram P. Krish, Julian Fierrez, Daniel Ramos, Raymond Veldhuis, Ruifang Wang

Exemplar-Based Hole-Filling Technique for Multiple Dynamic Objects

Entire shape reconstruction of dynamic objects is an important research subject with applications in film production, virtual reality, modeling and engineering. Typically, entire shape reconstruction of real objects is achieved by combining the outcome of objects scanned from multiple directions. However, due to limitations on the number of 3D sensors enclosing the scene, occlusions inevitably occur, causing holes to appear on the reconstructed surfaces. These issues are intensified if dynamic, moving objects are considered. Volumetric and polygonal approaches exist to address these problems. Most notably, exemplar-based polygonal methods have gained momentum due to their overall improved visual quality. In this paper we propose an extension of the plain exemplar-based technique that allows for multiple dynamic objects. With our method, adequate hole-filling candidates are sampled from the spatial and temporal domains and then used to synthesize plausible surfaces with smooth boundaries for the hole regions.

Matteo Pagliardini, Yasuhiro Akagi, Marcos Slomp, Ryo Furukawa, Ryusuke Sagawa, Hiroshi Kawasaki

Line Segment Detection with Hough Transform Based on Minimum Entropy

The Hough transform is a popular technique in the field of image processing. In this paper, fitting and interpolation techniques are employed to compute high-accuracy peak parameters by considering peak spreading. Entropy is selected to measure the scatter degree of the voting. The voting in each column is considered as a random variable and the voting values are considered as a probabilistic distribution. The corresponding entropies are computed and used to estimate the peak parameters. Endpoint coordinates of a line segment are computed by fitting a sine curve with more cells; this is more accurate and robust than directly solving two equations. The proposed method is tested on simulated and real images.

Zezhong Xu, Bok-Suk Shin

Easy-to-Use and Accurate Calibration of RGB-D Cameras from Spheres

RGB-Depth (or RGB-D) cameras are increasingly being adopted for real-world applications, especially in the areas of healthcare and at-home monitoring. As for any other sensor, and since the manufacturer’s parameters (e.g., focal length) might change between models, calibration is necessary to increase the camera’s sensing accuracy. In this paper, we present a novel RGB-D camera-calibration algorithm that is easy to use even for non-expert users at home; our method can be used for any arrangement of RGB and depth sensors, and only requires that a spherical object (e.g., a basketball) is moved in front of the camera for a few seconds. A robust image-processing pipeline automatically detects the moving sphere and rejects noise and outliers in the image data. A novel closed-form solution is presented to accurately compute an initial set of calibration parameters, which are then utilized in a nonlinear minimization stage over all the camera parameters, including lens distortion. Extensive simulation and experimental results show the accuracy and robustness to outliers of our algorithm with respect to existing checkerboard-based methods. Furthermore, an RGB-D Calibration Toolbox for MATLAB is made freely available to the entire research community.

Aaron Staranowicz, Garrett R. Brown, Fabio Morbidi, Gian Luca Mariottini

Tree Species Classification Based on 3D Bark Texture Analysis

The Terrestrial Laser Scanning (TLS) technique is today widely used in ground plots to acquire 3D point clouds from which forest inventory attributes are calculated. In the case of mixed plantings, where the 3D point clouds contain data from several different tree species, it is important to be able to automatically recognize the tree species in order to analyze the data of each species separately. Although automatic tree species recognition from TLS data is an important problem, it has received very little attention from the scientific community. In this paper we propose a method for classifying five different tree species using TLS data. Our method is based on the analysis of the 3D geometric texture of the bark in order to compute roughness measures and shape characteristics that are fed as input to a Random Forest classifier to classify the tree species. The method has been evaluated on a test set composed of 265 samples (53 samples of each of the 5 species) and the results obtained are very encouraging.

Ahlem Othmani, Alexandre Piboule, Oscar Dalmau, Nicolas Lomenie, Said Mokrani, Lew Fock Chong Lew Yan Voon

Singular Vector Methods for Fundamental Matrix Computation

The normalized eight-point algorithm is broadly used for the computation of the fundamental matrix between two images given a set of correspondences. However, it performs poorly for small datasets due to the way in which the rank-two constraint is imposed on the fundamental matrix. We propose two new algorithms to enforce the rank-two constraint on the fundamental matrix in closed form. The first restricts the projection onto the manifold of fundamental matrices along the most favorable direction with respect to the algebraic error. Its complexity is akin to that of the classical seven-point algorithm. The second algorithm relaxes the search to the best plane with respect to the algebraic error. The minimization of this error amounts to finding the intersection of two bivariate cubic polynomial curves. These methods are based on the minimization of the algebraic error and perform equally well for large datasets. However, we show through synthetic and real experiments that the proposed algorithms compare favorably with the normalized eight-point algorithm for small datasets.

Ferran Espuny, Pascal Monasse
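For reference, the classical rank-two enforcement that the abstract above seeks to improve upon zeroes the smallest singular value of the estimated matrix, i.e., the Frobenius-norm projection onto the rank-two manifold. A minimal NumPy sketch (the input matrix is illustrative, not a real fundamental matrix estimate):

```python
import numpy as np

def closest_rank2(F):
    """Classical rank-two enforcement used after the eight-point
    algorithm: zero the smallest singular value of F (the closest
    rank-2 matrix in Frobenius norm)."""
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0
    return U @ np.diag(s) @ Vt

# illustrative full-rank 3x3 input
F = np.array([[2.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.5]])
F2 = closest_rank2(F)
```

This projection ignores the algebraic error of the correspondences entirely, which is precisely the weakness on small datasets that motivates the two directional projections proposed in the paper.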

Global Haar-Like Features: A New Extension of Classic Haar Features for Efficient Face Detection in Noisy Images

This paper addresses the problem of detecting human faces in noisy images. We propose a method that includes a denoising preprocessing step and a new face detection approach based on a novel extension of Haar-like features. Preprocessing of the input images focuses on the removal of different types of noise while preserving the phase data. For the face detection process, we introduce the concept of dynamic global Haar-like features, which are complementary to the well-known classical Haar-like features. Matching dynamic global Haar-like features is faster than the traditional approach, and it does not increase the computational burden in the learning process. Experimental results obtained using images from the MIT-CMU dataset are promising in terms of detection rate and false alarm rate in comparison with other competing algorithms.

Mahdi Rezaei, Hossein Ziaei Nafchi, Sandino Morales

High Accuracy Ellipse-Specific Fitting

We propose a new method that always fits an ellipse to a point sequence extracted from images. The currently known best ellipse fitting method is the hyper-renormalization of Kanatani et al., but it may return a hyperbola when the noise in the data is very large. Our proposed method returns an ellipse close to the point sequence by random sampling of data points. Through simulations, we show that our method has higher accuracy than the method of Fitzgibbon et al. and the method of Szpak et al., the two methods so far proposed to always return an ellipse.

Tomonari Masuzaki, Yasuyuki Sugaya, Kenichi Kanatani

A Trajectory Estimation Method for Badminton Shuttlecock Utilizing Motion Blur

To build a robust visual tracking method, it is important to consider issues such as low observation resolution and variation in the target object’s shape. When a fast-moving object is captured by a video camera, motion blur is observed. This paper introduces a visual trajectory estimation method using blur characteristics in 3D space. We acquire a movement speed vector based on the shape of a motion-blur region. This method can extract both the position and speed of the moving object from an image frame and apply them to a visual tracking process using a Kalman filter. We estimate the 3D position of the object based on the information obtained from two different viewpoints, as shown in figure 1. We evaluated our proposed method by estimating the trajectory of a badminton shuttlecock from video sequences of a badminton game.

Hidehiko Shishido, Itaru Kitahara, Yoshinari Kameda, Yuichi Ohta

Fast Human Detection in RGB-D Images with Progressive SVM-Classification

In this article, we propose a new, fast approach to detect human beings from RGB-D data, named Progressive Classification. The idea of this method is quite simple: as in several state-of-the-art algorithms, the classification is based on the evaluation of HOG-like descriptors within image test windows, which are divided into a set of blocks. In our method, the evaluation of the set of blocks is done progressively in a particular order, such that the blocks that contribute most to the separability between the human and non-human classes are evaluated first. This permits an early decision about human detection without necessarily evaluating all the blocks, thereby accelerating the detection process. We evaluate our method with different HOG-like descriptors and on a challenging dataset.

Domingo Iván Rodríguez González, Jean-Bernard Hayet

An Object Recognition Model Based on Visual Grammars and Bayesian Networks

A novel proposal for a general model for object recognition is presented. The proposed method is based on symbol-relational grammars and Bayesian networks. An object is modeled as a hierarchy of features and spatial relationships using a symbol-relational grammar. This grammar is learned automatically from examples, incorporating a simple segmentation algorithm in order to generate the lexicon. The grammar is created with the elements of the lexicon as terminal elements. This representation is automatically transformed into a Bayesian network structure whose parameters are learned from examples. Thus, recognition is based on probabilistic inference in the Bayesian network representation. Preliminary results in modeling natural objects are presented. The main contribution of this work is a general methodology for building object recognition systems that combines the expressivity of a grammar with the robustness of probabilistic inference.

Elias Ruiz, Luis Enrique Sucar

Color Image Segmentation with a Hyper-Conic Multilayer Perceptron

We apply the Hyper-Conic Artificial Multilayer Perceptron (HC-MLP) to color image segmentation, treating segmentation as a classification problem that distinguishes between foreground and background pixels. The HC-MLP was designed using the conic space and conformal geometric algebra. The neurons in the hidden layer contain a transfer function that defines a quadratic surface (spheres, ellipsoids, paraboloids, and hyperboloids) by means of inner and outer products, and the neurons in the output layer contain a transfer function that decides whether a point lies inside or outside a sphere. The Particle Swarm Optimization (PSO) algorithm is used to train the HC-MLP. A benchmark of fifty images is used to evaluate the performance of the algorithm and to compare our proposal against statistical methods that use Gaussian copula functions.
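The output-layer decision mentioned above reduces to a point-in-sphere test; a minimal sketch follows, with the conformal-geometric-algebra formulation deliberately simplified to a plain Euclidean distance check (the actual HC-MLP computes this via inner products in conformal space):

```python
import numpy as np

def inside_sphere(point, center, radius):
    """Return 1.0 if the point lies inside (or on) the sphere, 0.0
    otherwise -- the kind of binary decision made by an output-layer
    neuron of the HC-MLP, here in simplified Euclidean form."""
    d = np.linalg.norm(np.asarray(point, float) - np.asarray(center, float))
    return 1.0 if d <= radius else 0.0
```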

Juan Pablo Serrano, Arturo Hernández, Rafael Herrera

TimeViewer, a Tool for Visualizing the Problems of the Background Subtraction

This paper presents TimeViewer, a tool that facilitates understanding of the most common problems in background subtraction. The tool displays patterns of each frame and, through the historical values of the pixels, allows visual identification of changes in a pixel sequence. The paper demonstrates the usefulness of TimeViewer by showing how it visually presents the most common background subtraction problems.
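The per-pixel temporal signal that such a tool inspects can be sketched as follows (an illustrative reconstruction, not TimeViewer's implementation; the stability threshold is an arbitrary choice):

```python
import numpy as np

def pixel_history(frames, y, x):
    """Temporal profile of one pixel across a frame sequence -- the kind
    of per-pixel signal a tool like TimeViewer displays for inspection."""
    return np.array([frame[y, x] for frame in frames], dtype=float)

def looks_like_static_background(history, max_std=5.0):
    """A flat history suggests a stable background pixel; high variance
    hints at common background subtraction problems such as dynamic
    backgrounds or camera jitter."""
    return float(np.std(history)) <= max_std
```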

Alejandro Sánchez Rodríguez, Juan Carlos González Castolo, Óscar Déniz Suárez

Block-Based Search Space Reduction Technique for Face Detection Using Shoulder and Head Curves

Conventional face detection techniques usually employ sliding-window approaches involving a series of classifiers to accurately determine the position of the face in an input image, resulting in high computational redundancy. Pre-processing techniques are being investigated to reduce the search space for face detection. In this paper, we propose a systematic approach to reducing the search space using head and shoulder curves. The proposed method applies Gradient Angle Histograms (GAH) in a block-based manner to detect these curves, which are then associated to determine the search space for face detection. A performance evaluation on the CASIA and Buffy datasets shows that an average search space reduction of up to 80% is achieved, with detection rates of over 90% for specific parameters of the dataset.
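A block-wise gradient angle histogram of the kind used above can be sketched as follows (a GAH-style descriptor; the paper's exact binning, weighting, and normalization may differ):

```python
import numpy as np

def gradient_angle_histogram(block, n_bins=8):
    """Magnitude-weighted histogram of unsigned gradient orientations
    within one image block."""
    gy, gx = np.gradient(block.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)                  # angles in [0, pi)
    idx = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, idx.ravel(), mag.ravel())                # vote by magnitude
    return hist
```

A block containing a dominant contour direction (such as a shoulder curve segment) concentrates its votes in few bins, which is what makes the descriptor discriminative.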

Supriya Sathyanarayana, Ravi Kumar Satzoda, Suchitra Sathyanarayana, Srikanthan Thambipillai

A Thermal Facial Emotion Database and Its Analysis

In recent years, thermal imaging has been used extensively in many fields, both military (e.g., target acquisition, surveillance, night vision, homing, and tracking) and civilian (e.g., medical diagnosis, thermal efficiency analysis, environmental monitoring). It may be a promising alternative for the investigation of facial expression and emotion. Currently there are very few databases to support research on facial expression and emotion, and most of them either include only posed thermal expression images or lack thermal information. For these reasons, we propose and establish a natural visible and thermal facial emotion database. The database contains seven spontaneous emotions of 26 subjects. We also analyze the visible and thermal databases for expression recognition, and the thermal information for emotion recognition.

Hung Nguyen, Kazunori Kotani, Fan Chen, Bac Le

Comparative Analysis of the Variability of Facial Landmarks for Forensics Using CCTV Images

This paper reports a study of the variability of facial landmarks in a forensic scenario using images acquired from CCTV cameras. This type of image presents very low quality and a large range of variability factors, such as differences in pose, expressions, and occlusions. In addition, the variability of facial landmarks is affected by the precision with which the landmarks are tagged. This process can be done manually or automatically depending on the application (e.g., forensics or automatic face recognition, respectively). The study compares both manual and automatic procedures, and also three distances between the camera and the subjects. Results show that landmarks located in the outer part of the face (highest point of the head, ears, and chin) present a higher level of variability than the landmarks located in the inner face (eye region and nose). The study also shows that landmark variability increases with the distance between subject and camera, and that the results of the manual and automatic approaches are similar for the inner facial landmarks.

Ruben Vera-Rodriguez, Pedro Tome, Julian Fierrez, Javier Ortega-Garcia

Human Action Recognition from Inter-temporal Dictionaries of Key-Sequences

This paper addresses human action recognition in video by proposing a method based on three main processing steps. First, we tackle problems related to intra-class variations and differences in video lengths. We achieve this by reducing an input video to a set of key-sequences that represent atomic, meaningful acts of each action class. Second, we use sparse coding techniques to learn a representation for each key-sequence. We then join these representations while preserving information about temporal relationships. We believe this is a key step of our approach because it provides not only a suitable shared representation to characterize atomic acts, but also encodes global temporal consistency among these acts. Accordingly, we call this representation the inter-temporal acts descriptor. Third, we use this representation and sparse coding techniques to classify new videos. Finally, we show that our approach outperforms several state-of-the-art methods when tested on common benchmarks.

Analí Alfaro, Domingo Mery, Alvaro Soto

Summarization of Egocentric Moving Videos for Generating Walking Route Guidance

In this paper, we propose a method to summarize an egocentric moving video (a video recorded by a moving wearable camera) for generating a walking route guidance video. To summarize an egocentric video, we analyze it by applying pedestrian crosswalk detection as well as ego-motion classification, and estimate an importance score for each section of the given video. Based on the estimated importance scores, we dynamically control the video playing speed instead of generating a summarized video file in advance. In the experiments, we prepared an egocentric moving video dataset totaling more than one hour of video, and evaluated the crosswalk detection and ego-motion classification methods. A user study evaluating the whole system showed that the proposed method is much better than a simple baseline summarization method without video analysis.
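The importance-driven playback control described above can be sketched as a simple mapping from section scores to speed multipliers (the linear mapping and the speed range are illustrative assumptions, not the paper's calibration):

```python
def playback_speed(importance, slow=1.0, fast=8.0):
    """Map a section's importance score in [0, 1] to a playback speed
    multiplier: important sections (e.g., crosswalks, turns) play at
    normal speed, unimportant walking stretches are fast-forwarded."""
    importance = min(max(importance, 0.0), 1.0)   # clamp to [0, 1]
    return fast - importance * (fast - slow)
```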

Masaya Okamoto, Keiji Yanai

A Robust Integrated Framework for Segmentation and Tracking

Recent studies on human motion capture (HMC) indicate the need for a likelihood model that does not rely on a static background. In this paper, we present an approach to human motion capture using a robust version of the oriented chamfer matching scheme. Our method relies on an MRF-based segmentation to isolate the subject from the background and therefore does not require a static background. Furthermore, we use robust statistics to make the likelihood robust to outliers. We compare the proposed approach to the alternative methods used in recent HMC studies on the HumanEva-I dataset, and show that our method performs significantly better than the alternatives despite not assuming a static background.
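A robustified chamfer score of the flavor used above can be sketched as follows (a truncated-distance variant; the orientation channels of the full oriented scheme, and the paper's specific robust statistic, are omitted):

```python
import numpy as np

def robust_chamfer_score(model_points, dist_transform, tau=10.0):
    """Truncated chamfer score: average distance from projected model edge
    points to the nearest image edge (read off a precomputed distance
    transform), with each distance capped at tau so that outlier points
    cannot dominate the likelihood. Lower is better."""
    d = np.array([dist_transform[y, x] for y, x in model_points], dtype=float)
    return float(np.mean(np.minimum(d, tau)))
```

Capping each per-point distance is what gives the matching its robustness: a few occluded or mis-segmented model points contribute at most tau instead of an arbitrarily large penalty.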

Prabhu Kaliamoorthi, Ramakrishna Kakarala

Video Error Concealment Based on Data Hiding for the Emerging Video Technologies

The new High Efficiency Video Coding (HEVC) standard includes structures and tools that were not available in previous standards. The macroblock concept was replaced by a quad-tree structure that includes coding units, prediction units, and transform units; in addition, new parallelization tools are now available. Video transmission over error-prone channels requires reliable and efficient error concealment methods. Unfortunately, most existing error concealment methods interfere with, or do not take advantage of, the new structures and tools.

In this work, a data-hiding-based error concealment method is proposed for HEVC. During encoding, information is embedded into the residual transform coefficients; this information is later retrieved and used during the error concealment process. The experiments performed show superior performance compared with the non-normative error concealment method included in the H.264/AVC joint model.
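A generic coefficient-level embedding of this kind can be sketched with a parity rule (an illustrative data-hiding scheme; the paper's actual embedding rule for HEVC residual coefficients may differ):

```python
def embed_bit(coeff, bit):
    """Embed one bit into a nonzero quantized residual coefficient by
    forcing the parity of its magnitude, using the smallest magnitude
    change that flips the parity."""
    if abs(coeff) % 2 != bit:
        coeff += 1 if coeff > 0 else -1
    return coeff

def extract_bit(coeff):
    """Recover the hidden bit from the coefficient's parity."""
    return abs(coeff) % 2
```

The decoder can read the hidden bits from correctly received coefficients and use them to conceal co-located lost regions.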

Francisco Aguirre-Ramos, Claudia Feregrino-Uribe, Rene Cumplido

Incorporating Audio Signals into Constructing a Visual Saliency Map

The saliency map has been proposed to identify regions that draw human visual attention. Differences of features from the surroundings are hierarchically computed for an image or an image sequence at multiple resolutions and fused in a fully bottom-up manner to obtain a saliency map. A video usually contains sound, and not only visual but also auditory stimuli attract human attention. Nevertheless, most conventional methods discard auditory information and use image information alone in computing a saliency map. This paper presents a method for constructing a visual saliency map by integrating image features with auditory features. We assume a single moving sound source in a video and introduce a sound source feature. Our method detects the sound source feature using the correlation between audio signals and sound source motion, and computes its importance in each frame using an auditory saliency map. This importance is used to fuse the sound source feature with image features to construct a visual saliency map. Experiments with subjects demonstrate that a saliency map produced by our method reflects human visual attention more accurately than one produced by a conventional method.
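The importance-weighted fusion step can be sketched as follows (the convex combination and the final renormalization are illustrative assumptions, not the paper's exact fusion rule):

```python
import numpy as np

def fuse_saliency(visual_map, sound_source_map, audio_importance):
    """Fuse a visual saliency map with a sound-source feature map,
    weighting the latter by a per-frame auditory importance in [0, 1]
    (e.g., derived from an auditory saliency map)."""
    w = min(max(audio_importance, 0.0), 1.0)
    fused = (1.0 - w) * visual_map + w * sound_source_map
    return fused / (fused.max() + 1e-12)   # renormalize to [0, 1]
```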

Jiro Nakajima, Akihiro Sugimoto, Kazuhiko Kawamoto

Generation of Animated Stereo Panoramic Images for Image-Based Virtual Reality Systems

Panoramic image-based representation of real-world environments has received much attention in virtual/augmented reality applications due to advantages such as shorter generation times, faster rendering speeds, higher photorealism, and lower storage requirements compared to 3D modeling approaches. In this paper, a novel approach is proposed for generating animated stereo panoramic images from a single video sequence captured by a panning camera. The techniques involved include generating seamless stereo video textures, inpainting unsatisfactory regions, and embedding video textures into stereo panoramic images. A player is also developed to provide stereo visualization and real-time navigation of the image-based virtual environment, featuring seamlessly looping animated scenes. The quality of the generated animated stereo panoramic image is satisfactory for virtual reality applications, and the computation time is acceptable for practical use.

Fay Huang, Chih-Kai Chang, Zi-Xuan Lin

Gray-World Assumption on Perceptual Color Spaces

In this paper, the estimation of the illuminant in color constancy is analysed in two perceptual color spaces, and a variation of a well-known methodology is presented. The approach is based on the Gray-World assumption, applied here to the chromatic components in the CIELAB and CIELUV color spaces. A comparison is made between the outcomes for each color model considered. Reference images from the Gray-Ball dataset are used for the experimental tests, and the performance of the approach is evaluated with the angular error, a metric well accepted in this field. The experimental results show that operating in perceptual color spaces improves illuminant estimation in comparison with the results obtained using the standard approach in RGB.
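Applying the Gray-World assumption to the chromatic components can be sketched as follows (a simplified version of the perceptual-space variant, assuming the image is already converted to CIELAB; the CIELUV case is analogous):

```python
import numpy as np

def gray_world_lab(lab):
    """Gray-World correction on the chromatic channels of a CIELAB image:
    shift a* and b* so that their spatial means become zero (i.e., the
    scene average becomes achromatic), leaving lightness L* untouched."""
    out = lab.astype(float).copy()
    out[..., 1] -= out[..., 1].mean()   # a* channel
    out[..., 2] -= out[..., 2].mean()   # b* channel
    return out
```

The pre-correction means of a* and b* serve as the estimate of the illuminant's chromatic cast.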

Jonathan Cepeda-Negrete, Raul E. Sanchez-Yanez

A Variational Method for the Optimization of Tone Mapping Operators

Given any metric that compares images of different dynamic ranges, we propose a method to reduce their distance with respect to this metric. The key idea is to consider the metric as a non-local operator and transform the problem of distance reduction into a non-local variational problem. In this context, the low dynamic range image with the smallest distance to a given high dynamic range image is the minimizer of a suitable energy, and can be reached through a gradient descent algorithm. Using an appropriate metric, we present an application to Tone Mapping Operator (TMO) optimization: we apply our gradient descent algorithm with tone mapped (TM) images as initial conditions. Experiments show that our algorithm does reduce the distance between the TM images and the high dynamic range source images, meaning that our method improves the corresponding TMOs.
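The descent scheme can be sketched generically as follows (a fixed-step gradient descent on an arbitrary user-supplied metric gradient; the paper's specific non-local metric and step-size rule are not reproduced here):

```python
import numpy as np

def descend(u0, grad_metric, step=0.1, n_iters=200):
    """Plain gradient descent u_{k+1} = u_k - step * dD/du, initialized at
    a tone-mapped image u0. grad_metric is a callable returning the
    gradient of the chosen image metric D with respect to u."""
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(n_iters):
        u = u - step * grad_metric(u)
    return u
```

With a convex metric such as a squared difference, the iterates converge to the image closest to the reference under that metric.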

Praveen Cyriac, Thomas Batard, Marcelo Bertalmío

