Nonlinear Transfer Function-Based Image Detail Preserving Dynamic Range Compression for Color Image Enhancement

This paper presents a method for color image enhancement in HSV space that preserves image details. The RGB color image is converted into HSV space, and the V channel image is then enhanced. By applying an image-dependent nonlinear transfer function, dynamic range compression that preserves local image contrast and contrast enhancement are performed simultaneously on the V channel image. Finally, the enhanced V channel image and the original H and S channel images are converted back to obtain the enhanced RGB image. The original color of the image is preserved because the H and S components are kept unchanged. The experimental results show that the proposed method performs better than conventional methods in both subjective and objective evaluation.

Deepak Ghimire, Joonwhoan Lee
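
The pipeline described in the abstract above (convert RGB to HSV, enhance only V, convert back) can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not give the image-dependent transfer function, so a fixed gamma curve stands in for it, and the names `enhance_pixel`, `enhance_image` and the `gamma` value are assumptions made for the example.

```python
import colorsys

def enhance_pixel(rgb, gamma=0.6):
    """Brighten one RGB pixel (floats in [0, 1]) by remapping only V in HSV."""
    r, g, b = rgb
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # Placeholder transfer function (assumption): gamma < 1 lifts dark values
    # more than bright ones, compressing the dynamic range of V.
    v_new = v ** gamma
    # H and S are passed through untouched, so the hue and saturation survive.
    return colorsys.hsv_to_rgb(h, s, v_new)

def enhance_image(pixels, gamma=0.6):
    """Apply enhance_pixel to a nested list of RGB tuples (rows of pixels)."""
    return [[enhance_pixel(p, gamma) for p in row] for row in pixels]
```

Because only V is remapped, converting the output back to HSV yields the same H and S as the input, which is the color-preservation property the abstract relies on.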

3D Perception Adjustment of Stereoscopic Images Based upon Depth Map

Recently, a variety of stereoscopic content has been provided to academia and industry for broadcasting, movies and mobile media. However, few works have addressed the adjustment of 3D content for diverse displays. For instance, movie content suited to a large screen frequently does not deliver the same 3D perception on small screens such as mobile phones and tablet PCs. To address this, this paper presents an adjustment method for stereoscopic content. 2D+Depth is one of the popular methods for generating stereoscopic images. Depth planes are derived from a depth histogram. By adjusting the depth planes, a new depth map is made. Then 2D+Depth produces a stereoscopic image. Subjective evaluation experiments performed on various 2D+Depth images validate that the proposed method delivers enhanced 3D depth.

Jong In Gil, Seung Eun Jang, Manbae Kim

Super-Resolved Free-Viewpoint Image Synthesis Using Semi-global Depth Estimation and Depth-Reliability-Based Regularization

A method for synthesizing high-quality free-viewpoint images from a set of multi-view images is presented. First, an accurate depth map is estimated from a given target viewpoint using modified semi-global stereo matching. Then, a high-resolution image from that viewpoint is obtained through super-resolution reconstruction. The depth estimation results from the first step are used for the second step. First, the depth values are used to associate pixels between the input images and the latent high-resolution image. Second, the pixel-wise reliabilities of the depth information are used for regularization to adaptively control the strength of the super-resolution reconstruction. Experimental results using real images showed the effectiveness of our method.

Keita Takahashi, Takeshi Naemura

Heat Kernel Smoothing via Laplace-Beltrami Eigenfunctions and Its Application to Subcortical Structure Modeling

We present a new subcortical structure shape modeling framework using heat kernel smoothing constructed with the Laplace-Beltrami eigenfunctions. The cotan discretization is used to numerically obtain the eigenfunctions of the Laplace-Beltrami operator along the surface of subcortical structures of the brain. The eigenfunctions are then used to construct the heat kernel, which is used to smooth out measurement noise along the surface. The proposed framework is applied to investigate the influence of age (38-79 years) and gender on amygdala and hippocampus shape. We detected a significant age effect on the hippocampus, in accordance with previous studies. In addition, we detected a significant gender effect on the amygdala. Since we did not find any such differences with traditional volumetric methods, our results demonstrate the benefit of the current framework over traditional volumetric methods.

Seung-Goo Kim, Moo K. Chung, Seongho Seo, Stacey M. Schaefer, Carien M. van Reekum, Richard J. Davidson

SLAM and Navigation in Indoor Environments

In this paper, we propose a system for wheeled robot SLAM and navigation in indoor environments. An omni-directional camera and a laser range finder are used as sensors to extract point features and line features as landmarks. For SLAM and for self-localization during navigation, we use an extended Kalman filter (EKF) to deal with the uncertainty of the robot pose and landmark feature estimates. After the map is built, the robot can navigate in the environment based on it. We apply two-scale path planning for navigation. The large-scale planning finds an appropriate path from the starting point to the destination. The local-scale path planning compensates for the shortcomings of the prior step, such as dealing with static and dynamic obstacles and smoothing the path so that the robot can follow it more easily. Through experimental results, we show that the proposed system can smoothly and correctly locate itself, build the environment map and navigate in indoor environments.

Shang-Yen Lin, Yung-Chang Chen

Color Based Stool Region Detection in Colonoscopy Videos for Quality Measurements

Colonoscopy is the accepted screening method for detecting colorectal cancer or colorectal polyps. One of the main factors affecting the diagnostic accuracy of colonoscopy is the quality of bowel preparation. Despite a large body of published data on methods that could optimize cleansing, a substantial level of inadequate cleansing occurs in 10% to 75% of patients in randomized controlled trials. In this paper, we propose a novel approach that automatically determines percentages of stool areas in images of digitized colonoscopy video files, and automatically computes an estimate of the BBPS (Boston Bowel Preparation Scale) score based on the percentages of stool areas. It involves the classification of image pixels based on their color features using a new method of planes on RGB (Red, Green and Blue) color space. Our experiments show that the proposed stool classification method is sound and very suitable for colonoscopy video analysis where variation of color features is considerably high.

Jayantha Muthukudage, JungHwan Oh, Wallapak Tavanapong, Johnny Wong, Piet C. de Groen

Improving Motion Estimation Using Image-Driven Functions and Hybrid Scheme

We introduce an alternative method to improve optical flow estimation using image data for control functions. Based on the nature of object motion, we tune the energy minimization process with an image-adaptive scheme embedded inside the energy function. We propose a hybrid scheme to improve the quality of the flow field, and we use it along with the multiscale approach to deal with large motion in the sequence. The proposed hybrid scheme takes advantage of both the multigrid solver and the pyramid model. Our proposed method yields good estimation results and shows the potential to improve the performance of a given model. It can be applied to other advanced models. Improving the quality of motion estimation benefits various applications in intelligent systems, such as gesture recognition, video analysis and motion segmentation.

Duc Dung Nguyen, Jae Wook Jeon

Real-Time Background Compensation for PTZ Cameras Using GPU Accelerated and Range-Limited Genetic Algorithm Search

We propose a range-limited Genetic Algorithm (GA) search with a Graphics Processing Unit (GPU) accelerated implementation for background compensation where pan-tilt-zoom (PTZ) cameras are used. Our method consists of a GA with search ranges restricted using histogram matching and a GPU implementation of the range-limited GA. First, based on histogram matching, approximate scale (camera zoom) and translation (camera pan and tilt) parameters are estimated and used to restrict the ranges for the later GA search. Next, the GA is applied to find an optimal solution. Experimental comparisons of the proposed method with existing methods show that our work has two advantages: robustness in critical situations, owing to the GA, and fast processing.

Thuy Tuong Nguyen, Jae Wook Jeon

Audio-Visual Speech Recognition Based on AAM Parameter and Phoneme Analysis of Visual Feature

As one of the techniques for robust speech recognition in noisy environments, audio-visual speech recognition, which uses dynamic visual information from the lips together with audio information, has been attracting attention, and research has advanced in recent years. Since visual information plays a great role in audio-visual speech recognition, the choice of visual feature is a significant point. This paper proposes, for spoken word recognition, to utilize the c parameter (combined parameter) as the visual feature extracted by an Active Appearance Model applied to a face image including the lip area. The combined parameter contains information on the coordinate values and the intensity values. The recognition rate was improved by the proposed feature compared to conventional features such as DCT and the principal component score. Finally, we integrated the phoneme score from audio information and the viseme score from visual information with high accuracy.

Yuto Komai, Yasuo Ariki, Tetsuya Takiguchi

Multi-scale Integration of Slope Data on an Irregular Mesh

We describe a fast and robust gradient integration method that computes scene depths (or heights) from surface gradient (or surface normal) data such as would be obtained by photometric stereo or interferometry. Our method allows for uncertain or missing samples, which are often present in experimentally measured gradient maps; for sharp discontinuities in the scene’s depth, e.g. along object silhouette edges; and for irregularly spaced sampling points. To accommodate these features of the problem, we use an original and flexible representation of slope data, the weight-delta mesh. Like other state of the art solutions, our algorithm reduces the problem to a system of linear equations that is solved by Gauss-Seidel iteration with multi-scale acceleration. Its novel key step is a mesh decimation procedure that preserves the connectivity of the initial mesh. Tests with various synthetic and measured gradient data show that our algorithm is as accurate and efficient as the best available integrators for uniformly sampled data. Moreover our algorithm remains accurate and efficient even for large sets of weakly-connected instances of the problem, which cannot be efficiently handled by any existing algorithm.

Rafael F. V. Saracchini, Jorge Stolfi, Helena C. G. Leitão, Gary Atkinson, Melvyn L. Smith
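
The abstract above reduces gradient integration to a linear system solved by Gauss-Seidel iteration. The following is a minimal sketch of that idea on a small irregular mesh, assuming each edge carries a measured height difference and a reliability weight. It omits the paper's weight-delta mesh representation, mesh decimation and multi-scale acceleration; the function name `integrate_slopes` and the edge tuple layout are hypothetical.

```python
def integrate_slopes(num_vertices, edges, iterations=500):
    """Estimate heights z from weighted height differences on a mesh.

    edges: list of (i, j, delta, weight), meaning z[j] - z[i] is approximately
    delta, trusted with the given non-negative weight. Vertex 0 is pinned at
    height 0 to remove the unknown global offset.
    """
    neighbours = [[] for _ in range(num_vertices)]
    for i, j, delta, weight in edges:
        neighbours[j].append((i, delta, weight))    # z[j] ~ z[i] + delta
        neighbours[i].append((j, -delta, weight))   # z[i] ~ z[j] - delta
    z = [0.0] * num_vertices
    for _ in range(iterations):
        # Gauss-Seidel sweep: each update uses the latest neighbour values.
        for v in range(1, num_vertices):
            total = sum(w for _, _, w in neighbours[v])
            if total > 0:
                z[v] = sum(w * (z[u] + d) for u, d, w in neighbours[v]) / total
    return z
```

Each update sets a vertex to the weighted average of the heights its neighbours predict for it, which is the Gauss-Seidel step for the weighted least-squares problem. An edge with weight 0 is ignored entirely, which is one simple way to model a missing or untrusted sample.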

Virtual Viewpoint Disparity Estimation and Convergence Check for Real-Time View Synthesis

In this paper, we propose a new method for real-time disparity estimation and intermediate view synthesis from stereoscopic images. Some 3D video systems employ both the left and right depth images for virtual view synthesis; however, we estimate only one disparity map at a virtual viewpoint. In addition, we utilize hierarchical belief propagation and convergence check methods to find the global solution rapidly. In order to use the virtual viewpoint disparity map for intermediate view synthesis, we build an occlusion map that describes the occlusion information in the virtual viewpoint region of the reference image. We have also implemented the total system using GPU programming to synthesize virtual viewpoint images in real time.

In-Yong Shin, Yo-Sung Ho

Spatial Feature Interdependence Matrix (SFIM): A Robust Descriptor for Face Recognition

In this paper, a new face descriptor called spatial feature interdependence matrix (SFIM) is proposed for addressing representation of human faces under variations of illumination and facial expression. Unlike traditional face descriptors which usually use a hierarchically organized or a sequentially concatenated structure to describe the spatial arrangement of features in different facial regions, SFIM is focused on exploring inherent spatial feature interdependences among separated facial regions in a face image. We compute the feature interdependence strength between each pair of facial regions as the Chi square distance between two corresponding histogram based feature vectors. Once face images are represented as SFIMs, we then employ spectral regression discriminant analysis (SRDA) to achieve face recognition under a nearest neighbor search framework. Extensive experimental results on two well-known face databases demonstrate that the proposed method has superior performance in comparison with related approaches.

Anbang Yao, Shan Yu
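
The abstract above defines the interdependence strength between two facial regions as the Chi-square distance between their histogram feature vectors. A minimal sketch of building such a pairwise matrix follows, assuming the common symmetric form of the Chi-square distance with a factor of 1/2 (conventions vary, and the abstract does not say which one is used). The function names and the `eps` guard are illustrative.

```python
def chi_square_distance(p, q, eps=1e-12):
    """Symmetric Chi-square distance between two equal-length histograms."""
    # eps avoids division by zero when a bin is empty in both histograms.
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(p, q))

def interdependence_matrix(histograms):
    """Pairwise Chi-square distances between per-region histograms."""
    n = len(histograms)
    return [
        [chi_square_distance(histograms[i], histograms[j]) for j in range(n)]
        for i in range(n)
    ]
```

The resulting matrix is symmetric with a zero diagonal, so only its upper triangle carries information when it is used as a descriptor.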

Coding of Dynamic 3D Mesh Model for 3D Video Transmission

Recently, 3D video has gained increasing attention in the multimedia field. The representation of 3D video is often based on a dynamic 3D mesh model, which is reconstructed from multi-view video, plus surrounding texture information for rendering, so that arbitrary novel views can be synthesized accordingly. However, the dynamic 3D mesh model is not time-consistent, which makes it difficult to apply traditional mesh compression tools (e.g., MPEG-4 AFX 3DMC) efficiently. In this paper, we modify the 3DMC algorithm for the coding and transmission of 3D video, taking advantage of its high coding efficiency for edge topologies and enhancing it with 3D motion estimation of vertices between two time-successive mesh models. Experimental results show that our method can reach a compression ratio of about 30. Compared to MPEG-4 AFX 3DMC, under comparable reconstruction quality, our algorithm achieves a bit rate saving of about 20% to 45%.

Jui-Chiu Chiang, Chun-Hung Chen, Wen-Nung Lie

Ray Divergence-Based Bundle Adjustment Conditioning for Multi-view Stereo

An algorithm that shows how ray divergence in multi-view stereo scene reconstruction can be used towards improving bundle adjustment weighting and conditioning is presented. Starting with a set of feature tracks, ray divergence when attempting to compute scene structure for each track is first obtained. Assuming accurate feature matching, ray divergence reveals mainly camera parameter estimation inaccuracies. Due to its smooth variation across neighboring feature tracks, from its histogram a set of weights can be computed that can be used in bundle adjustment to improve its convergence properties. It is proven that this novel weighting scheme results in lower reprojection errors and faster processing times than others such as image feature covariances, making it very suitable in general for applications involving multi-view pose and structure estimation.

Mauricio Hess-Flores, Daniel Knoblauch, Mark A. Duchaineau, Kenneth I. Joy, Falko Kuester

Temporally Consistent Disparity and Optical Flow via Efficient Spatio-temporal Filtering

This paper presents a new efficient algorithm for computing temporally consistent disparity maps from video footage. Our method is motivated by recent work [1] that achieves high quality stereo results by smoothing disparity costs with a fast edge-preserving filter. This previous approach was designed to work with single static image pairs and does not maintain temporal coherency of disparity maps when applied to video streams.

The main contribution of our work is to transfer this concept to the spatio-temporal domain in order to efficiently achieve temporally consistent disparity maps, where disparity changes are aligned with spatio-temporal edges of the video sequence. We further show that our method can be used as spatio-temporal regularizer for optical flow estimation. Our approach can be implemented efficiently, achieving real-time results for stereo matching. Quantitative and qualitative results demonstrate that our approach (i) considerably improves over frame-by-frame methods for both stereo and optical flow; and (ii) outperforms the state-of-the-art for local space-time stereo approaches.

Asmaa Hosni, Christoph Rhemann, Michael Bleyer, Margrit Gelautz

Specular-Free Residual Minimization for Photometric Stereo with Unknown Light Sources

We address a photometric stereo problem that has unknown lighting conditions. To estimate the shape, reflection properties, and lighting conditions, we employ a nonlinear minimization that searches for parameters that can synthesize images that best fit the input images. A similar approach has been reported previously, but it suffers from slow convergence due to specular reflection parameters. In this paper, we introduce specular-free residual minimization that avoids the negative effects of specular reflection components by projecting the residual onto the complementary space of the light color. The minimization process simultaneously searches for the optimal light color and other parameters. We demonstrate the effectiveness of the proposed method using several real and synthetic image sets.

Tsuyoshi Migita, Kazuhiro Sogawa, Takeshi Shakunaga

Analysing False Positives and 3D Structure to Create Intelligent Thresholding and Weighting Functions for SIFT Features

This paper outlines image processes for object detection and feature match weighting utilising stereoscopic image pairs, the Scale Invariant Feature Transform (SIFT) [13,4] and 3D reconstruction. The process is called FEWER: Feature Extraction and Weighting for Enhanced Recognition. The object detection technique is based on noise subtraction utilising the false positive matches from random features. The feature weighting process utilises 3D spatial information generated from the stereoscopic pairs and 3D feature clusters. The features are divided into three different types, matched from the target to the scene and weighted based on their 3D data and spatial cluster properties. The weightings are computed by analysing a large number of false positive matches, which gives an estimate of the probability that a feature is matched correctly. The techniques described provide increased accuracy, reduce the occurrence of false positives and can create a reduced set of highly relevant features.

Michael May, Martin Turner, Tim Morris

Verging Axis Stereophotogrammetry

Conventional stereophotogrammetry uses a canonical configuration in which the optical axes of both cameras are parallel. However, if we follow lessons from evolution and swivel the cameras so that their axes intersect in a fixation point, then we obtain considerably better depth resolution. We modified our real-time stereo hardware to handle verging axis configurations and show that the predicted depth resolution is practically obtainable. We compare two techniques for rectifying images for verging configurations. Bouguet’s technique gives a simpler geometry: the iso-disparity lines are straight and the familiar reciprocal relationship between depth and disparity may still be used. However, when the iso-disparity lines are the Vieth-Müller circles, slightly better depth resolution may be obtained in the periphery of the field of view, at the expense of a more complex conversion from disparity to depth.

Khurram Jawed, John Morris
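
The "familiar reciprocal relationship between depth and disparity" mentioned in the abstract above is, for a rectified pair, Z = f·B/d. The sketch below shows that relationship and the depth step caused by a one-pixel change in disparity, which is one simple way to express depth resolution. It covers only the straight iso-disparity case; the Vieth-Müller conversion is not given in the abstract and is not attempted here. The function and parameter names are assumptions.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

def depth_step(focal_px, baseline_m, disparity_px):
    """Depth change when the disparity drops by one pixel from disparity_px."""
    if disparity_px <= 1:
        raise ValueError("disparity must exceed one pixel")
    nearer = depth_from_disparity(focal_px, baseline_m, disparity_px)
    farther = depth_from_disparity(focal_px, baseline_m, disparity_px - 1)
    return farther - nearer
```

Because depth is inversely proportional to disparity, the step between adjacent disparity levels grows roughly with the square of the depth, so distant points are resolved far more coarsely than near ones.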

More on Weak Feature: Self-correlate Histogram Distances

In object detection research, there is an ongoing discussion of weak and strong features. Whether a feature descriptor is considered ’weak’ or ’strong’ does not by itself determine detector performance unless it is combined with a relevant classification algorithm. Since 2001, mainstream object detection research has followed the Viola-Jones approach of a weak feature (Haar-like feature) with an AdaBoost classifier. In 2005, Dalal and Triggs introduced a framework that combines a strong feature (Histogram of Oriented Gradients) with a Support Vector Machine (SVM) for human detection.

This paper proposes an approach to improve the salience of a weak feature descriptor by using intra-feature correlation. The intensity histogram distance feature known as Histogram Distance of Haar Regions (HDHR) is itself considered a weak feature and can only be used to construct weak learners for an AdaBoost classifier. In this paper, we explore the pairwise correlations between all of the constructed histograms, from which a strong feature can be formulated. With the newly constructed strong feature based on histogram distances, an SVM classifier can be trained and later used for classification tasks. Promising experimental results have been obtained.

Sheng Wang, Qiang Wu, Xiangjian He, Wenjing Jia

Mid-level Segmentation and Segment Tracking for Long-Range Stereo Analysis

This paper presents a novel way of combining dense stereo and motion analysis for the purpose of mid-level scene segmentation and object tracking. The input is video data that addresses long-range stereo analysis, as typical when recording traffic scenes from a mobile platform. The task is to identify shapes of traffic-relevant objects without aiming at object classification at the considered stage. We analyse disparity dynamics in recorded scenes for solving this task. Statistical shape models are generated over subsequent frames. Shape correspondences are established by using a similarity measure based on set theory. The motion of detected shapes (frame to frame) is compensated by using a dense motion field as produced by a real-time optical flow algorithm. Experimental results show the quality of the proposed method which is fairly simple to implement.

Simon Hermann, Anko Börner, Reinhard Klette

Applications of Epsilon Radial Networks in Neuroimage Analyses

“Is the brain ‘wiring’ different between groups of populations?” is an increasingly important question with advances in diffusion MRI and the abundance of network analytic tools. Recently, an automatic, data-driven and computationally efficient framework for extracting brain networks using tractography and epsilon neighborhoods was proposed in the diffusion tensor imaging (DTI) literature [1]. In this paper we propose new extensions to that framework and show potential applications of such epsilon radial networks (ERN) in performing various types of neuroimage analyses. These extensions allow us to use ERNs not only to mine for topo-physical properties of the structural brain networks but also to perform classical region-of-interest (ROI) analyses in a very efficient way. Thus we demonstrate the use of ERNs as a novel image processing lens for statistical and machine learning based analyses. We demonstrate its application in an autism study for identifying topological and quantitative group differences, as well as performing classification. Finally, these views are not restricted to ERNs but can be effective for population studies using any computationally efficient network-extraction procedures.

Nagesh Adluru, Moo K. Chung, Nicholas T. Lange, Janet E. Lainhart, Andrew L. Alexander

Road Image Segmentation and Recognition Using Hierarchical Bag-of-Textons Method

While bag-of-words models are a popular and powerful method for generic object recognition, they discard the context information of the spatial layout. This paper presents a novel method for road image segmentation and recognition using a hierarchical bag-of-textons method. The histograms of extracted textons are concatenated over regions of interest defined by multi-scale regular grid windows. This method can automatically learn the spatial layout and relative positions of objects in a road image. Experimental results show that, compared with conventional bag-of-textons methods for object recognition, the proposed hierarchical bag-of-textons method can effectively classify not only texture-based objects, e.g. road, sky, sidewalk, building, but also shape-based objects, e.g. car, lane, in a road image. In the future, the proposed system can be combined with a road scene understanding system for vehicle environment perception.

Yousun Kang, Koichiro Yamaguchi, Takashi Naito, Yoshiki Ninomiya

On the Security of a Hybrid SVD-DCT Watermarking Method Based on LPSNR

Watermarking schemes allow a cover image to be embedded with a watermark, for diverse applications including proof of ownership and covert communication. In this paper, we present attacks on the watermarking scheme proposed by Huang and Guan. This scheme is a hybrid singular value decomposition (SVD) based scheme in the sense that it employs both SVD and other techniques for watermark embedding and extraction. By attacks, we mean that we show how the designers’ security claim related to the proof of ownership application can be invalidated. Our results are the first known attacks on this hybrid SVD-based watermarking scheme.

Huo-Chong Ling, Raphael C. -W. Phan, Swee-Huay Heng

Improved Entropy Coder in H.264/AVC for Lossless Residual Coding in the Spatial Domain

Since a block-based frequency transform is applied to residual data in lossy coding, it can reduce the spatial correlation efficiently. However, since residual data obtained from prediction is directly encoded without transform and quantization in lossless coding, the statistical properties of the residuals differ between lossy and lossless coding. Based on the statistical characteristics of residuals in the spatial domain, we propose an efficient context-based adaptive binary arithmetic coder (CABAC) for lossless residual coding. Experimental results show that the proposed CABAC provides approximately 19% bit saving compared to the conventional CABAC.

Jin Heo, Yo-Sung Ho

Attention Prediction in Egocentric Video Using Motion and Visual Saliency

We propose a method of predicting human egocentric visual attention using bottom-up visual saliency and egomotion information. Computational models of visual saliency are often employed to predict human attention; however, its mechanism and effectiveness have not been fully explored in egocentric vision. The purpose of our framework is to compute attention maps from an egocentric video that can be used to infer a person’s visual attention. In addition to a standard visual saliency model, two kinds of attention maps are computed based on a camera’s rotation velocity and direction of movement. These rotation-based and translation-based attention maps are aggregated with a bottom-up saliency map to enhance the accuracy with which the person’s gaze positions can be predicted. The efficiency of the proposed framework was examined in real environments by using a head-mounted gaze tracker, and we found that the egomotion-based attention maps contributed to accurately predicting human visual attention.

Kentaro Yamada, Yusuke Sugano, Takahiro Okabe, Yoichi Sato, Akihiro Sugimoto, Kazuo Hiraki

FAW for Multi-exposure Fusion Features

This paper introduces a process where fusion features assist matching scale invariant feature transform (SIFT) image features from high contrast scenes. FAW defines the order for extracting features: features, alignment then weighting. The process uses three quality measures to select features from a series of differently exposed images and select a subset of the features in favour of those areas that are defined as well exposed from the different images. The results show an advantage in using these features over features extracted from the common alternative techniques of exposure fusion and tone mapping which extract the features as AWF; alignment, weighting then features. This paper also shows that the process allows for a more robust response when using misaligned or stereoscopic image sets.

Michael May, Martin Turner, Tim Morris

Efficient Stereo Image Rectification Method Using Horizontal Baseline

In this paper, we propose an efficient stereo image rectification method using the horizontal baseline. Since the stereo camera is generally arranged manually, there are geometric errors due to camera misalignment and differences between the internal characteristics of the cameras. Although the conventional calibration-based stereo image rectification method is simple, it may produce results with visual distortion such as image skew. Therefore, the proposed method calculates a baseline for stereo image rectification that is parallel to the horizontal line in the real world. Using this baseline, we estimate the camera parameters and the rectification transform. By applying the transform to the original images, we obtain the rectified stereo images. Experimental results show that the proposed method provides better rectified stereo images without visual distortion.

Yun-Suk Kang, Yo-Sung Ho

Real-Time Image Mosaicing Using Non-rigid Registration

Mosaicing is a classical application of image registration where images from the same scene are stitched together to generate a larger seamless image. This paper presents a real-time incremental mosaicing method that generates 2D mosaics by stitching video key-frames as soon as they are detected. The contributions are three-fold: (1) we propose a “fast” key-frame selection procedure based solely on the distribution of the distance of matched feature descriptors. This procedure automatically selects key-frames that are used to expand the mosaics while achieving real-time performance; (2) we register key-frame images by using a non-rigid deformation model in order to “smoothly” stitch images when scene transformations cannot be expressed by a homography; (3) we add a new constraint on the non-rigid deformation model that penalizes over-deformation in order to create “visually natural” mosaics. The performance of the proposed method was validated by experiments in non-controlled conditions and by comparison with the state-of-the-art method.

Rafael Henrique Castanheira de Souza, Masatoshi Okutomi, Akihiko Torii

Adaptive Guided Image Filtering for Sharpness Enhancement and Noise Reduction

Sharpness enhancement and noise reduction play crucial roles in computer vision and image processing. The problem is to enhance the appearance and reduce the noise of digital images without causing halo artifacts. In this paper, we propose an adaptive guided image filter (AGF) able to perform halo-free edge slope enhancement and noise reduction simultaneously. The proposed method is developed based on guided image filtering (GIF) and the shift-variant technique, part of adaptive bilateral filtering (ABF). Experiments showed that the results produced by our method are superior to those produced by unsharp masking-based techniques and comparable to ABF filtered output. Our proposed AGF outperforms ABF in terms of computational complexity. It is implemented using a fast and exact linear-time algorithm.

Cuong Cao Pham, Synh Viet Uyen Ha, Jae Wook Jeon

Half-Sweep Imaging for Depth from Defocus

Depth from defocus (DFD) is a technique to recover the scene depth from defocusing in images. DFD usually involves two differently focused images (near-focused and far-focused) and calculates the size of the depth blur in the captured images. In recent years, the coded aperture technique, which uses a special pattern for the aperture to engineer the point spread function (PSF), has been used to improve the accuracy of DFD estimation. However, a coded aperture sacrifices incident light and lowers the SNR of captured images, which is needed for accurate estimation. In this paper, we propose a new computational imaging technique called half-sweep imaging. Half-sweep imaging engineers PSFs to improve DFD while maintaining the SNR of captured images. We confirmed the advantage of this technique over conventional DFD and coded aperture in experiments.

Shuhei Matsui, Hajime Nagahara, Rin-ichiro Taniguchi

A Hierarchical Approach to Practical Beverage Package Recognition

In this paper we study the beverage package recognition problem for mobile applications. Unlike products such as books and CDs that are primarily packaged in rigid forms, beverage labels may be attached to various forms including cans and bottles. Therefore, query images captured by users may have a wide range of variations in appearance. Furthermore, similar visual patterns may appear on distinct beverage packages that belong to the same series. To address these challenges, we propose a fast, hierarchical approach that can be used to effectively recognize a beverage package in real-time. A weighting scheme is introduced to enhance the recognition accuracy rate when the query beverage is among flavor varieties in a series. We examine the development of a practical system that can achieve a fairly good recognition performance (93% accuracy rate using an evaluation set of 120 images) in real-time.

Mei-Chen Yeh, Jason Tai

An Equivalent 3D Otsu’s Thresholding Method

Otsu’s thresholding method gives unsatisfactory segmentation results when images contain noise. Two-dimensional (2D) and three-dimensional (3D) Otsu’s methods were therefore proposed. These methods utilize not only the grey levels of pixels but also their spatial information, such as mean and median values. The 3D Otsu’s methods use both kinds of spatial information, while the 2D Otsu’s methods use only one. Consequently, the 3D Otsu’s methods are more resistant to noise but also require more computational time than the 2D ones. We thus propose a method that reduces computational time and still provides satisfactory results. Unlike the 3D Otsu’s methods, our method selects each component of the threshold vector independently instead of searching for the threshold vector as a whole. The experimental results show that our method is more robust against noise, and its computational time is very close to that of the 2D Otsu’s methods.

Puthipong Sthitpattanapongsa, Thitiwan Srinark
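
As a rough illustration of the idea in the abstract above, the sketch below implements the classic one-dimensional Otsu threshold and then applies it separately to three feature images (grey level, local mean, local median). This is only one reading of "selecting each threshold component independently"; the helper names `otsu_threshold` and `threshold_vector` are hypothetical, the three feature images are assumed to be precomputed non-negative integer arrays, and the authors' actual procedure may differ.

```python
import numpy as np

def otsu_threshold(values, levels=256):
    """Classic 1D Otsu: return t maximizing between-class variance (class 0: v <= t)."""
    hist = np.bincount(np.asarray(values).ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                        # probability of class 0 for each t
    mu = np.cumsum(p * np.arange(len(p)))       # cumulative first moment
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros_like(denom)
    valid = denom > 0                           # skip thresholds with an empty class
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))

def threshold_vector(gray, local_mean, local_median, levels=256):
    """Assumed reading of the abstract: one independent 1D search per component."""
    return tuple(otsu_threshold(f, levels) for f in (gray, local_mean, local_median))

if __name__ == "__main__":
    gray = np.array([[10, 12, 200], [11, 198, 201], [9, 199, 202]])
    # For brevity the same array stands in for all three feature images.
    print(threshold_vector(gray, gray, gray))
```

Three independent 1D searches cost on the order of 3L evaluations for L grey levels, whereas a joint search over a threshold vector grows with L cubed, which is consistent with the speed-up the abstract reports.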

Human Motion Tracking with Monocular Video by Introducing a Graph Structure into Gaussian Process Dynamical Models

This paper presents a novel approach to tracking articulated human motion with monocular video. In a conventional tracking system based on particle filters, it is very challenging to track a complex human pose with many degrees of freedom. A typical solution to this problem is to track the pose in a low-dimensional latent space obtained by manifold learning techniques, e.g., the Gaussian process dynamical model (GPDM). In this paper, we extend the GPDM into a graph structure (called GPDM graph) to better express the diverse dynamics of human motion, where multiple latent spaces are constructed and dynamically connected to each other by an unsupervised learning method. The proposed model has both intra-transitions (within each latent space) and inter-transitions (among latent spaces). Moreover, the probability of an inter-transition is dynamic, depending on the current latent state. Using the proposed GPDM graph model, we can track human motion with monocular video, and in our experiments the average tracking errors are lower than those of state-of-the-art methods.

Jianfeng Xu, Koichi Takagi, Shigeyuki Sakazawa

Depth Map Up-Sampling Using Random Walk

For high-quality three-dimensional broadcasting, depth maps are important data. Although commercially available depth cameras capture high-accuracy depth maps in real time, their resolutions are much lower than those of the corresponding color images due to technical limitations. In this paper, we propose a depth map up-sampling method that uses a high-resolution color image and a low-resolution depth map. The proposed method is well suited to matching boundaries between the color image and the depth map. Experimental results show that our method enhances the depth map resolution successfully.

Gyo-Yoon Lee, Yo-Sung Ho

Evaluation of a New Coarse-to-Fine Strategy for Fast Semi-Global Stereo Matching

The paper considers semi-global stereo matching in the context of vision-based driver assistance systems. The need for real-time performance in this field requires a design change of the originally proposed method to run on current hardware. This paper proposes such a new design; the novel strategy first generates a disparity map from half-resolution input images. The result is then used as a prior to restrict the disparity search space for the full-resolution computation. This approach is compared to an SGM strategy as currently employed in a state-of-the-art real-time FPGA solution. Furthermore, trinocular stereo evaluation is performed on ten real-world traffic sequences with a total of 4,000 trinocular frames. An extension to the original evaluation methodology is proposed to resolve ambiguities and to incorporate disparity density in a statistically meaningful way. Evaluation results indicate that the novel SGM method is up to 40% faster than the previous strategy. It returns denser disparity maps and is also more accurate on the evaluated traffic scenes.

Simon Hermann, Reinhard Klette
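
The coarse-to-fine step described in the abstract above, where a half-resolution disparity map restricts the full-resolution search, can be sketched as follows. This is an illustrative assumption about how such a prior might be used, not the authors' implementation: the function name `full_res_search_range`, the nearest-neighbor upsampling, the `margin` around the prior, and the maximum disparity `d_max` are all hypothetical choices, and invalid half-resolution disparities are not handled.

```python
import numpy as np

def full_res_search_range(disp_half, margin=2, d_max=127):
    """Turn a half-resolution disparity map into per-pixel [lo, hi] search bounds."""
    # Upsample by 2 in each direction; disparities double along with the image width.
    prior = 2 * np.repeat(np.repeat(disp_half, 2, axis=0), 2, axis=1)
    lo = np.clip(prior - margin, 0, d_max)
    hi = np.clip(prior + margin, 0, d_max)
    return lo, hi

if __name__ == "__main__":
    disp_half = np.array([[3, 10], [0, 60]])
    lo, hi = full_res_search_range(disp_half)
    # Fraction of the full disparity range that still has to be searched per pixel.
    fraction = (hi - lo + 1).mean() / 128.0
    print(lo, hi, fraction, sep="\n")
```

A matching cost would then be evaluated only for disparities between `lo` and `hi` at each full-resolution pixel, which is where a run-time saving of the kind reported in the abstract would come from.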

Theoretical Analysis of Multi-view Camera Arrangement and Light-Field Super-Resolution

We analyzed a light-field super-resolution problem in which, with a given set of multi-view images with a low resolution, the 3-D scene is reconstructed with a higher resolution using super-resolution (SR) reconstruction. The arrangement of the multi-view cameras is important because it determines the quality of the reconstruction. To simplify the analysis, we considered a situation in which a plane is located at a certain depth and a texture on that plane is super-resolved. We formulated the SR reconstruction process in the frequency domain, where the camera arrangement can be independently expressed as a matrix in the image formation model. We then evaluated the condition number of the matrix to quantify the quality of the SR reconstruction. We clarified that when the cameras are arranged in a regular grid, there exist singular depths in which the SR reconstruction becomes ill-posed. We also determined that this singularity can be avoided if the arrangement is randomly perturbed.

Ryo Nakashima, Keita Takahashi, Takeshi Naemura
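
The role of the condition number in the abstract above can be illustrated with a toy one-dimensional model. This is a textbook simplification assumed here for illustration and is not the formulation used by the authors: each camera sees the planar texture shifted by (camera position × disparity) low-resolution pixels, and recovering `factor` aliased frequency components requires inverting a matrix of phase terms. The names `sr_condition_number`, `cam_positions`, `disparity`, and `factor` are hypothetical.

```python
import numpy as np

def sr_condition_number(cam_positions, disparity, factor):
    """Condition number of the phase matrix in a toy 1D multi-view SR model."""
    # Sub-pixel shift of each view, in low-resolution pixels, for a plane
    # whose depth produces `disparity` pixels per unit of camera baseline.
    shifts = np.asarray(cam_positions, dtype=float) * disparity
    k = np.arange(factor)                         # indices of aliased components
    phase = np.exp(-2j * np.pi * np.outer(shifts, k))
    return np.linalg.cond(phase)

if __name__ == "__main__":
    regular = [0.0, 1.0, 2.0, 3.0]
    perturbed = [-0.2, 1.1, 1.95, 3.25]           # fixed offsets standing in for random ones
    print(sr_condition_number(regular, 1.0, 4))   # integer shifts: nearly singular
    print(sr_condition_number(regular, 0.25, 4))  # evenly spread shifts: well conditioned
    print(sr_condition_number(perturbed, 1.0, 4)) # perturbed grid at the same depth
```

In this toy model, a regular grid at a depth where the per-camera shift is a whole number of pixels makes every view identical, so the matrix is close to singular, while perturbing the camera positions makes the shifts distinct again. This mirrors, in simplified form, the singular depths and their avoidance that the abstract describes.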

Table of Contents

Frontmatter