
2013 | Book

Computer Vision – ACCV 2012

11th Asian Conference on Computer Vision, Daejeon, Korea, November 5-9, 2012, Revised Selected Papers, Part IV

Edited by: Kyoung Mu Lee, Yasuyuki Matsushita, James M. Rehg, Zhanyi Hu

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


About this Book

The four-volume set LNCS 7724--7727 constitutes the thoroughly refereed post-conference proceedings of the 11th Asian Conference on Computer Vision, ACCV 2012, held in Daejeon, Korea, in November 2012. The total of 226 contributions presented in these volumes was carefully reviewed and selected from 869 submissions. The papers are organized in topical sections on object detection, learning and matching; object recognition; feature, representation, and recognition; segmentation, grouping, and classification; image representation; image and video retrieval and medical image analysis; face and gesture analysis and recognition; optical flow and tracking; motion, tracking, and computational photography; video analysis and action recognition; shape reconstruction and optimization; shape from X and photometry; applications of computer vision; low-level vision and applications of computer vision.

Table of Contents

Frontmatter

Oral Session 8: Shape Reconstruction and Optimization

Self-calibration and Motion Recovery from Silhouettes with Two Mirrors

This paper addresses the problem of self-calibration and motion recovery from a single snapshot obtained under a setting of two mirrors. The mirrors are able to show five views of an object in one image. In this paper, the epipoles of the real and virtual cameras are first estimated from the intersections of the bitangent lines between corresponding images, from which the horizon of the camera plane can easily be derived. The imaged circular points and the angle between the mirrors can then be obtained from the equal angles between the bitangent lines, by planar rectification. The silhouettes produced by reflections can be treated as a special circular motion sequence. With this observation, techniques developed for calibrating circular motion sequences can be exploited to simplify the calibration of a single-view two-mirror system. Different from state-of-the-art approaches, only one snapshot is required in this work for self-calibrating a natural camera and recovering the poses of the two mirrors. This is more flexible than previous approaches, which require at least two images. When more than a single image is available, each image can be calibrated independently, and the problem of varying focal length does not complicate the calibration. After the calibration, the visual hull of the objects can be obtained from the silhouettes. Experimental results show the feasibility and precision of the proposed approach.

Hui Zhang, Ling Shao, Kwan-Yee Kenneth Wong
Stereo Reconstruction and Contrast Restoration in Daytime Fog

Stereo reconstruction serves many outdoor applications and thus sometimes faces foggy weather. The quality of the reconstruction by state-of-the-art algorithms is then degraded, as contrast is reduced with distance because of scattering. However, as shown by single-image defogging algorithms, fog provides an extra depth cue in the gray level of faraway objects. Our idea is thus to take advantage of both stereo and atmospheric-veil depth cues to achieve better stereo reconstructions in foggy weather. To our knowledge, this subject has not been investigated before by the computer vision community. We thus propose a Markov Random Field model of the joint stereo reconstruction and defogging problem which can be optimized iteratively using the α-expansion algorithm. Outputs are a dense disparity map and an image where contrast is restored. The proposed model is evaluated on synthetic images. This evaluation shows that the proposed method achieves very good results on both stereo reconstruction and defogging compared to standard stereo reconstruction and single-image defogging.
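
For background on the atmospheric-veil depth cue mentioned above: single-image defogging rests on Koschmieder's attenuation law, which ties the observed intensity of a pixel directly to scene depth. The sketch below inverts that law under the assumption that radiance, airlight, and extinction are known; it illustrates the depth cue only and is not the paper's joint MRF model.

```python
import numpy as np

def depth_from_veil(I, J, A, beta):
    """Invert Koschmieder's law, I = J*exp(-beta*d) + A*(1 - exp(-beta*d)),
    to read depth d off the atmospheric veil. Scene radiance J, airlight A
    and extinction coefficient beta are assumed known here, which the
    paper's joint stereo/defogging formulation does not require."""
    t = (I - A) / (J - A)                    # transmission exp(-beta * d)
    return -np.log(np.clip(t, 1e-6, 1.0)) / beta
```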

Laurent Caraffa, Jean-Philippe Tarel
Large-Scale Bundle Adjustment by Parameter Vector Partition

We propose an efficient parallel bundle adjustment (BA) algorithm to refine the 3D reconstruction of large-scale structure from motion (SfM) problems that use image collections from the Internet. Different from the latest BA techniques, which improve efficiency by optimizing the reprojection error function with Conjugate Gradient (CG) methods, we employ a parameter vector partition strategy. More specifically, we partition the whole BA parameter vector into a set of individual sub-vectors via normalized cut (Ncut). Correspondingly, the solution of the BA problem can be obtained by minimizing subproblems on these sub-vector spaces. Our approach is approximately parallel, and there is no need to solve the large-scale linear equation of the BA problem. Experiments carried out on a low-end computer with 4 GB RAM demonstrate the efficiency and accuracy of the proposed algorithm.

Shanmin Pang, Jianru Xue, Le Wang, Nanning Zheng
Learning Feature Subspaces for Appearance-Based Bundle Adjustment

We present an improved bundle adjustment method based on online learned appearance subspaces of 3D points. Our method incorporates the additional information from the learned appearance models into bundle adjustment. Through online learning of the appearance models, we are able to include more plausible observations of 2D features across diverse viewpoints. Bundle adjustment can benefit from such an increase in the number of observations. Our formulation uses the appearance information to impose additional constraints on the optimization. Detailed experiments with ground-truth data show that the proposed method is able to enhance the reliability of 2D correspondences and, more importantly, can improve the accuracy of camera motion estimation and the overall quality of 3D reconstruction.

Chia-Ming Cheng, Hwann-Tzong Chen

Poster Session 8: Shape from X and Photometry

Toward Efficient Acquisition of BRDFs with Fewer Samples

In this paper we propose a novel method for measuring the reflectance of isotropic materials efficiently by carefully choosing a set of sampling directions which yields less modeling error. The analysis is based on the empirical observation that most isotropic BRDFs can be approximated using a 2D bivariate representation. Further, a compact basis representation is computed for a large database of densely measured materials. Using this basis and an iterative optimization process, an appropriate set of sampling directions necessary for acquiring the reflectance of new materials is selected. Finally, the data measured using the selected sampling directions is projected onto the compact basis to obtain weighting factors that linearly represent the new material as a combination of several previously measured materials. This compact representation, with an appropriate BRDF parameterization, allows us to significantly reduce the time and effort required for making new reflectance measurements of any isotropic material. Experimental results obtained using a few sampling directions on the MERL dataset show performance comparable to an exhaustively captured set of BRDFs.
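
The projection step described above amounts to an ordinary least-squares fit of the sparse measurements against the basis evaluated at the same sampling directions. A minimal sketch, with all names (B_s, y_s, B_full) ours rather than the paper's:

```python
import numpy as np

def fit_brdf_weights(B_s, y_s):
    """Solve for weights w with B_s @ w ~= y_s, where each column of B_s
    is one basis BRDF evaluated at the chosen sampling directions and
    y_s holds the new material's measurements at those directions."""
    w, *_ = np.linalg.lstsq(B_s, y_s, rcond=None)
    return w

# The dense reflectance is then reconstructed as B_full @ w, with B_full
# the same basis evaluated over the full bivariate domain (assumed names).
```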

Muhammad Asad Ali, Imari Sato, Takahiro Okabe, Yoichi Sato
Shadow-Free TILT for Facade Rectification

In this paper, we propose a shadow-free TILT method to rectify facade images corrupted by shadows. The proposed method is derived from the original TILT and improves it by introducing a multiplicative shadow factor. That is, in our method, the constraint is that the rectified image equals the low-rank image multiplied by the shadow image, up to additive noise. Moreover, the objective function is improved by incorporating a smooth shadow model. Experimental results on both synthetic and real images demonstrate that our method provides more accurate and stable rectification results compared with the original TILT, especially when shadows in the input images are strong.

Lumei Li, Hongping Yan, Lingfeng Wang, Chunhong Pan
Reconstructing Shape from Dictionaries of Shading Primitives

Although a lot of research has been performed in the field of reconstructing 3D shape from the shading in an image, only a small portion of this work has examined the association of local shading patterns over image patches with the underlying 3D geometry. Such approaches are a promising way to tackle the ambiguities inherent in the shape-from-shading (SfS) problem, but issues such as their sensitivity to non-Lambertian reflectance or photometric calibration have reduced their real-world applicability. In this paper we show how the information in local shading patterns can be utilized in a practical approach applicable to real-world images, obtaining results that improve the state of the art in the SfS problem. Our approach is based on learning a set of geometric primitives, and the distribution of local shading patterns that each such primitive may produce under different reflectance parameters. The resulting dictionary of primitives is used to produce a set of hypotheses about 3D shape; these hypotheses are combined in a Markov Random Field (MRF) model to determine the final 3D shape.

Alexandros Panagopoulos, Sunil Hadap, Dimitris Samaras
Iterative Feedback Estimation of Depth and Radiance from Defocused Images

This paper presents a novel iterative feedback framework for the simultaneous estimation of a depth map and an All-In-Focus (AIF) image, where each estimate benefits the other at every stage until final convergence. For the recovery of the AIF image, a sparse prior on natural images is incorporated to ensure high-quality defocus removal even under inaccurate depth estimation. In the depth estimation step, we feed back the constraints from the high-quality AIF image and adopt a numerical solution which is robust to inaccuracy in the AIF recovery, further raising the performance of the Depth-from-Defocus (DFD) algorithm. Compared with traditional DFD methods, another advantage offered by this iterative framework is that by introducing the AIF image, which follows the prior knowledge of natural images to regularize the depth map estimation, DFD becomes much more robust to camera parameter changes. In addition, the proposed approach is a general framework that can incorporate various depth estimation and AIF image recovery algorithms. Experimental results on both synthetic and real images demonstrate the effectiveness of the proposed method, especially on challenging data sets containing large textureless regions and a large range of camera parameters.

Xing Lin, Jinli Suo, Xun Cao, Qionghai Dai
Two-Image Perspective Photometric Stereo Using Shape-from-Shading

Shape-from-Shading and photometric stereo are two fundamental problems in Computer Vision aimed at reconstructing surface depth given either a single image taken under a known light source or multiple images taken under different illuminations, respectively. Whereas the former utilizes partial differential equation (PDE) techniques to solve the image irradiance equation, the latter can be expressed as a linear system of equations in the surface derivatives when 3 or more images are given. It therefore seems that current photometric stereo techniques do not extract all possible depth information from each image by itself. This paper utilizes PDE techniques for the solution of the combined Shape-from-Shading and photometric stereo problem when only 2 images are available. Extending our previous results on this problem, we consider the more realistic perspective projection of surfaces during the photographic process. Under these assumptions, there is a unique weak (Lipschitz continuous) solution to the problem at hand, resolving the well-known convex/concave ambiguity of the Shape-from-Shading problem. We propose two approximation schemes for the numerical solution of this problem, an up-wind finite difference scheme and a Semi-Lagrangian scheme, and analyze their properties. We show that both schemes converge linearly and accurately reconstruct the original surfaces. In comparison with a similar method for orthographic two-image photometric stereo, the proposed perspective method outperforms the orthographic one. We also demonstrate the method on real-life images. Our results thus show that using methodologies common in the field of Shape-from-Shading it is possible to recover more depth information for the photometric stereo problem under the more realistic perspective projection assumption.

Roberto Mecca, Ariel Tankus, Alfred Marcel Bruckstein
Stable Two View Reconstruction Using the Six-Point Algorithm

We propose a practical scheme for selecting a pair of images that can be a good initial seed for incremental SfM, to accomplish a feasible reconstruction from input images with no external camera information such as EXIF. The key idea is the effective use of the 6-point algorithm by detecting pairs of images that are infeasible due to degenerate configurations as well as other conditions. We thoroughly analyze all the degenerate configurations of the 6-point algorithm and derive algorithms for detecting image pairs that fall into those degenerate configurations. Further, we implement an efficient pipeline for selecting the initial pair, which can be easily plugged into standard incremental SfM systems. Our experimental results on synthetic and real data show that our algorithms successfully detect and reject pairs of images that are infeasible for 3D reconstruction. Further, we demonstrate 3D reconstruction by plugging our infeasible-pair detection algorithm into a standard SfM pipeline.

Kazuki Nozawa, Akihiko Torii, Masatoshi Okutomi
Unknown Radial Distortion Centers in Multiple View Geometry Problems

The radial undistortion model proposed by Fitzgibbon and the radial fundamental matrix were early steps to extend classical epipolar geometry to distorted cameras. Later, minimal solvers were proposed to find relative pose and radial distortion, given point correspondences between images. However, a big drawback of all these approaches is that they require the distortion center to be exactly known. In this paper we show how the distortion center can be absorbed into a new radial fundamental matrix. This new formulation is much more practical in reality, as it also allows for digital zoom, cropped images, and camera-lens systems where the distortion center does not exactly coincide with the image center. In particular, we start from the setting where only one of the two images contains radial distortion, analyze the structure of the particular radial fundamental matrix, and show that the technique also generalizes to other linear multi-view relationships such as the trifocal tensor and the homography. For the new radial fundamental matrix we propose different estimation algorithms from 9, 10, and 11 points. We show how to extract the epipoles and prove the practical applicability on several epipolar geometry image pairs with strong distortion that, to the best of our knowledge, no other existing algorithm can handle properly.
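
For context, Fitzgibbon's one-parameter division model, which this line of work builds on, maps a distorted point back to an undistorted one; the sketch below adds the explicit distortion center that the paper stops assuming is known. Function and variable names are ours.

```python
import numpy as np

def undistort_division(x, c, lam):
    """Division model with an explicit distortion center c:
    x_u = c + (x - c) / (1 + lam * r^2), with r = ||x - c||.
    x is an (N, 2) array of distorted pixel positions."""
    d = x - c
    r2 = np.sum(d * d, axis=-1, keepdims=True)
    return c + d / (1.0 + lam * r2)
```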

José Henrique Brito, Roland Angst, Kevin Köser, Christopher Zach, Pedro Branco, Manuel João Ferreira, Marc Pollefeys
Depth-Estimation-Free Condition for Projective Factorization and Its Application to 3D Reconstruction

This paper concerns depth-estimation-free conditions for projective factorization. Using an algebraic approach, we first show that the estimation of the projective depths is avoidable if and only if the origins of all camera coordinate systems lie on a single plane and the optical axes of the coordinate systems point in the same direction, perpendicular to that plane. Next, we generalize the result to the case where the points may be restricted to a plane or a line. The result clearly reveals the trade-off between the freedom of camera motion and that of point location. We also give a least-squares-based method for Euclidean reconstruction from the result of the projective reconstruction. The proposed method is evaluated through simulation from the viewpoint of computational time.

Yohei Murakami, Takeshi Endo, Yoshimichi Ito, Noboru Babaguchi
Epipolar Geometry Estimation for Urban Scenes with Repetitive Structures

Algorithms for the estimation of epipolar geometry from a pair of images have been very successful in recent years, being able to deal with wide-baseline images. The algorithms succeed even when the percentage of correct matches in the initial set of matches is very low. In this paper the problem of scenes with repeated structures is addressed, concentrating on the common case of building facades. In these cases a large number of repeated features is found and cannot be matched initially, causing state-of-the-art algorithms to fail. Our algorithm therefore clusters similar features in each of the two images and matches clusters of features. From these cluster pairs, a set of hypothesized homographies of the building facade is generated and ranked mainly according to the support of matches of non-repeating features. Then, in a separate step, the epipole is recovered, yielding the fundamental matrix. The algorithm then decides whether the fundamental matrix has been recovered reliably enough and, if not, returns only the homography. The algorithm has been tested successfully on a large number of pairs of images of buildings from the benchmark ZuBuD database, for which several state-of-the-art algorithms nearly always fail.

Maria Kushnir, Ilan Shimshoni
Non-rigid Self-calibration of a Projective Camera

Rigid structure-from-motion (SfM) usually consists of two steps: First, a projective reconstruction is computed, which is then upgraded to Euclidean structure and motion in a subsequent step. Reliable algorithms exist for both problems. In the case of non-rigid SfM, on the other hand, especially the Euclidean upgrading has turned out to be difficult. A few algorithms have been proposed for upgrading an affine reconstruction, and are able to obtain successful 3D reconstructions. For upgrading a non-rigid projective reconstruction, however, either simple sequences are used, or no 3D reconstructions are shown at all.

In this article, an algorithm is proposed for estimating the self-calibration of a projectively reconstructed non-rigid scene. In contrast to other algorithms, neither prior knowledge of the non-rigid deformations is required, nor a subsequent step to align different motion bases. An evaluation with synthetic data reveals that the proposed algorithm is robust to noise and able to accurately estimate the 3D reconstruction and the intrinsic calibration. Finally, reconstructions from a challenging real image sequence with strong non-rigid deformation are presented.

Hanno Ackermann, Bodo Rosenhahn
Piecewise Planar Scene Reconstruction and Optimization for Multi-view Stereo

This paper presents a multi-view stereo algorithm for piecewise planar scene reconstruction and optimization. Our segmentation-based reconstruction algorithm iteratively minimizes a defined energy function and consists of reconstruction, refinement, and optimization steps. The first step is a plane initialization that allows each segment to have a set of initial plane candidates. Then a plane refinement based on non-linear optimization improves the accuracy of the segment planes. Finally, a plane optimization with a segment-adjacency graph leads to optimal segment planes, each of which is chosen among possible plane candidates by evaluating its relationship with adjacent planes in 3D. This algorithm yields better accuracy and performance compared to the previous algorithms discussed in this paper. The results show our method is suitable for outdoor or aerial urban scene reconstruction, especially with wide baselines and images containing textureless regions.

Hyojin Kim, Hong Xiao, Nelson Max
A Bayesian Approach to Uncertainty-Based Depth Map Super Resolution

The objective of this paper is to increase both the spatial resolution and the depth precision of a depth map. Our work aims to produce a super-resolution depth map with quality as well as precision. This paper is motivated by the fact that errors in depth measurements from the sensor are inherent. By incorporating prior geometry of the scene, we propose a Bayesian approach to uncertainty-based depth map super resolution. In particular, the uncertainty of depth measurements is modeled in terms of kernel estimation and is used to formulate the likelihood. In this paper, we incorporate a Gaussian kernel in the depth direction as well as an anisotropic spatial-color kernel. We further utilize geometric assumptions about the scene, namely the piecewise planar assumption, to model the prior. Experiments on different datasets demonstrate the effectiveness and precision of our algorithm compared with the state of the art.
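
A hedged reading of the likelihood described above: each measurement contributes a weight that is Gaussian in depth and bilateral-style in space and color. The sigma values and the exact multiplicative combination are assumptions of this sketch, not taken from the paper.

```python
import numpy as np

def measurement_weight(d_p, d_q, p, q, c_p, c_q,
                       sigma_d=0.05, sigma_s=3.0, sigma_c=10.0):
    """Gaussian kernel on depth times an anisotropic spatial-color kernel:
    one plausible instantiation of the abstract's kernel estimation."""
    w_depth = np.exp(-(d_p - d_q) ** 2 / (2 * sigma_d ** 2))
    w_space = np.exp(-np.sum((p - q) ** 2) / (2 * sigma_s ** 2))
    w_color = np.exp(-np.sum((c_p - c_q) ** 2) / (2 * sigma_c ** 2))
    return w_depth * w_space * w_color
```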

Jing Li, Gang Zeng, Rui Gan, Hongbin Zha, Long Wang
Cross Image Inference Scheme for Stereo Matching

In this paper, we propose a new interconnected Markov Random Field (MRF), or iMRF, model for the stereo matching problem. Compared with the standard MRF, our model takes into account the consistency between the label of a pixel in one image and the labels of its possible matching points in the other image. Inspired by the turbo decoding scheme, we formulate this consistency as a cross image reference term which is iteratively updated in our matching framework. The proposed iMRF model represents the matching problem better than the standard MRF and gives better results even without using any other information from segmentation priors or occlusion detection. We incorporate segmentation information and a coarse-to-fine scheme into our model to further improve the matching performance.

Xiao Tan, Changming Sun, Xavier Sirault, Robert Furbank, Tuan D. Pham
Bayesian Epipolar Geometry Estimation from Tomographic Projections

In this paper, we first show that the affine epipolar geometry can be estimated by identifying the common 1D projection from a pair of tomographic parallel projection images and the 1D affine transform between the common 1D projections. To our knowledge, the link between the common 1D projections and the affine epipolar geometry has been unknown previously; and in contrast to the traditional methods of estimating the epipolar geometry, no point correspondences are required. Using these properties, we then propose a Bayesian method for estimating the affine epipolar geometry, where we apply a Gaussian model for the noise and non-informative priors for the nuisance parameters. We derive an analytic form for the marginal posterior distribution, where the nuisance parameters are integrated out. The marginal posterior is sampled by a hybrid Gibbs–Metropolis–Hastings sampler, and the conditional mean and the covariance over the posterior are evaluated on the homogeneous manifold of affine fundamental matrices. We obtained promising results with the synthetic 3D Shepp–Logan phantom as well as with real cryo-electron microscope projections.

Sami S. Brandt, Katrine Hommelhoff Jensen, François Lauze
On the Global Self-calibration of Central Cameras Using Two Infinitesimal Rotations

The calibration of a generic central camera can be described non-parametrically by a map assigning to each image pixel a 3D projection ray. We address the determination of this map and of the motion of a camera that performs two infinitesimal rotations about linearly independent axes. A complex closed-form solution exists, which in practice allows one to visually identify the geometry of a range of sensors, but it only works near the center of the image domain, and not accurately.

We present a new two-step method to solve the stated self-calibration problem that overcomes these drawbacks. First, the Gram matrix of the camera rotation velocities is estimated jointly with the Lie bracket of the two rotational flows computed from the data images. Second, the knowledge that this Lie bracket is also a rotational flow is exploited to provide a solution for the calibration map which is defined on the whole image domain. Both steps are essentially linear, and robust to the noise inherent in the computation of optical flow from images.

The accuracy of the proposed method is quantitatively demonstrated for different noise levels, rotation pairs, and imaging geometries. Several applications are exemplified, and possible extensions and improvements are also considered.

Ferran Espuny
Adaptive Structure from Motion with a Contrario Model Estimation

Structure from Motion (SfM) algorithms take as input multi-view stereo images (along with internal calibration information) and yield a 3D point cloud and camera orientations/poses in a common 3D coordinate system. In the case of an incremental SfM pipeline, the process requires repeated model estimations based on detected feature points: homography, fundamental and essential matrices, as well as camera poses. These estimations have a crucial impact on the quality of 3D reconstruction. We propose to improve these estimations using the a contrario methodology. While SfM pipelines usually have globally-fixed thresholds for model estimation, the a contrario principle adapts thresholds to the input data, for each model estimation. Our experiments show that adaptive thresholds reach significantly better precision. Additionally, the user is freed from having to guess thresholds or to optimistically rely on default values. There are also cases where a globally-fixed threshold policy, whatever the threshold value, cannot provide the best accuracy, contrary to an adaptive threshold policy.
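
At the core of the a contrario methodology is the Number of False Alarms (NFA): a model is accepted only when its observed inlier support would be very unlikely under a background noise model, which is what makes the acceptance threshold data-adaptive. A generic sketch of the standard NFA criterion, not the paper's exact instantiation:

```python
from math import comb

def nfa(n, k, p, n_tests):
    """NFA = n_tests * P[Bin(n, p) >= k]: the expected number of times k of
    n correspondences would agree with a model purely by chance, given a
    per-point background probability p. Accept the model when NFA <= 1."""
    tail = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
    return n_tests * tail
```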

Pierre Moulon, Pascal Monasse, Renaud Marlet
Precise 3D Reconstruction from a Single Image

3D object reconstruction from single images has extensive applications in multimedia. Most existing related methods only recover rough 3D objects, and the objects are often required to be interconnected. In this paper, we propose a novel method which uses a set of auxiliary reference grids to precisely reconstruct 3D objects from a single uncalibrated image. In our system, the user first draws the line drawings of the objects. Then the initial focal length f of the camera is computed with a calibration method, and this initial focal length is refined by a reference grid. With the refined f, a 3D position measurement environment is constructed, and a world coordinate system is defined by the user. After that, a set of reference grids is used to find the precise 3D locations of the object points, and the wireframes of the objects are recovered automatically. Finally, the system generates the surfaces and renders the complete 3D objects. Besides precise 3D modeling, our reconstruction method does not require that the objects in a scene be interconnected. A set of examples demonstrates the ability to handle complex polyhedral objects and curved surfaces within one framework.

Changqing Zou, Jianbo Liu, Jianzhuang Liu
An Efficient Image Matching Method for Multi-View Stereo

Most existing Multi-View Stereo (MVS) algorithms employ an image matching method using Normalized Cross-Correlation (NCC) to estimate the depth of an object. The accuracy of the estimated depth depends on the step size of the depth in NCC-based window matching. The step size of the depth must be small for accurate 3D reconstruction, while a small step significantly increases computational cost. To improve the accuracy of depth estimation and reduce the computational cost, this paper proposes an efficient image matching method for MVS. The proposed method is based on Phase-Only Correlation (POC), which is a high-accuracy image matching technique using the phase components of Fourier transforms. The advantages of using POC are that (i) the correlation function is obtained by only one window matching, and (ii) the accurate sub-pixel displacement between two matching windows can be estimated by fitting the analytical correlation peak model of the POC function. Thus, using POC-based window matching for MVS makes it possible to estimate depth accurately from the correlation function obtained by only one window matching. Through a set of experiments using public MVS datasets, we demonstrate that the proposed method performs better in terms of accuracy and computational cost than the conventional method.
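
POC itself is a textbook construction: normalize the cross-power spectrum of the two windows to unit magnitude so only phase survives, then invert the transform and locate the correlation peak. A minimal sketch (the paper's analytical peak-model fitting for sub-pixel accuracy is not reproduced):

```python
import numpy as np

def phase_only_correlation(f, g):
    """Return the POC surface of two equal-sized image windows and the
    integer-pixel displacement at its peak."""
    R = np.fft.fft2(f) * np.conj(np.fft.fft2(g))
    R /= np.maximum(np.abs(R), 1e-12)     # keep phase components only
    poc = np.fft.ifft2(R).real
    peak = np.unravel_index(np.argmax(poc), poc.shape)
    return poc, peak
```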

Shuji Sakai, Koichi Ito, Takafumi Aoki, Tomohito Masuda, Hiroki Unten
Self-calibration of a PTZ Camera Using New LMI Constraints

In this paper, we propose a very reliable and flexible method for self-calibrating rotating and zooming cameras - generally referred to as PTZ (Pan-Tilt-Zoom) cameras. The proposed method employs a Linear Matrix Inequality (LMI) resolution approach and allows extra tunable constraints on the intrinsic parameters to be taken into account during the process of estimating these parameters. Furthermore, the considered constraints are simultaneously enforced in all views rather than in a single reference view. The results of our experiments show that the proposed approach allows for significant improvement in terms of accuracy and robustness when compared against state-of-the-art methods.

François Rameau, Adlane Habed, Cédric Demonceaux, Désiré Sidibé, David Fofi
Fast 3D Surface Reconstruction from Point Clouds Using Graph-Based Fronts Propagation

This paper proposes a surface reconstruction approach based on front propagation over weighted graphs of arbitrary structure. The problem of surface reconstruction from a set of points has been extensively studied in the literature. The novelty of this approach resides in solving the eikonal equation using Partial difference Equations (PdEs) on weighted graphs. This yields a fast algorithm, which is the main contribution of this study. Several examples are presented to illustrate the approach.
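
To make the front-propagation idea concrete, here is a generic Dijkstra-style solver for a discrete eikonal equation on a weighted graph with unit potential; it is a simple stand-in for, not a reproduction of, the paper's PdE formulation. The `adj` and `seeds` structures are assumed.

```python
import heapq

def graph_front_propagation(adj, seeds):
    """Propagate a front from seed nodes over a weighted graph.
    adj maps node -> list of (neighbor, edge_weight); returns the
    arrival time (shortest weighted distance) of the front at each node."""
    dist = {s: 0.0 for s in seeds}
    heap = [(0.0, s) for s in seeds]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```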

Abdallah El Chakik, Xavier Desquesnes, Abderrahim Elmoataz

Oral Session 9: Applications of Computer Vision

Apparel Classification with Style

We introduce a complete pipeline for recognizing and classifying people's clothing in natural scenes. This has several interesting applications, including e-commerce, event and activity recognition, online advertising, etc. The stages of the pipeline combine a number of state-of-the-art building blocks such as upper body detectors, various feature channels and visual attributes. The core of our method consists of a multi-class learner based on a Random Forest that uses strong discriminative learners as decision nodes. To make the pipeline as automatic as possible, we also integrate automatically crawled training data from the web into the learning process. Typically, multi-class learning benefits from more labeled data. Because the crawled data may be noisy and contain images unrelated to our task, we extend Random Forests to be capable of transfer learning from different domains. For evaluation, we define 15 clothing classes and introduce a benchmark data set for the clothing classification task consisting of over 80,000 images, which we make publicly available. We report experimental results, where our classifier outperforms an SVM baseline with 41.38% vs 35.07% average accuracy on challenging benchmark data.

Lukas Bossard, Matthias Dantone, Christian Leistner, Christian Wengert, Till Quack, Luc Van Gool
Deblurring Vein Images and Removing Skin Wrinkle Patterns by Using Tri-band Illumination

We present a new method for enhancing images of blood vessels in skin tissues by using tri-band illumination. Transmitted-light vein images captured by a camera contain an image blur due to light scattering in skin. The blur can be described by a point spread function (PSF) that is a function of the thickness of the skin layers in front of a vein and the extinction coefficients of the skin tissues. The PSFs cannot be directly observed because the depth of a vein in skin tissue is unknown and the thickness of the skin tissues varies across different parts of the human body. Moreover, skin wrinkle patterns are observed as dark lines and need to be eliminated for clear vein imaging. We propose a method for removing image blur and skin wrinkle patterns from transmitted-light images of veins by using tri-band illumination. First, wrinkle patterns are separated from vein patterns by using the difference between the light absorbances of blood at two wavelengths. Subsequently, image blurs caused by light scattering in skin layers are removed by using a PSF estimated from two vein images. The key observations in this work are that at one of the three wavelengths used to obtain the vein images the extinction coefficient of skin tissues must be twice as large as that at another of the wavelengths, and that at the third wavelength the extinction coefficient of blood must be smaller than it is at either of the other two wavelengths. This allows us to estimate true vein patterns without knowing the depth of a vein and to eliminate skin wrinkle patterns from a vein image. Our experiments show that our method can separate skin wrinkle patterns from vein patterns and that it reduces blur and improves the contrast of vein images better than a conventional method does. The results indicate that our method will contribute to the development of highly accurate personal authentication technology based on vein patterns.

Naoto Miura, Yoichi Sato
Reconstruction of 3D Surface and Restoration of Flat Document Image from Monocular Image Sequence

There is a strong demand for the digitization of books. To meet this demand, camera-based scanning systems are considered to be effective because they could work with the cameras built into mobile terminals. One promising technique proposed to speed up book digitization involves scanning a book while the user flips the pages. In this type of camera-based document image analysis, it is extremely important to rectify distorted images. In this paper, we propose a new method of reconstructing the 3D deformation and restoring a flat document image by utilizing a unique planar development property of a sheet of paper from a monocular image sequence captured while the paper is deformed. Our approach uses multiple input images and is based on the natural condition that a sheet of paper is a developable surface, enabling high-quality restoration without relying on the document structure. In the experiments, we tested the proposed method for the target application using images of different documents and different deformations, and demonstrated its effectiveness.

Hiroki Shibayama, Yoshihiro Watanabe, Masatoshi Ishikawa
Utilizing Optical Aberrations for Extended-Depth-of-Field Panoramas

Optical aberrations in off-the-shelf photographic lenses are commonly treated as unwanted artifacts that degrade image quality. In this paper we argue that such aberrations can be useful, as they often produce point-spread functions (PSFs) that have greater frequency-preserving abilities in the presence of defocus compared to an ideal thin lens. Specifically, aberrated and defocused PSFs often contain sharp, edge-like structures that vary with depth and image position, and become increasingly anisotropic away from the image center. In such cases, defocus blur varies spatially and preserves high spatial frequencies in some directions but not others. Here we take advantage of this fact to create extended-depth-of-field panoramas from overlapping photos taken with off-the-shelf lenses and a wide aperture. We achieve this by first measuring the lens PSF through a one-time calibration and then using multi-image deconvolution to restore anisotropic blur in areas of image overlap. Our results suggest that common lenses may preserve frequencies well enough to allow extended-depth-of-field panoramic photography with large apertures, resulting in potentially much shorter exposures.

Huixuan Tang, Kiriakos N. Kutulakos

Poster Session 9: Low-level Vision and Applications of Computer Vision

Motion-Invariant Coding Using a Programmable Aperture Camera

A moving object or camera causes motion blur in a conventional photograph, which is a fundamental problem of a camera. In this research, we propose to code a motion-invariant blur using a programmable aperture camera. The camera can realize virtual camera motion by translating the aperture opening, and as a result we obtain a coded image in which motion blur is invariant to object velocity. Thereby, we can remove motion blur without estimating the motion blur kernels or knowing the object speeds. We model the projection of the programmable aperture camera and demonstrate that the proposed coding works on a prototype camera.

Toshiki Sonoda, Hajime Nagahara, Rin-ichiro Taniguchi
Color-Aware Regularization for Gradient Domain Image Manipulation

We propose a color-aware regularization for use with gradient domain image manipulation to avoid color shift artifacts. Our work is motivated by the observation that colors of objects in natural images typically follow distinct distributions in the color space. Conventional regularization methods ignore these distributions which can lead to undesirable colors appearing in the final output. Our approach uses an anisotropic Mahalanobis distance to control output colors to better fit original distributions. Our color-aware regularization is simple, easy to implement, and does not introduce significant computational overhead. To demonstrate the effectiveness of our method, we show the results with and without our color-aware regularization on three gradient domain tasks: gradient transfer, gradient boosting, and saliency sharpening.
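
A hedged sketch of the core idea above: penalize an output color by its anisotropic Mahalanobis distance to a fitted color distribution, so deviations along a mode's natural spread cost less than deviations across it. The fitted mean/covariance and the function name are assumptions, not the paper's API.

```python
import numpy as np

def color_penalty(c, mu, cov):
    """Anisotropic Mahalanobis penalty (c - mu)^T cov^{-1} (c - mu) for an
    output color c against a color-mode distribution with mean mu and
    covariance cov (e.g., fitted to the input image's colors)."""
    d = c - mu
    return float(d @ np.linalg.solve(cov, d))
```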

Fanbo Deng, Seon Joo Kim, Yu-Wing Tai, Michael S. Brown
Local Covariance Filtering for Color Images

In this paper, we introduce a novel edge-aware filter that manipulates the local covariances of a color image. The covariance matrix obtained at each pixel is decomposed by the singular value decomposition (SVD), and the diagonal eigenvalues are then filtered by characteristic control functions. Our filter form generalizes a wide class of edge-aware filters. Once the SVDs are calculated, users can control the filter characteristic graphically by modifying the curve of the characteristic control functions, much like tone-curve manipulation, while seeing the result in real time. We also introduce an efficient iterative calculation of the pixel-wise SVD which significantly reduces its execution time.
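
A minimal sketch of the per-pixel operation described above, assuming a 3x3 local color covariance and treating the characteristic control function as an arbitrary curve supplied by the user; both names below are ours.

```python
import numpy as np

def filter_covariance(C, curve):
    """Decompose a local covariance C by SVD, remap its singular values
    with a user-supplied control curve, and rebuild the matrix."""
    U, s, Vt = np.linalg.svd(C)
    return (U * curve(s)) @ Vt

# Example control curve: soft shrinkage, one of many possible choices.
soft_shrink = lambda s, t=0.01: np.maximum(s - t, 0.0)
```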

Keiichiro Shirai, Masahiro Okuda, Takao Jinno, Masayuki Okamoto, Masaaki Ikehara
A New Projection Space for Separation of Specular-Diffuse Reflection Components in Color Images

In this paper, we propose a new reflectance separation model to separate the diffuse and specular reflection components. The model is based on a two-dimensional space called Ch-CV space, which is spanned by the maximum chromaticity (Ch) and the coefficient of variation (CV) of RGB color. The space exhibits a more direct correspondence to the diffuse and specular reflection components than either the RGB or the HSI color space. Under whitened illumination, surface points with the same diffuse chromaticity have the same slope in Ch-CV space. Based on these properties, we propose a slope-based region growing method to segment the specular regions of an image and to separate the reflection components for each segmented region. Comparison experiments with several state-of-the-art algorithms show its superior capability to separate the specular and diffuse reflection components.

Jianwei Yang, Zhaowei Cai, Longyin Wen, Zhen Lei, Guodong Guo, Stan Z. Li
Hand Vein Recognition Based on Oriented Gradient Maps and Local Feature Matching

The hand vein pattern as a biometric trait for identification has attracted increasing interest in recent years thanks to its properties of uniqueness, permanence, and non-invasiveness, as well as its strong immunity against forgery. In this paper, we propose a novel approach for recognizing vein patterns on the back of the hand. It first makes use of Oriented Gradient Maps (OGMs) to represent the Near-Infrared (NIR) hand vein images, simultaneously highlighting the distinctiveness of vein patterns and the texture of their surrounding corium, in contrast to state-of-the-art studies that focused only on the segmented vein region. SIFT-based local matching is then performed to associate the keypoints between corresponding OGM pairs of the same subject. The proposed approach was benchmarked on the NCUT database consisting of 2040 NIR hand vein images from 102 subjects. The experimental results clearly demonstrate the effectiveness of our approach.

Di Huang, Yinhang Tang, Yiding Wang, Liming Chen, Yunhong Wang
Fusing Warping, Cropping, and Scaling for Optimal Image Thumbnail Generation

Image retargeting, as a content-aware technique, is regarded as a logical tool for generating image thumbnails. However, the enormous difference between the sizes of source and target usually prevents a single retargeting method from obtaining satisfactory results. In this paper, a unified framework is proposed to fuse three popular retargeting strategies, i.e. warping, cropping, and scaling, for thumbnail generation. Complementing each other, the three retargeting strategies work together efficiently. Firstly, cropping selectively discards unimportant regions in order to free up more space for displaying important content aesthetically. Next, warping helps to incorporate as much visual information as possible into thumbnails by rearranging important content more compactly through non-uniform deformation. Finally, scaling retains the important content at an optimal size rather than subjecting it to an improper shrinkage. In our solution, warping, cropping and scaling are encoded as three energy terms of the objective function, which can be solved efficiently by numerical optimization. Both qualitative and quantitative comparison results demonstrate that the proposed method achieves an excellent trade-off among smoothness, completeness and distinguishableness in thumbnail generation. Through these results, our method shows obvious superiority over state-of-the-art techniques.

Zhan Qu, Jinqiao Wang, Min Xu, Hanqing Lu
Shift-Map Based Stereo Image Retargeting with Disparity Adjustment

This paper introduces a novel image retargeting algorithm for 3D images given as pairs of stereo images. In the context of 3D image retargeting, the novel viewpoint advocated in this paper is that geometric consistency, in the form of preserving disparity values, should not be an overpowering objective formulated as hard constraints. Instead, for maximizing viewing experience and comfort, it is desirable to simultaneously retarget the images and adjust the disparity values. The proposed retargeting algorithm is based on the methods of shift-map and importance filtering, and the main technical contribution of this paper is a successful extension of these earlier techniques to 3D images. We have evaluated the proposed method extensively, and the results demonstrate its efficiency as well as its potential for producing high-quality outputs. In particular, compared with the state of the art, the proposed method has a considerably shorter running time and, at the same time, produces retargeted 3D images that are more agreeable and pleasing for viewing.

Shaoyu Qi, Jeffrey Ho
Object Templates for Visual Place Categorization

The Visual Place Categorization (VPC) problem refers to categorizing the semantic category of a place using only visual information collected from an autonomous robot. Previous works on this problem only made use of global configuration observations, such as the Bag-of-Words model and spatial pyramid matching. In this paper, we present a novel system solving the problem by utilizing both global configuration observations and local object information. To be specific, we propose a local objects classifier that can automatically and effectively select key local objects of a semantic category from randomly sampled patches using the structural similarity support vector machine, and further classify the test frames with the Local Naive Bayes Nearest Neighbors algorithm. We also improve the global configuration observations with a histogram-intersection codebook and a noisy-codeword removal mechanism. The temporal smoothness of the classification results is ensured by employing a Bayesian filtering framework. Empirically, our system outperforms state-of-the-art methods on two large-scale, difficult datasets, demonstrating its superiority.

Hao Yang, Jianxin Wu
Reconstructing Sequential Patterns without Knowing Image Correspondences

In this paper, we propose a method for reconstructing 3D sequential patterns from multiple images without knowing image correspondences and without calibrating the cameras' intensity sensitivity parameters. A sequential pattern is defined as a series of colored 3D points. We assume that the order of the points is obtained in multiple images, but the correspondence of individual points is not known among the multiple images. For reconstructing sequential patterns, we consider a camera projection model which combines geometric and photometric information of objects. Furthermore, we consider camera projections in the frequency space. By considering the multi-view relationship on the new projection model, we show that 3D sequential patterns can be reconstructed without knowing the correspondence of individual image points in the sequential patterns, and also that the recovered 3D patterns do not suffer from changes in camera sensitivity parameters.

Saba Batool Miyan, Jun Sato
Registration of Multi-view Images of Planar Surfaces

This paper presents a novel image-based registration method for high-resolution multi-view images of a planar material surface. Contrary to standard registration approaches, this method aligns images based on the true plane of the material's surface and not on a plane defined by registration marks. It combines camera calibration with the iterative fitting of the desired position and slant of the surface plane, image re-registration, and evaluation of the surface alignment. To optimize image compression performance, we use the error of a compression method as a function for evaluating registration quality. The proposed method shows encouraging results on example visualizations of view- and illumination-dependent textures. Compared with a standard multi-view data registration approach, it provides better alignment of multi-view images and thus allows more detailed visualization using the same compressed parameterization size.

Radomír Vávra, Jiří Filip
Automatic Stave Discovery for Musical Facsimiles

Lately, there has been increased interest in the analysis of music score facsimiles, aiming at automatic digitization and recognition. Noise, corruption, variations in handwriting, and non-standard page layouts and notations are common problems, especially affecting centuries-old manuscripts.

Starting from a facsimile, the current state-of-the-art methods binarize the image, detect and group the staff lines, then remove the staff lines and classify the remaining symbols, imposing rules and prior knowledge, to obtain the final digital representation. The first steps are critical for the performance of the overall system.

Here we propose to handle binarization, staff detection and noise removal by means of dynamic programming (DP) formulations. Our main insights are: a) the staves (groups of five staff lines) form repetitive line patterns that are more constrained and informative, so we propose direct optimization over such patterns instead of first spotting single staff lines; b) the optimal binarization threshold is the one giving the maximum evidence for the presence of staves; c) the noise, or background, is given by the regions where there is insufficient stave-pattern evidence.

We validate our techniques on the CVC-MUSCIMA (2011) staff removal benchmark, achieving the best error rate (1.7%), as well as on various other handwritten score facsimiles from the Renaissance.

Radu Timofte, Luc Van Gool
Unsupervised Language Learning for Discovered Visual Concepts

Computational models of grounded language learning have been based on the premise that words and concepts are learned simultaneously. Given the mounting cognitive evidence for concept formation in infants, we argue that the availability of pre-lexical concepts (learned from image sequences) leads to considerable computational efficiency in word acquisition. Key to the process is a model of bottom-up visual attention in dynamic scenes. We have used existing work in background-foreground segmentation, multiple object tracking, object discovery and trajectory clustering to form object category and action concepts. The set of acquired concepts under visual attentive focus is then correlated with contemporaneous commentary to learn the grounded semantics of words and multi-word phrasal concatenations from the narrative. We demonstrate that even based on a mere 5 minutes of video, a number of rudimentary visual concepts can be discovered. When these concepts are associated with unedited English commentary, we observe that several words emerge - more than 60% of the concepts discovered from the video are associated with correct language labels. Thus, the computational model imitates the beginning of language comprehension, based on attentional parsing of the visual data. Finally, the emergence of multi-word phrasal concatenations, a precursor to syntax, is observed where there are more salient referents than single words.

Prithwijit Guha, Amitabha Mukerjee
Parameterized Variety Based View Synthesis Scheme for Multi-view 3DTV

This paper presents a novel parameterized variety based view synthesis scheme for 3DTV and multi-view systems. We have generalized the parameterized image variety approach to image-based rendering proposed in [1] to handle full perspective cameras. An algebraic geometry framework is proposed for the parameterization of the variety associated with full perspective images by the image positions of three reference scene points. A complete parameterization of the 3D scene is constructed. This allows generating realistic novel views from arbitrary viewpoints without explicit 3D reconstruction, taking a few multi-view images from uncalibrated cameras as input.

Another contribution of this paper is a generalised and flexible architecture based on this variety model for multi-view 3DTV. The novelty of the architecture lies in merging this variety based approach with the standard depth image based view synthesis pipeline, without explicitly obtaining sparse or dense 3D points. This integrated framework subsequently overcomes the problems associated with existing depth based representations. The key aspects of this joint framework are: 1) synthesis of artifact-free novel views from arbitrary camera positions for wide angle viewing; 2) generation of a signal representation compatible with standard multi-view systems; 3) extraction of reliable view dependent depth maps from arbitrary virtual viewpoints without recovering exact 3D points; 4) an intuitive interface for virtual view specification based on scene content. Experimental results on standard multi-view sequences are presented to demonstrate the effectiveness of the proposed scheme.

Mansi Sharma, Santanu Chaudhury, Brejesh Lall
Quasi-regular Facade Structure Extraction

In this paper we present a novel two-stage framework for extracting what we define as a quasi-regular structure in facade images. A quasi-regular structure is an irregular rectangular grid representing the placements of repetitive structural architecture objects, e.g., windows, in a facade. Such a structure generalizes the perfect lattice structures generated by the 2D symmetry groups studied in previous work. First, we formulate quasi-regular structure detection in an object-oriented Marked Point Process framework by treating the architectural elements as objects. This leads to an initial quasi-regular structure map which serves as an indicator map of potential object locations. Then, we propose a regularization scheme to recover the complete quasi-regular structure from the initial incomplete one. This stage takes advantage of the intrinsic low-rank constraint of the quasi-regular structure representing a regularized facade. By applying such a regularization, the complete quasi-regular facade structure is obtained. We have extensively tested our method on a large variety of facade images, demonstrating both the effectiveness and the robustness of our two-stage framework.

Tian Han, Chun Liu, Chiew Lan Tai, Long Quan
Multi-view Synthesis Based on Single View Reference Layer

We propose a virtual view synthesis method based on depth image-based rendering (DIBR) to realize wide multi-view 3D displays. The proposed multi-view rendering method focuses on reducing the repetitive hole restoration process and generating spatiotemporally consistent multi-views. First, we determine a single view reference layer (SVRL) and set the maximum hole area in this SVRL to cover the maximum hole occurrence in the synthesized views. The hole in the SVRL is restored by referencing the non-hole region of the current SVRL and the accumulated background data of the previous frame. If a newly uncovered background region exists in the restored SVRL, we continuously accumulate the background region and use it to restore the hole of the next SVRL, achieving temporal consistency of the synthesized views. Finally, the restored hole in the SVRL is propagated to the hole in each synthesized view, thereby preserving the spatial consistency of the synthesized views, because the hole region in each synthesized view is restored using the common SVRL. The experimental results show that the proposed method generates spatiotemporally consistent multi-view images and decreases the complexity of hole restoration by reducing the number of repetitive hole restoration processes.

Yang-Ho Cho, Ho-Young Lee, Du-Sik Park
Hand-Eye Calibration without Hand Orientation Measurement Using Minimal Solution

In this paper we solve the problem of estimating the relative pose between a robot's gripper and a camera mounted rigidly on the gripper in situations where the rotation of the gripper w.r.t. the robot global coordinate system is not known. It is a variation of the so-called hand-eye calibration problem. We formulate it as a problem of seven equations in seven unknowns and solve it using the Gröbner basis method for solving systems of polynomial equations. This enables us to calibrate from the minimal number of two relative movements and to provide the first exact algebraic solution to the problem. Further, we describe a method for selecting the geometrically correct solution among the algebraically correct ones computed by the solver. In contrast to previous iterative methods, our solution works without any initial estimate and has no problems with error accumulation. Finally, by evaluating our algorithm on both synthetic and real scene data, we demonstrate that it is fast, noise resistant, and numerically stable.

Zuzana Kukelova, Jan Heller, Tomas Pajdla
Detecting Changes in Images of Street Scenes

In this paper we propose a novel algorithm for detecting changes in street scenes when a vehicle revisits sections of the street at different times. The proposed algorithm detects structural geometric changes, changes due to dynamically moving objects, as well as changes in street appearance (e.g. posters put up) between two traversal times. We exploit geometric, appearance and semantic information to determine which areas have changed, and formulate the problem as an optimal image labeling problem in the Markov Random Field framework. The approach is evaluated on street sequences from 3 different locations which were visited multiple times by the vehicle. The proposed method is applicable to monitoring and updating models and images of urban environments.

Jana Košecká
Adaptive Background Defogging with Foreground Decremental Preconditioned Conjugate Gradient

The quality of outdoor surveillance videos is often degraded by bad weather, such as fog, haze, and snow. The degraded videos not only provide poor visualization, but also increase the difficulty of vision-based analysis such as foreground/background segmentation. However, haze/fog removal has never been an easy task, and is often very time consuming. Most existing methods only consider a single image, and no temporal information of a video is used. In this paper, a novel adaptive background defogging method is presented. It is observed that most of the background regions between two consecutive video frames do not vary much. Based on this observation, each video frame is first defogged by a background transmission map which is generated adaptively by the proposed foreground decremental preconditioned conjugate gradient (FDPCG). It is shown that foreground/background segmentation improves dramatically with such background-defogged video frames. With the help of a foreground map, the defogging of foreground regions is then completed by 1) foreground transmission estimation by fusion, and 2) transmission refinement by the proposed foreground incremental preconditioned conjugate gradient (FIPCG). Experimental results show that the proposed method can effectively improve the visualization quality of surveillance videos under heavy fog and snow. Compared with state-of-the-art image defogging methods, the proposed method is much more efficient.
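
Once a transmission map is available, defogging a frame is a pixel-wise inversion of the standard haze formation model I = J*t + A*(1 - t). A minimal sketch of that final step only; the paper's FDPCG/FIPCG estimation of the transmission map is not reproduced here.

```python
import numpy as np

def defog_frame(I, t, A):
    """Recover scene radiance J from hazy frame I given transmission map t
    and atmospheric light A: J = (I - A) / t + A. t is clamped to avoid
    amplifying noise where transmission is tiny."""
    return (I - A) / np.clip(t, 0.1, 1.0) + A
```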

Jacky Shun-Cho Yuk, Kwan-Yee Kenneth Wong
A Shadow Repair Approach for Kinect Depth Maps

The depth data provided by Kinect is incomplete because of non-measured depth (NMD) pixels, so a preprocessing approach for the depth map is necessary. In this paper, a depth map repair approach is proposed for the removal of one specific type of NMD pixels: shadows. First, the NMD pixels are divided into three types. Then a mathematical model based on the depth measurement principle of Kinect is built to explain the cause of shadows, and a shadow discriminant based on the model is designed. Finally, a repair approach is proposed for shadow region detection and removal. Experimental results show that our method is both time-saving and accurate.

Yu Yu, Yonghong Song, Yuanlin Zhang, Shu Wen
A Unified Framework for Line Extraction in Dioptric and Catadioptric Cameras

Many omnidirectional vision systems have revolution symmetry and, consequently, can be described by the radially symmetric distortion model. Under this projection model, straight lines project onto curves called line-images. In this paper we present a novel unified framework to deal with these line-images directly in the image, which is valid for any central system. To validate this framework, we have developed a method to extract line-images with a 2-point RANSAC, which makes use of the camera calibration. The proposed method also gives the adjacent regions of line-images, which can be used for matching purposes. The line-image extractor has been implemented and tested with simulated and real images.

Jesus Bermudez-Cameo, Gonzalo Lopez-Nicolas, Jose J. Guerrero
Fusion of Time-of-Flight and Stereo for Disambiguation of Depth Measurements

The complementary nature of time-of-flight and stereo has led to fusion systems that provide high-quality depth maps robust to the depth bias and random noise of the time-of-flight camera, as well as to the lack of scene texture. This paper shows that such a fusion system is also effective for disambiguating time-of-flight depth measurements affected by phase wrapping, which records depth values much smaller than the actual values when scene points are farther than a certain maximum range. To recover the unwrapped depth map, we build a Markov random field based on the constraint that an accurately unwrapped depth value should minimize the dissimilarity between its projections onto the stereo images. The unwrapped depth map is then adapted to stereo matching, reducing the matching ambiguity and enhancing the depth quality in textureless regions. Through experiments we show that the proposed method extends the usable range of the time-of-flight camera, delivering unambiguous depth maps of real scenes.
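
The phase-wrapping ambiguity has a simple form: a continuous-wave ToF camera measures depth modulo an ambiguity distance set by its modulation frequency, so each wrapped reading corresponds to a discrete family of candidate depths. A sketch of the candidate enumeration only; choosing among candidates via stereo dissimilarity is the paper's MRF contribution and is not shown.

```python
C = 299_792_458.0  # speed of light in m/s

def unwrapped_candidates(d_wrapped, f_mod, k_max=3):
    """Candidate true depths d_wrapped + k * d_amb for k = 0..k_max, where
    d_amb = c / (2 * f_mod) is the ambiguity range of a continuous-wave
    ToF camera with modulation frequency f_mod (Hz)."""
    d_amb = C / (2.0 * f_mod)
    return [d_wrapped + k * d_amb for k in range(k_max + 1)]

# e.g. at f_mod = 30 MHz the ambiguity range is about 5 m.
```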

Ouk Choi, Seungkyu Lee
Backmatter
Metadata
Title
Computer Vision – ACCV 2012
Edited by
Kyoung Mu Lee
Yasuyuki Matsushita
James M. Rehg
Zhanyi Hu
Copyright Year
2013
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-37447-0
Print ISBN
978-3-642-37446-3
DOI
https://doi.org/10.1007/978-3-642-37447-0