
## About this Book

This classroom-tested and easy-to-understand textbook/reference describes the state of the art in 3D reconstruction from multiple images, taking into consideration all aspects of programming and implementation. Unlike other computer vision textbooks, this guide takes a unique approach in which the initial focus is on practical application and the procedures necessary to actually build a computer vision system. The theoretical background is then briefly explained afterwards, highlighting how one can quickly and simply obtain the desired result without knowing the derivation of the mathematical details.

Features:

- reviews the fundamental algorithms underlying computer vision;
- describes the latest techniques for 3D reconstruction from multiple images;
- summarizes the mathematical theory behind statistical error analysis for general geometric estimation problems;
- presents problems at the end of each chapter, with solutions supplied at the end of the book;
- provides additional material at an associated website.

## Table of Contents

### Chapter 1. Introduction

Abstract
This chapter states the background and organization of this book and describes distinctive features of the volume.
Kenichi Kanatani, Yasuyuki Sugaya, Yasushi Kanazawa

### Chapter 2. Ellipse Fitting

Abstract
Extracting elliptic edges from images and fitting ellipse equations to them is one of the most fundamental tasks of computer vision. This is because circular objects, which are very common in daily scenes, are projected as ellipses in images. We can even compute their 3D positions from the fitted ellipse equations, which along with other applications are discussed in Chap. 9. In this chapter, we present computational procedures for accurately fitting an ellipse to extracted edge points by considering the statistical properties of image noise. The approaches are classified as algebraic and geometric. The algebraic approach includes least squares, iterative reweight, the Taubin method, renormalization, HyperLS, and hyper-renormalization; the geometric approach includes FNS, geometric distance minimization, and hyper-accurate correction. We then describe the ellipse-specific method of Fitzgibbon et al. and the random sampling technique for avoiding hyperbolas, which may occur when the input information is insufficient. The RANSAC procedure is also discussed for removing nonelliptic arcs from the extracted edge point sequence.
Kenichi Kanatani, Yasuyuki Sugaya, Yasushi Kanazawa
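To make the simplest of these methods concrete, here is a minimal algebraic least-squares conic fit in NumPy. This is a sketch under my own naming conventions, not the book's code; the iterative methods named above (iterative reweight, renormalization, hyper-renormalization) refine this same eigenvalue problem with noise-dependent weights.

```python
import numpy as np

def fit_ellipse_lsq(pts):
    """Algebraic least-squares fit of a conic
    a x^2 + 2b xy + c y^2 + 2d x + 2e y + f = 0:
    minimize the sum of (xi_alpha . theta)^2 subject to ||theta|| = 1,
    i.e. take the unit eigenvector for the smallest eigenvalue of M."""
    x, y = pts[:, 0], pts[:, 1]
    Xi = np.column_stack([x*x, 2*x*y, y*y, 2*x, 2*y, np.ones_like(x)])
    M = Xi.T @ Xi / len(x)
    _, V = np.linalg.eigh(M)      # eigenvalues in ascending order
    return V[:, 0]

# sanity check on a noise-free half arc of the unit circle x^2 + y^2 = 1
t = np.linspace(0.0, np.pi, 30)
theta = fit_ellipse_lsq(np.column_stack([np.cos(t), np.sin(t)]))
theta /= theta[0]                 # scale so the x^2 coefficient is 1
```

On noise-free data this recovers the circle exactly; with noise on short arcs, plain least squares is known to be heavily biased, which is precisely what the renormalization-family methods are designed to remove.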

### Chapter 3. Fundamental Matrix Computation

Abstract
Two images of the same scene are related by what is called the epipolar equation. It is specified by a matrix called the fundamental matrix. By computing the fundamental matrix between two images, one can analyze the 3D structure of the scene, which we discuss in Chaps. 4 and 5. This chapter describes the principle and typical computational procedures for accurately computing the fundamental matrix by considering the statistical properties of the noise involved in correspondence detection. As in ellipse fitting, the methods are classified into algebraic and geometric approaches. However, the fundamental matrix has an additional property called the rank constraint: it is required to have determinant 0. Three approaches for enforcing it are introduced here: a posteriori rank correction, hidden variables, and extended FNS. We then describe the procedure of repeatedly using them to compute the geometric distance minimization solution. The RANSAC procedure for removing wrong correspondences is also described.
Kenichi Kanatani, Yasuyuki Sugaya, Yasushi Kanazawa
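As a small illustration of the algebraic approach combined with a posteriori rank correction, the sketch below (function and variable names are my own, and the synthetic two-camera setup is an assumption for testing) estimates F by least squares and then zeroes its smallest singular value so that det F = 0:

```python
import numpy as np

def fundamental_lsq_rank2(p1, p2):
    """Least-squares estimate of the fundamental matrix from point
    correspondences, followed by a posteriori rank correction:
    zero the smallest singular value so that det F = 0."""
    x1, y1 = p1[:, 0], p1[:, 1]
    x2, y2 = p2[:, 0], p2[:, 1]
    one = np.ones_like(x1)
    # one row per correspondence from the epipolar equation x2h^T F x1h = 0
    A = np.column_stack([x1*x2, y1*x2, x2, x1*y2, y1*y2, y2, x1, y1, one])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, s, Vt = np.linalg.svd(F)
    return U @ np.diag([s[0], s[1], 0.0]) @ Vt

# synthetic check: two views of random 3D points (identity calibration)
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(20, 3)) + [0.0, 0.0, 5.0]
c, s_ = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0, s_], [0, 1, 0], [-s_, 0, c]])   # small rotation
t = np.array([1.0, 0.0, 0.0])
h1 = X / X[:, 2:]                      # first camera [I | 0], homogeneous
h2 = X @ R.T + t
h2 = h2 / h2[:, 2:]                    # second camera [R | t], homogeneous
F = fundamental_lsq_rank2(h1[:, :2], h2[:, :2])
residuals = np.einsum('ni,ij,nj->n', h2, F, h1)
```

With exact correspondences the epipolar residuals vanish to machine precision; with real detections, the geometric methods and the hidden-variable/extended-FNS rank enforcement discussed in the chapter give markedly better accuracy than this baseline.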

### Chapter 4. Triangulation

Abstract
This chapter describes the principles and computational procedures for triangulation, which computes the 3D position of the point determined by a given pair of corresponding points over two images, using the knowledge of the positions, orientations, and internal parameters of the two cameras, which are specified by their camera matrices. First, we illustrate the geometry of perspective projection and describe the procedure for optimally correcting the corresponding point pair so that the associated lines of sight intersect in the scene, considering the statistical properties of image noise. This turns out to be closely related to the optimal fundamental matrix computation described in the preceding chapter.
Kenichi Kanatani, Yasuyuki Sugaya, Yasushi Kanazawa
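The core of the optimal correction can be sketched in a few lines. The following first-order version (one step of the iterative scheme; the names and the rectified test pair are my own) displaces both points minimally so that the epipolar equation holds:

```python
import numpy as np

def first_order_correction(F, x1, x2):
    """One step of the optimal correction of a correspondence (x1, x2):
    displace both points minimally (to first order in the noise) so that
    the epipolar equation x2h^T F x1h = 0 is satisfied."""
    x1h = np.array([x1[0], x1[1], 1.0])
    x2h = np.array([x2[0], x2[1], 1.0])
    e = x2h @ F @ x1h                  # epipolar residual
    u = F @ x1h                        # epipolar line in image 2
    v = F.T @ x2h                      # epipolar line in image 1
    denom = u[0]**2 + u[1]**2 + v[0]**2 + v[1]**2
    return x1 - e * v[:2] / denom, x2 - e * u[:2] / denom

# rectified stereo pair: this F encodes the constraint y1 = y2
F = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
c1, c2 = first_order_correction(F, np.array([0.3, 0.5]), np.array([0.1, 0.6]))
```

For a rectified pair the constraint is linear, so a single step lands exactly on it (both corrected y coordinates become the mean 0.55). Once the pair satisfies the epipolar equation, the two lines of sight intersect and the 3D point follows directly.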

### Chapter 5. 3D Reconstruction from Two Views

Abstract
This chapter describes a method for 3D reconstruction from two views, that is, computing the 3D positions of corresponding point pairs. To do this, we need to know the camera matrices that specify the positions, orientations, and internal parameters, such as focal lengths, of the two cameras. We estimate them from the fundamental matrix computed from the two images; this process is called self-calibration. We first express the fundamental matrix in terms of the positions, orientations, and focal lengths of the two cameras. Then we show that the focal lengths of the two cameras are computed from the fundamental matrix by an analytical formula. Using them, we can compute the positions and orientations of the two cameras, which determine their camera matrices. Once the camera matrices are determined, the 3D positions of corresponding point pairs are computed by the triangulation computation described in the preceding chapter.
Kenichi Kanatani, Yasuyuki Sugaya, Yasushi Kanazawa

### Chapter 6. Homography Computation

Abstract
If we take two images of a planar surface from two different places, the two images are related by a mapping called a homography. Computing a homography from point correspondences over two images is one of the most fundamental processes of computer vision. This is because, among other things, the 3D positions of the planar surface we are viewing and the two cameras that took the images can be computed from the computed homography. Such applications are discussed in Chaps. 7 and 8. This chapter describes the principles and typical computational procedures for accurately computing the homography by considering the statistical properties of the noise involved in correspondence detection. As in ellipse fitting and fundamental matrix computation, the methods are classified into algebraic (least squares, iterative reweight, the Taubin method, renormalization, HyperLS, and hyper-renormalization) and geometric (FNS, geometric distance minimization, and hyperaccurate correction). We also describe the RANSAC procedure for removing wrong correspondences (outliers).
Kenichi Kanatani, Yasuyuki Sugaya, Yasushi Kanazawa
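A minimal algebraic least-squares (DLT-style) homography estimate can be written as follows; the example homography and point set are my own, chosen so the recovery can be checked exactly:

```python
import numpy as np

def homography_lsq(p1, p2):
    """Algebraic least-squares homography: each correspondence
    (x, y) -> (u, v) with (u, v, 1) ~ H (x, y, 1) contributes two
    linear equations in the nine entries of H."""
    rows = []
    for (x, y), (u, v) in zip(p1, p2):
        rows.append([x, y, 1, 0, 0, 0, -u*x, -u*y, -u])
        rows.append([0, 0, 0, x, y, 1, -v*x, -v*y, -v])
    H = np.linalg.svd(np.asarray(rows, float))[2][-1].reshape(3, 3)
    return H / H[2, 2]

# check: recover a known homography from six exact correspondences
H_true = np.array([[1.2, 0.1, 3.0], [0.0, 0.9, -1.0], [1e-3, 0.0, 1.0]])
p1 = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]], float)
h = np.column_stack([p1, np.ones(6)]) @ H_true.T
p2 = h[:, :2] / h[:, 2:]
H = homography_lsq(p1, p2)
```

Four correspondences in general position already determine H; with noisy detections, the weighted and geometric methods listed in the chapter improve on this unweighted baseline.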

### Chapter 7. Planar Triangulation

Abstract
This chapter describes the principles and procedure for computing the 3D position of a corresponding point pair between two images of a known planar surface by assuming knowledge of the camera matrices of the two cameras. This process is called planar triangulation. We first show that the homography between the two images is determined from the equation of the plane and the camera matrices. The principle of planar triangulation is to correct the corresponding point pair optimally such that the associated lines of sight intersect precisely at a point on the assumed plane, using knowledge of the statistical properties of image noise. It turns out that the procedure is closely related to the optimal homography computation described in the preceding chapter (Chap. 6).
Kenichi Kanatani, Yasuyuki Sugaya, Yasushi Kanazawa

### Chapter 8. 3D Reconstruction of a Plane

Abstract
This chapter describes the procedure for computing, from corresponding points between two images of a planar scene, the 3D position of that plane and the camera matrices of the two cameras that took those images. First, we express the matrix of the homography between the two images in terms of the 3D position of the plane and the two camera matrices. We then show how to decompose the homography matrix into the 3D position of the plane and the two camera matrices in an analytical form. The solution is not unique; we describe the procedure for selecting the correct one. Once the camera matrices are obtained, the 3D positions of the corresponding point pairs are computed by the planar triangulation procedure of the preceding chapter.
Kenichi Kanatani, Yasuyuki Sugaya, Yasushi Kanazawa

### Chapter 9. Ellipse Analysis and 3D Computation of Circles

Abstract
Circular objects in the scene are projected onto the image plane as ellipses. By fitting ellipse equations to them as shown in Chap. 2, we can compute the 3D properties of the circular objects. This chapter describes typical procedures for such computations. First, we show how we can compute various attributes of ellipses such as intersections, centers, tangents, and perpendiculars. Then we describe the procedure for computing from an ellipse image of a circle the position and orientation of the circle in the scene and the 3D position of its center. These computations allow us to generate an image of the circle seen from the front. The basic principle underlying these computations is the analysis of homographies induced by hypothetical camera rotation around the viewpoint, where projective geometry plays a central role.
Kenichi Kanatani, Yasuyuki Sugaya, Yasushi Kanazawa
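Two of the elementary attribute computations mentioned above fall straight out of the homogeneous 3x3 conic matrix via the pole-polar relation; this is a sketch with an example conic of my own:

```python
import numpy as np

def conic_center(Q):
    """Center of the conic x^T Q x = 0 for a symmetric 3x3 Q:
    the pole of the line at infinity, found from a 2x2 linear system."""
    return np.linalg.solve(Q[:2, :2], -Q[:2, 2])

def tangent_line(Q, p):
    """Tangent (polar) line at a point p on the conic, returned as
    homogeneous line coefficients (l1, l2, l3) with l1 x + l2 y + l3 = 0."""
    return Q @ np.array([p[0], p[1], 1.0])

# example: x^2 + 4y^2 - 4x + 8y + 4 = 0, i.e. the ellipse
# (x - 2)^2 / 4 + (y + 1)^2 = 1
Q = np.array([[1.0, 0.0, -2.0], [0.0, 4.0, 4.0], [-2.0, 4.0, 4.0]])
center = conic_center(Q)                 # (2, -1)
tangent = tangent_line(Q, (4.0, -1.0))   # vertical tangent x = 4
```

The same polar map, combined with the rotation-induced homographies described in the chapter, underlies the 3D computations for circles.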

### Chapter 10. Multiview Triangulation

Abstract
In Chap. 4, we showed how we can reconstruct the 3D point positions from their two-view images using knowledge of the camera matrices. Here, we extend this to reconstructing the 3D point positions from multiple images. The basic principle is the same as in the two-view case: we optimally correct the observed point positions such that the lines of sight they define intersect at a single point in the scene. We begin with the three-view case and describe the optimal triangulation procedure based on the fact that three rays intersect in the scene if and only if the trilinear constraint is satisfied, just as two rays intersect if and only if the epipolar constraint is satisfied. We then extend this to general M views, imposing the trilinear constraint on every triple of consecutive images.
Kenichi Kanatani, Yasuyuki Sugaya, Yasushi Kanazawa
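For orientation, here is the simple linear (algebraic) multiview triangulation baseline, not the optimal trilinear-constraint procedure the chapter develops; the four synthetic cameras are my own test setup:

```python
import numpy as np

def triangulate_multiview(Ps, xs):
    """Linear multiview triangulation: each view with camera matrix P
    (rows p1, p2, p3) and observation (u, v) contributes the two rows
    u p3 - p1 and v p3 - p2; the homogeneous 3D point is the right
    singular vector for the smallest singular value."""
    rows = []
    for P, (u, v) in zip(Ps, xs):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    X = np.linalg.svd(np.asarray(rows))[2][-1]
    return X[:3] / X[3]

# four synthetic views of a known point
X_true = np.array([0.3, -0.2, 4.0])
Ps, xs = [], []
for k in range(4):
    a = 0.1 * k
    R = np.array([[np.cos(a), 0.0, np.sin(a)],
                  [0.0, 1.0, 0.0],
                  [-np.sin(a), 0.0, np.cos(a)]])
    t = np.array([[-0.5 * k], [0.1 * k], [0.0]])
    P = np.hstack([R, t])
    x = P @ np.append(X_true, 1.0)
    Ps.append(P)
    xs.append(x[:2] / x[2])
X_hat = triangulate_multiview(Ps, xs)
```

With exact observations the stacked system has a one-dimensional null space and the point is recovered exactly; with noise, the rays no longer meet, which is why the optimal correction of the observed positions is needed.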

### Chapter 11. Bundle Adjustment

Abstract
In the preceding chapter, we used knowledge of the camera matrices for reconstructing 3D structure from multiple views. In this chapter, we describe a procedure, called bundle adjustment, for computing from multiple views not only the 3D shape but also the positions, orientations, and intrinsic parameters of all the cameras simultaneously. Specifically, starting from given initial values for all the unknowns, we iteratively update them such that the reconstructed shape and observed images better satisfy the perspective projection relationship. Mathematically, this is nothing but minimization of a multivariable function, but the number of unknowns is very large. Hence, we need to devise an ingenious scheme for efficiently storing intermediate values and avoiding unnecessary computations. Here, we describe a typical programming technique for such implementation.
Kenichi Kanatani, Yasuyuki Sugaya, Yasushi Kanazawa
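The objective being minimized can be stated very compactly; the sketch below (my own naming, with a toy one-camera configuration) stacks the reprojection residuals whose sum of squares bundle adjustment minimizes:

```python
import numpy as np

def reprojection_residuals(Ps, Xs, obs):
    """Reprojection residuals for bundle adjustment: for each observation
    (camera index i, point index j, measured u, v), the difference between
    the measurement and the perspective projection of point X_j by P_i."""
    r = []
    for i, j, u, v in obs:
        p = Ps[i] @ np.append(Xs[j], 1.0)
        r.extend([u - p[0] / p[2], v - p[1] / p[2]])
    return np.asarray(r)

# a perfectly consistent toy configuration has zero residual
Ps = [np.hstack([np.eye(3), np.zeros((3, 1))])]
Xs = [np.array([0.2, -0.1, 2.0])]
obs = [(0, 0, 0.1, -0.05)]
res = reprojection_residuals(Ps, Xs, obs)
```

Each residual couples exactly one camera with one point, so the normal equations of the Gauss-Newton or Levenberg-Marquardt update are block-sparse; exploiting that sparsity is the essence of the implementation technique this chapter describes.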

### Chapter 12. Self-calibration of Affine Cameras

Abstract
For doing the bundle adjustment of the preceding chapter, we need an initial reconstruction from which to start. To this end, this chapter presents a simple procedure for computing an approximate reconstruction. This is done by approximating the perspective camera projection model by a simplified affine camera model. We describe the procedure for affine reconstruction using affine cameras and discuss the metric condition for upgrading the result to a correct Euclidean reconstruction. Because the computation involves the singular value decomposition (SVD) of matrices, this procedure is widely known as the factorization method. We introduce symmetric affine cameras that best mimic perspective projection in the affine camera framework. They include, as special cases, paraperspective cameras, weak perspective cameras, and orthographic cameras. For each camera modeling, we describe the self-calibration procedure for reconstructing the 3D shape and the camera motion.
Kenichi Kanatani, Yasuyuki Sugaya, Yasushi Kanazawa
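The affine stage of the factorization method is only a few lines of SVD; this sketch (my own setup, with synthetic orthographic views) performs the rank-3 factorization but omits the metric upgrade, which is a separate step:

```python
import numpy as np

def affine_factorization(W):
    """Factorization method, affine stage: center the 2M x N observation
    matrix (image coordinates of N points in M affine views) and take its
    rank-3 SVD, W ~ M S.  Motion M and shape S are recovered only up to
    an affine ambiguity, to be resolved by the metric condition."""
    Wc = W - W.mean(axis=1, keepdims=True)   # move each view's centroid to 0
    U, s, Vt = np.linalg.svd(Wc, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])            # 2M x 3 motion matrix
    S = np.sqrt(s[:3])[:, None] * Vt[:3]     # 3 x N shape matrix
    return M, S

# synthetic orthographic views of a random 3D shape
rng = np.random.default_rng(1)
X = rng.normal(size=(3, 12))
views = []
for k in range(5):
    A = np.linalg.qr(rng.normal(size=(3, 3)))[0][:2]   # 2x3 orthographic camera
    views.append(A @ X + rng.normal(size=(2, 1)))
W = np.vstack(views)
M, S = affine_factorization(W)
Wc = W - W.mean(axis=1, keepdims=True)
```

For noise-free affine views the centered observation matrix has exactly rank 3, so the product M S reproduces it; the metric condition then turns the affine ambiguity into a rotation, yielding the Euclidean reconstruction.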

### Chapter 13. Self-calibration of Perspective Cameras

Abstract
We extend the affine camera self-calibration technique of the preceding chapter to perspective projection. We show that the factorization of the preceding chapter can be applied to perspective cameras if we introduce new unknowns called projective depths. They are determined so that the observation matrix can be factorized, for which two approaches exist. One, called the primary method, iteratively determines the projective depths so that the observation matrix has column rank 4, and the other, called the dual method, iteratively determines them so that it has row rank 4. However, the reconstructed 3D shape is a projective transformation of the true shape, called a projective reconstruction. The correct shape is obtained by applying an appropriate Euclidean upgrading. The entire self-calibration procedure requires a large number of iterations over a large number of unknowns, so computational efficiency is a major concern. We discuss the complexity and efficiency of the involved computation.
Kenichi Kanatani, Yasuyuki Sugaya, Yasushi Kanazawa

### Chapter 14. Accuracy of Geometric Estimation

Abstract
We focus here on algebraic methods for ellipse fitting, fundamental matrix computation, and homography computation described in Chaps. 2, 3, and 6 in a more generalized mathematical framework. We do a detailed error analysis in general terms and derive explicit expressions for the covariance and bias of the solution. The hyper-renormalization procedure is derived in this mathematical framework.
Kenichi Kanatani, Yasuyuki Sugaya, Yasushi Kanazawa

### Chapter 15. Maximum Likelihood of Geometric Estimation

Abstract
We discuss here maximum likelihood (ML) estimation and Sampson error minimization in the general mathematical framework of the preceding chapter. We first derive the Sampson error as a first approximation to the Mahalanobis distance (a generalization of the geometric distance or the reprojection error) of ML. Then we do high-order error analysis to derive explicit expressions for the covariance and bias of the solution. The hyperaccurate correction procedure is derived in this framework.
Kenichi Kanatani, Yasuyuki Sugaya, Yasushi Kanazawa
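For the conic case of Chap. 2, the Sampson error of a single point has a simple closed form; the sketch below (my own notation, with the unit circle as a test conic) shows how it approximates the squared geometric distance to first order:

```python
import numpy as np

def sampson_distance(Q, p):
    """Sampson error of a point p for the conic x^T Q x = 0:
    F(p)^2 / ||grad F(p)||^2 with F(x, y) = (x, y, 1) Q (x, y, 1)^T,
    a first-order approximation of the squared geometric distance."""
    xh = np.array([p[0], p[1], 1.0])
    F = xh @ Q @ xh
    g = 2.0 * (Q @ xh)[:2]        # gradient of F with respect to (x, y)
    return F * F / (g @ g)

# unit circle x^2 + y^2 - 1 = 0
Q = np.diag([1.0, 1.0, -1.0])
d_near = sampson_distance(Q, (1.01, 0.0))   # true squared distance is 1e-4
```

Near the curve the approximation is tight, while far from it the value only roughly tracks the geometric distance; this is why minimizing the sum of Sampson errors is an effective and nearly optimal proxy for full ML.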

### Chapter 16. Theoretical Accuracy Limit

Abstract
We derive here a theoretical accuracy limit of the geometric estimation problem in the general mathematical framework described in Chaps. 14 and 15. It is given in the form of a bound, called the KCR (Kanatani-Cramer-Rao) lower bound, on the covariance matrix of the solution $\boldsymbol{\theta}$. The resulting form indicates that all iterative algebraic and geometric methods achieve this bound up to higher-order terms in $\sigma$, meaning that these are all optimal with respect to covariance. As in Chaps. 14 and 15, we treat $\boldsymbol{\theta}$ and $\boldsymbol{\xi}_{\alpha}$ as nD vectors for generality.
Kenichi Kanatani, Yasuyuki Sugaya, Yasushi Kanazawa
