Multi-view structure-from-motion for hybrid camera scenarios☆
Highlights
- We describe a pipeline to perform structure-from-motion with mixed camera types.
- The sphere camera model is used throughout the pipeline for different camera types.
- Demonstrations of the proposed approach in real-world scenarios are presented.
Introduction
Omnidirectional cameras provide a 360° horizontal field of view in a single image, which is an important advantage in many application areas such as surveillance [1], [2], robot navigation [3], [4] and 3D reconstruction [5], [6]. Point correspondences from a variety of angles provide more stable structure estimation [7], and degenerate cases such as viewing only a planar surface are less likely to occur. As we discuss in this paper, omnidirectional images can also provide omnipresent correspondences when the fields of view of perspective images do not overlap. A major drawback of these cameras is that their spatial resolution is lower than that of perspective cameras due to their large field of view. Using perspective cameras together with omnidirectional ones can improve the resolution while preserving the advantage of an enlarged field of view. A possible scenario is 3D reconstruction, in which omnidirectional cameras provide a low-resolution background reconstruction whereas the images of perspective cameras are used for modeling specific objects in the foreground. Another application of hybrid SfM is hybrid surveillance, for instance using a pan-tilt-zoom camera together with an omnidirectional camera [1]. Such systems can be enhanced with 3D structure and location estimation algorithms without increasing the number of cameras.
For 3D reconstruction with such hybrid camera systems, we need to adapt the steps that are employed in systems using a single type of camera. In Fig. 1, an SfM pipeline which is commonly used for perspective camera systems is given. We investigate the applicability of this pipeline to hybrid camera systems and we propose improved or modified methods for different steps of this pipeline when needed.
Regarding previous studies on hybrid systems, Adorni et al. [3] used a hybrid system for the obstacle detection problem in robot navigation. Chen and Yang [8] developed a region matching algorithm for hybrid views based on planar homographies. The epipolar geometry between hybrid camera views was first explained by Sturm [9] for mixtures of paracatadioptric (catadioptric camera with a parabolic mirror) and perspective cameras. The framework was extended to catadioptric cameras with hyperbolic mirrors and cameras with lens distortion by Barreto and Daniilidis [10]. Puig et al. [11] worked on feature point matching and fundamental matrix estimation between perspective and catadioptric camera images. For point matching, they first applied a catadioptric-to-panoramic conversion and employed regular SIFT [12] between panoramic and perspective views. They employed RANSAC [13] based on satisfying the epipolar constraint and compared the representation capabilities of 3×4, 3×6 and 6×6 hybrid fundamental matrices for mirrors with varying parameters.
To our knowledge, the only work on hybrid SfM was conducted by Ramalingam et al. [14]. They employed a highly generic non-parametric imaging model where the cameras are modeled with sets of projection rays. They mentioned that directly applying SIFT [12] did not provide good results for their fisheye-perspective image pairs and used manually selected feature point correspondences to estimate the epipolar geometry. They employed the midpoint method for triangulation to estimate 3D point coordinates. They also tested two different bundle adjustment approaches, one minimizing the distances between projection rays and 3D points and the other minimizing reprojection error, concluding that both approaches are comparable to each other.
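The midpoint method mentioned above admits a compact closed form. As a minimal numpy sketch (our own illustration, not Ramalingam et al.'s implementation), the 3D point is taken halfway between the two projection rays at their points of closest approach:

```python
import numpy as np

def midpoint_triangulate(c1, d1, c2, d2):
    """Midpoint triangulation: 3D point halfway between the two
    projection rays (c_i + t_i * d_i) at their closest approach."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    b = c2 - c1
    # closest-approach conditions: (p1 - p2) orthogonal to d1 and d2
    A = np.array([[d1 @ d1, -(d1 @ d2)],
                  [d1 @ d2, -(d2 @ d2)]])
    t = np.linalg.solve(A, np.array([b @ d1, b @ d2]))
    p1 = c1 + t[0] * d1
    p2 = c2 + t[1] * d2
    return 0.5 * (p1 + p2)
```

When the two rays intersect exactly, the midpoint coincides with the intersection; with noisy rays it splits the residual gap evenly, which is precisely the behavior later shown to be inferior to iterative linear methods.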
In our work, we employ the sphere camera model [15] which covers central (single-viewpoint) catadioptric systems as well as perspective cameras. The details are presented in Section 2. The SfM pipeline described here applies to all the cameras that can be modeled with the sphere model. The proposed methods for point matching and triangulation can be used with the cameras beyond the scope of the sphere camera model, since they do not employ this camera model.
Widely accepted feature matching methods (e.g. SIFT [12], MSER [16]) do not perform well when directly employed on hybrid camera images [11], [14]. The main reasons are the resolution difference and the distortion of features between the images of different camera types. Our analysis showed that most of the false matches in the SIFT output are due to matching a high-resolution feature in the perspective image to a feature in the omnidirectional image that lacks such high resolution. We propose an algorithm that preprocesses the perspective images before matching. In this way, the probability of matching features across incorrect scales (octaves) decreases and SIFT matching produces a significantly higher true-positive ratio, allowing us to perform automatic omnidirectional-perspective matching. We performed tests on a total of 20 image pairs taken from different scenes and with different omnidirectional (both catadioptric and fisheye) cameras. To decrease the effect of distortion in hybrid image pairs, we evaluate the use of virtual camera plane (VCP) images and include VCP-perspective matching in our experiments. Experimental results, given in Section 3, indicate the success of our method.
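The core of such preprocessing is bringing the two images to comparable angular resolutions before SIFT runs. The following sketch (our own illustration with hypothetical camera numbers, not the paper's exact algorithm) derives a downscale factor from the pixels-per-degree of each camera:

```python
import numpy as np

def downscale_factor(persp_width_px, persp_hfov_deg,
                     omni_circumference_px, omni_hfov_deg=360.0):
    """Ratio of angular resolutions (pixels/degree); shrinking the
    perspective image by this factor makes SIFT detect features at
    comparable octaves in both images."""
    res_persp = persp_width_px / persp_hfov_deg
    res_omni = omni_circumference_px / omni_hfov_deg
    return res_omni / res_persp  # < 1 when the perspective view is sharper

# hypothetical cameras: 1600 px over a 60 deg horizontal FOV vs. a
# catadioptric image whose useful annulus unwraps to 1440 px over 360 deg
s = downscale_factor(1600, 60.0, 1440)
```

The perspective image would then be resized by `s` before matching, so that a feature's dominant scale falls into the same octave range in both views.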
We employ RANSAC [13] to robustly compute the hybrid fundamental matrix (F), whose linear estimation requires the use of lifted coordinates [9], [10]. We introduce normalization matrices for lifted coordinates so that normalization and denormalization can be performed linearly. We compare two options for motion estimation: directly estimating the essential matrix (E) with the calibrated 3D rays, or estimating the hybrid F and then extracting E from it. We give details of our analysis in Section 4.
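The normalization matrices for lifted coordinates generalize the standard Hartley-style point normalization that conditions any DLT-type estimation. As a reference point, here is the standard (non-lifted) normalization as a numpy sketch; the paper's contribution is the analogous linear transform for the lifted 6-vectors:

```python
import numpy as np

def hartley_normalization(pts):
    """Similarity transform T mapping the 2D points (Nx2 array) to
    centroid 0 with mean distance sqrt(2) from the origin."""
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    return T

pts = np.array([[100.0, 200.0], [300.0, 250.0], [180.0, 400.0], [260.0, 120.0]])
T = hartley_normalization(pts)
ph = np.c_[pts, np.ones(len(pts))] @ T.T   # normalized homogeneous points
```

After estimation, the result is denormalized by composing with T and its counterpart from the other image, exactly as in the classical normalized eight-point algorithm.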
The only previous study involving hybrid camera triangulation uses the midpoint method [14]; however, iterative linear methods have been shown to be superior to the midpoint method [17]. We propose a weighting strategy for the iterative linear-Eigen triangulation method that improves its 3D location estimation accuracy on hybrid image pairs by trusting the (high-resolution) perspective image more (Section 5).
In Section 6, we describe how we perform multi-view SfM. Briefly, we employ the approach of adding views to the structure [18], and to refine the final 3D point coordinates and camera motion parameters, we adapt the sparse bundle adjustment method [19] by modifying its projection function to work with the sphere camera model. We present the results of our experiments for the individual steps of the SfM pipeline within the related sections. In Section 7, we present demonstrations of the complete pipeline, i.e. multi-view hybrid SfM with real images, to show that our approach works effectively in real-world scenarios. We present two scenarios, also mentioned at the beginning of this section, where employing a hybrid camera system is advantageous. One of them is a surveillance setup where the scene can be dynamic and images are captured simultaneously. Thus, a mobile camera cannot be used and it is not practical to use many perspective cameras to cover the whole scene. Section 7.1 presents such a scenario, in which an omnidirectional camera is used in conjunction with a limited number of perspective cameras that do not view the same part of the scene. Such hybrid systems are becoming more widespread with the increased demand for video surveillance. We demonstrate how an omnidirectional camera can combine the 3D structures viewed by two or more perspective cameras with no overlapping views. Section 7.2 presents a second scenario in which two omnidirectional images are used to provide a low-resolution background reconstruction whereas several perspective views are used for modeling the objects in the foreground. The hybrid method removes the need for a mobile camera or a network of cameras for background reconstruction. In Section 7.3, we draw the reader's attention to another advantage of hybrid systems and demonstrate that adding omnidirectional cameras to perspective SfM scenarios increases the accuracy of motion estimation.
Finally, in Section 7.4, we present an outdoor experiment, in which an image sequence from a captured video was used, to investigate the applicability of our hybrid SfM in other realistic scenarios.
The work presented here mainly comprises the research included in the first author's dissertation [20] and some of the experimental results are presented in [21].
Section snippets
Camera model and calibration
We use the sphere camera model by Geyer and Daniilidis [15], which was introduced to model central catadioptric cameras. Later, this model was extended to cover perspective cameras with lens distortion [22]. The model comprises a unit sphere and a perspective camera, and the projection of 3D points is performed in two steps (Fig. 2). The first is the projection of a point Q in 3D space onto the unit sphere; the second is the projection from the sphere to the image plane. The first
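The two-step projection can be sketched in a few lines of numpy (a minimal illustration following the standard sphere-model equations; the calibration matrix K and mirror parameter ξ are assumed known):

```python
import numpy as np

def sphere_project(Q, K, xi):
    """Sphere-model projection: Q is first mapped onto the unit
    sphere, then perspectively projected from a center offset by xi
    along the sphere axis.  xi = 0 reduces to the ordinary pinhole
    model; xi = 1 corresponds to the paracatadioptric case."""
    Qs = Q / np.linalg.norm(Q)              # step 1: point on the unit sphere
    m = np.array([Qs[0] / (Qs[2] + xi),     # step 2: projection to the
                  Qs[1] / (Qs[2] + xi),     # normalized image plane
                  1.0])
    q = K @ m                               # apply intrinsics
    return q[:2] / q[2]
```

Setting `xi = 0` recovers the familiar pinhole projection, which is what lets one pipeline handle perspective and catadioptric cameras uniformly.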
Feature matching
To match the features in hybrid image pairs automatically, we describe a preprocessing algorithm to be applied with SIFT [12] matching. Matching performance decreases with distortion of features due to increasing baseline length or changing camera geometry. SIFT detects features at different scales and matches them regardless of their scales. We also observed low matching accuracy when there is a major scale difference between the two images. These conditions especially apply to our hybrid
Epipolar geometry and motion estimation
Epipolar geometry between hybrid camera views was explained by Sturm [9] for mixtures of paracatadioptric and perspective cameras. Barreto showed that the framework can also be extended to cameras with lens distortion due to the similarities between the paracatadioptric and division models [10]. To summarize this relationship, let us denote the corresponding image points in perspective and catadioptric images with qp and qc respectively. They are represented as 3-vectors in homogeneous 2D
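Linear formulations of this hybrid epipolar constraint lift the catadioptric point to the vector of its degree-2 monomials (a Veronese map). The sketch below illustrates the lifting and the residual of a constraint of the form lift(qc)ᵀ F qp = 0; the 6×3 shape of F and the monomial ordering are assumptions for illustration, since conventions differ across the 3×6 and 6×6 formulations mentioned above:

```python
import numpy as np

def lift(q):
    """Veronese lifting of a homogeneous 2D point (x, y, z) to the
    6-vector of its degree-2 monomials (one common ordering)."""
    x, y, z = q
    return np.array([x*x, x*y, y*y, x*z, y*z, z*z])

def hybrid_epipolar_residual(qc, qp, F):
    """Residual of the hybrid epipolar constraint lift(qc)^T F qp
    for a catadioptric point qc, a perspective point qp, and a
    hypothetical 6x3 hybrid fundamental matrix F."""
    return float(lift(qc) @ F @ qp)
```

Because the constraint is linear in the entries of F once the point is lifted, F can be estimated by a DLT-style system, and the residual above serves directly as an inlier test inside RANSAC.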
Triangulation
We propose an improvement to the iterative linear-Eigen triangulation method for effective use in hybrid SfM. According to the comprehensive study by Hartley and Sturm [17], iterative linear-Eigen is one of the best triangulation methods for Euclidean reconstruction. It is superior to the midpoint method and to non-iterative linear methods, especially when 2D error is considered.
Let the two corresponding points be q = (x, y, 1), which are obtained by projecting the 3D point Q on the images
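The iterative linear-Eigen scheme, extended with a per-view confidence weight, can be sketched as follows (our own numpy illustration: the weight `w2` favoring the high-resolution view conveys the idea of the proposed strategy, not its exact form in the paper):

```python
import numpy as np

def iterative_linear_eigen(P1, P2, q1, q2, w2=1.0, n_iter=5):
    """Iterative linear-Eigen triangulation.  P1, P2: 3x4 projection
    matrices; q1, q2: 2D image points.  w2 > 1 gives the second
    (e.g. high-resolution perspective) view more influence."""
    def rows(P, q, w):
        # two linear constraints per view: x*(p3.X) - p1.X = 0, etc.
        return np.array([w * (q[0] * P[2] - P[0]),
                         w * (q[1] * P[2] - P[1])])
    c1 = c2 = 1.0
    for _ in range(n_iter):
        A = np.vstack([rows(P1, q1, 1.0 / c1),
                       rows(P2, q2, w2 / c2)])
        X = np.linalg.svd(A)[2][-1]          # unit vector minimizing |A X|
        X = X / X[3]
        c1, c2 = P1[2] @ X, P2[2] @ X        # reweight rows by depth
    return X[:3]
```

The depth reweighting (dividing each view's rows by p3·X) turns the algebraic residual into an approximation of the 2D reprojection error, which is what makes the iterative variant outperform the plain linear and midpoint methods.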
Adding views and bundle adjustment
To integrate additional views for multi-view SfM, we employed the approach proposed by Beardsley et al. [18]. In this approach, when a sequence of views is available, initially SfM is applied for the first two views. Then, for each new view, feature matching is performed with the previous view and the features which correspond to the already reconstructed 3D points are detected. The projection matrix of the new view is computed using these final 2D–3D matches.
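Computing the projection matrix of a new view from 2D–3D matches is a camera resection problem. For the perspective case it reduces to a Direct Linear Transform, sketched below (an illustration of the standard DLT, not the paper's sphere-model variant, whose projection function differs):

```python
import numpy as np

def dlt_resection(X3d, x2d):
    """Estimate a 3x4 projection matrix (up to scale) from n >= 6
    non-coplanar 2D-3D matches via the Direct Linear Transform."""
    A = []
    for (X, Y, Z), (u, v) in zip(X3d, x2d):
        Xh = np.array([X, Y, Z, 1.0])
        # each match contributes two linear equations in the 12 entries of P
        A.append(np.concatenate([Xh, np.zeros(4), -u * Xh]))
        A.append(np.concatenate([np.zeros(4), Xh, -v * Xh]))
    P = np.linalg.svd(np.array(A))[2][-1].reshape(3, 4)
    return P / np.linalg.norm(P)             # P is defined up to scale
```

With the new view's pose recovered, the remaining unmatched features of that view can be triangulated against the previous view and appended to the structure.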
Sparse bundle adjustment (SBA), see
Multi-view SfM experiments
We first present a multi-view SfM experiment with real images from mixed cameras; thus, we employ the entire pipeline shown in Fig. 1. All the views (two omnidirectional, three perspective) in Fig. 6 were used for this experiment.
Estimated coordinates of the points (551 points were reconstructed) and estimated camera positions are shown in Fig. 18.
We performed SBA on this structure and camera parameters. The reprojection errors before and after SBA are given in Table 8 for all five views, where we
Conclusions
We described an SfM pipeline and proposed new approaches or improved existing methods for the steps of this pipeline so that hybrid camera scenarios are covered.
It has been stated that directly applying SIFT is not sufficient to obtain good results for hybrid image pairs. In our study, we analyzed the reasons for false matches in SIFT and proposed a preprocessing algorithm that increases the matching performance considerably. After a few remaining false matches are eliminated by employing RANSAC
References (40)
- et al., Triangulation, Comput. Vision Image Understanding (1997)
- et al., Dual camera intelligent sensor for high definition 360 degrees surveillance, IEE Proc. Vis. Image Signal Process. (2005)
- et al., Detecting moving objects with an omnidirectional camera based on adaptive background subtraction, Lect. Notes Comput. Sci. (2003)
- et al., Omnidirectional Stereo Systems for Robot Navigation
- et al., Omnidirectional vision based topological navigation, Int. J. Comput. Vision (2007)
- et al., Omnidirectional 3D Modeling on a Mobile Robot using Graph Cuts
- Toward Flexible 3D Modeling using a Catadioptric Camera
- et al., Omni-directional Structure from Motion
- et al., Image Registration with Uncalibrated Cameras in Hybrid Vision Systems
- Mixing Catadioptric and Perspective Cameras
- Epipolar Geometry of Central Projection Systems using Veronese Maps
- Matching of Omnidirectional and Perspective Images using the Hybrid Fundamental Matrix
- Distinctive image features from scale invariant keypoints, Int. J. Comput. Vision
- Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM
- A Generic Structure-from-motion Algorithm for Cross-camera Scenarios
- A Unifying Theory for Central Panoramic Systems
- Robust Wide Baseline Stereo from Maximally Stable Extremal Regions
- Sequential updating of projective and affine structure from motion, Int. J. Comput. Vision
- The Design and Implementation of a Generic Sparse Bundle Adjustment Software Package based on the LM Algorithm
☆ This paper has been recommended for acceptance by Jan-Michael Frahm, Dr.-Ing.