Multi-view structure-from-motion for hybrid camera scenarios☆
Highlights
- We describe a pipeline to perform structure-from-motion with mixed camera types.
- The sphere camera model is used throughout the pipeline for different camera types.
- Demonstrations of the proposed approach in real-world scenarios are presented.
Introduction
Omnidirectional cameras provide a 360° horizontal field of view in a single image, which is an important advantage in many application areas such as surveillance [1], [2], robot navigation [3], [4] and 3D reconstruction [5], [6]. Point correspondences from a variety of angles provide more stable structure estimation [7], and degenerate cases such as viewing only a planar surface are less likely to occur. As we discuss in this paper, omnidirectional images can also provide omnipresent correspondences when the fields of view of perspective images do not overlap. A major drawback of these cameras is that their spatial resolution is lower than that of perspective cameras due to their large field of view. Using perspective cameras together with omnidirectional ones can improve the resolution while preserving the advantage of an enlarged field of view. A possible scenario is 3D reconstruction, in which omnidirectional cameras provide a low-resolution background reconstruction whereas the images of perspective cameras are used for modeling specific objects in the foreground. Another application of hybrid SfM is hybrid surveillance, for instance using a pan-tilt-zoom camera together with an omnidirectional camera [1]. Such systems can be enhanced with 3D structure and location estimation algorithms without increasing the number of cameras.
For 3D reconstruction with such hybrid camera systems, we need to adapt the steps that are employed in systems using a single type of camera. In Fig. 1, an SfM pipeline which is commonly used for perspective camera systems is given. We investigate the applicability of this pipeline to hybrid camera systems and we propose improved or modified methods for different steps of this pipeline when needed.
Regarding previous studies on hybrid systems, Adorni et al. [3] used a hybrid system for the obstacle detection problem in robot navigation. Chen and Yang [8] developed a region matching algorithm for hybrid views based on planar homographies. The epipolar geometry between hybrid camera views was first explained by Sturm [9] for mixtures of paracatadioptric (catadioptric camera with a parabolic mirror) and perspective cameras. The framework was extended to catadioptric cameras with hyperbolic mirrors and cameras with lens distortion by Barreto and Daniilidis [10]. Puig et al. [11] worked on feature point matching and fundamental matrix estimation between perspective and catadioptric camera images. For point matching, they first applied a catadioptric-to-panoramic conversion and employed regular SIFT [12] between panoramic and perspective views. They employed RANSAC [13] based on satisfying the epipolar constraint and compared the representation capabilities of 3×4, 3×6 and 6×6 hybrid fundamental matrices for mirrors with varying parameters.
To our knowledge, the only work on hybrid SfM was conducted by Ramalingam et al. [14]. They employed a highly generic non-parametric imaging model where the cameras are modeled with sets of projection rays. They mentioned that directly applying SIFT [12] did not provide good results for their fisheye-perspective image pairs and used manually selected feature point correspondences to estimate the epipolar geometry. They employed the midpoint method for triangulation to estimate 3D point coordinates. They also tested two different bundle adjustment approaches, one minimizing the distances between projection rays and 3D points and the other minimizing reprojection error, concluding that both approaches are comparable to each other.
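The midpoint method mentioned above admits a compact closed form. As a minimal numpy sketch (our own illustration, not Ramalingam et al.'s implementation), the 3D point is taken halfway between the two projection rays at their points of closest approach:

```python
import numpy as np

def midpoint_triangulate(c1, d1, c2, d2):
    """Midpoint triangulation: 3D point halfway between the two
    projection rays (c_i + t_i * d_i) at their closest approach."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    b = c2 - c1
    # closest-approach conditions: (p1 - p2) orthogonal to d1 and d2
    A = np.array([[d1 @ d1, -(d1 @ d2)],
                  [d1 @ d2, -(d2 @ d2)]])
    t = np.linalg.solve(A, np.array([b @ d1, b @ d2]))
    p1 = c1 + t[0] * d1
    p2 = c2 + t[1] * d2
    return 0.5 * (p1 + p2)
```

When the two rays intersect exactly, the midpoint coincides with the intersection; with noisy rays it splits the residual gap evenly, which is precisely the behavior later shown to be inferior to iterative linear methods.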
In our work, we employ the sphere camera model [15] which covers central (single-viewpoint) catadioptric systems as well as perspective cameras. The details are presented in Section 2. The SfM pipeline described here applies to all the cameras that can be modeled with the sphere model. The proposed methods for point matching and triangulation can be used with the cameras beyond the scope of the sphere camera model, since they do not employ this camera model.
Widely accepted feature matching methods (e.g. SIFT [12], MSER [16]) do not perform well when directly employed on hybrid camera images [11], [14]. The main reasons are the resolution difference and the distortion of features between the images of different camera types. Our analysis showed that most of the false matches in the SIFT output are due to matching a high-resolution feature in the perspective image to a feature in the omnidirectional image that lacks such high resolution. We propose an algorithm that preprocesses the perspective images before matching. In this way, the probability of matching features across incorrect scales (octaves) decreases and SIFT matching produces a significantly higher true-positive ratio, allowing us to perform automatic omnidirectional-perspective matching. We performed tests on a total of 20 image pairs taken from different scenes and with different omnidirectional (both catadioptric and fisheye) cameras. To decrease the effect of distortion in hybrid image pairs, we evaluate the use of virtual camera plane (VCP) images and include VCP-perspective matching in our experiments. Experimental results, given in Section 3, indicate the success of our method.
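The core of such preprocessing is bringing the two images to comparable angular resolutions before SIFT runs. The following sketch (our own illustration with hypothetical camera numbers, not the paper's exact algorithm) derives a downscale factor from the pixels-per-degree of each camera:

```python
import numpy as np

def downscale_factor(persp_width_px, persp_hfov_deg,
                     omni_circumference_px, omni_hfov_deg=360.0):
    """Ratio of angular resolutions (pixels/degree); shrinking the
    perspective image by this factor makes SIFT detect features at
    comparable octaves in both images."""
    res_persp = persp_width_px / persp_hfov_deg
    res_omni = omni_circumference_px / omni_hfov_deg
    return res_omni / res_persp  # < 1 when the perspective view is sharper

# hypothetical cameras: 1600 px over a 60 deg horizontal FOV vs. a
# catadioptric image whose useful annulus unwraps to 1440 px over 360 deg
s = downscale_factor(1600, 60.0, 1440)
```

The perspective image would then be resized by `s` before matching, so that a feature's dominant scale falls into the same octave range in both views.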
We employ RANSAC [13] to robustly compute the hybrid fundamental matrix (F), whose linear estimation requires the use of lifted coordinates [9], [10]. We introduce normalization matrices for lifted coordinates so that normalization and denormalization can be performed linearly. We compare two options for motion estimation: directly estimating the essential matrix (E) with the calibrated 3D rays, or estimating the hybrid F and then extracting E from it. We give details of our analysis in Section 4.
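The normalization matrices for lifted coordinates generalize the standard Hartley-style point normalization that conditions any DLT-type estimation. As a reference point, here is the standard (non-lifted) normalization as a numpy sketch; the paper's contribution is the analogous linear transform for the lifted 6-vectors:

```python
import numpy as np

def hartley_normalization(pts):
    """Similarity transform T mapping the 2D points (Nx2 array) to
    centroid 0 with mean distance sqrt(2) from the origin."""
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    return T

pts = np.array([[100.0, 200.0], [300.0, 250.0], [180.0, 400.0], [260.0, 120.0]])
T = hartley_normalization(pts)
ph = np.c_[pts, np.ones(len(pts))] @ T.T   # normalized homogeneous points
```

After estimation, the result is denormalized by composing with T and its counterpart from the other image, exactly as in the classical normalized eight-point algorithm.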
The only previous study involving hybrid camera triangulation uses the midpoint method [14]; however, iterative linear methods have been shown to be superior to the midpoint method [17]. We propose a weighting strategy for the iterative linear-Eigen triangulation method that improves its 3D location estimation accuracy on hybrid image pairs by trusting the (high-resolution) perspective image more (Section 5).
In Section 6, we describe how we perform multi-view SfM. Briefly, we employ the approach of adding views to the structure [18], and to refine the final 3D point coordinates and camera motion parameters, we adapt the sparse bundle adjustment method [19] by modifying its projection function to work with the sphere camera model. We present the results of our experiments for the individual steps of the SfM pipeline within the related sections. In Section 7, we present demonstrations of the complete pipeline, i.e. multi-view hybrid SfM with real images, to show that our approach works effectively in real-world scenarios. We present two scenarios, also mentioned at the beginning of this section, where employing a hybrid camera system is advantageous. One of them is a surveillance setup where the scene can be dynamic and images are captured simultaneously. Thus, a mobile camera cannot be used and it is not practical to use many perspective cameras to cover the whole scene. Section 7.1 presents such a scenario, in which an omnidirectional camera is used in conjunction with a limited number of perspective cameras that do not view the same part of the scene. Such hybrid systems are becoming more widespread with the increased demand for video surveillance. We demonstrate how an omnidirectional camera can combine the 3D structures viewed by two or more perspective cameras with no overlapping views. Section 7.2 presents a second scenario in which two omnidirectional images are used to provide a low-resolution background reconstruction whereas several perspective views are used for modeling the objects in the foreground. The hybrid method removes the need for a mobile camera or a network of cameras for background reconstruction. In Section 7.3, we draw the reader's attention to another advantage of hybrid systems and demonstrate that adding omnidirectional cameras to perspective SfM scenarios increases the accuracy of motion estimation.
Finally, in Section 7.4, we present an outdoor experiment, in which an image sequence from a captured video was used, to investigate the applicability of our hybrid SfM in other realistic scenarios.
The work presented here mainly comprises the research included in the first author's dissertation [20] and some of the experimental results are presented in [21].
Section snippets
Camera model and calibration
We use the sphere camera model by Geyer and Daniilidis [15], which was introduced to model central catadioptric cameras. Later, this model was extended to cover perspective cameras with lens distortion [22]. The model comprises a unit sphere and a perspective camera, and the projection of 3D points is performed in two steps (Fig. 2). The first is the projection of a point Q in 3D space onto the unit sphere; the second is the projection from the sphere to the image plane. The first
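The two-step projection can be sketched in a few lines of numpy (a minimal illustration following the standard sphere-model equations; the calibration matrix K and mirror parameter ξ are assumed known):

```python
import numpy as np

def sphere_project(Q, K, xi):
    """Sphere-model projection: Q is first mapped onto the unit
    sphere, then perspectively projected from a center offset by xi
    along the sphere axis.  xi = 0 reduces to the ordinary pinhole
    model; xi = 1 corresponds to the paracatadioptric case."""
    Qs = Q / np.linalg.norm(Q)              # step 1: point on the unit sphere
    m = np.array([Qs[0] / (Qs[2] + xi),     # step 2: projection to the
                  Qs[1] / (Qs[2] + xi),     # normalized image plane
                  1.0])
    q = K @ m                               # apply intrinsics
    return q[:2] / q[2]
```

Setting `xi = 0` recovers the familiar pinhole projection, which is what lets one pipeline handle perspective and catadioptric cameras uniformly.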
Feature matching
To match the features in hybrid image pairs automatically, we describe a preprocessing algorithm to be applied with SIFT [12] matching. Matching performance decreases with distortion of features due to increasing baseline length or changing camera geometry. SIFT detects features at different scales and matches them regardless of their scales. We also observed low matching accuracy when there is a major scale difference between the two images. These conditions especially apply to our hybrid
Epipolar geometry and motion estimation
Epipolar geometry between hybrid camera views was explained by Sturm [9] for mixtures of paracatadioptric and perspective cameras. Barreto showed that the framework can also be extended to cameras with lens distortion due to the similarities between the paracatadioptric and division models [10]. To summarize this relationship, let us denote the corresponding image points in perspective and catadioptric images with qp and qc respectively. They are represented as 3-vectors in homogeneous 2D
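Linear formulations of this hybrid epipolar constraint lift the catadioptric point to the vector of its degree-2 monomials (a Veronese map). The sketch below illustrates the lifting and the residual of a constraint of the form lift(qc)ᵀ F qp = 0; the 6×3 shape of F and the monomial ordering are assumptions for illustration, since conventions differ across the 3×6 and 6×6 formulations mentioned above:

```python
import numpy as np

def lift(q):
    """Veronese lifting of a homogeneous 2D point (x, y, z) to the
    6-vector of its degree-2 monomials (one common ordering)."""
    x, y, z = q
    return np.array([x*x, x*y, y*y, x*z, y*z, z*z])

def hybrid_epipolar_residual(qc, qp, F):
    """Residual of the hybrid epipolar constraint lift(qc)^T F qp
    for a catadioptric point qc, a perspective point qp, and a
    hypothetical 6x3 hybrid fundamental matrix F."""
    return float(lift(qc) @ F @ qp)
```

Because the constraint is linear in the entries of F once the point is lifted, F can be estimated by a DLT-style system, and the residual above serves directly as an inlier test inside RANSAC.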
Triangulation
We propose an improvement to the iterative linear-Eigen triangulation method for effective use in hybrid SfM. According to the comprehensive study by Hartley and Sturm [17], iterative linear-Eigen is one of the best triangulation methods for Euclidean reconstruction. It is superior to the midpoint method and to non-iterative linear methods, especially when 2D error is considered.
Let the two corresponding points be q = (x, y, 1), which are obtained by projecting the 3D point Q on the images
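The iterative linear-Eigen scheme, extended with a per-view confidence weight, can be sketched as follows (our own numpy illustration: the weight `w2` favoring the high-resolution view conveys the idea of the proposed strategy, not its exact form in the paper):

```python
import numpy as np

def iterative_linear_eigen(P1, P2, q1, q2, w2=1.0, n_iter=5):
    """Iterative linear-Eigen triangulation.  P1, P2: 3x4 projection
    matrices; q1, q2: 2D image points.  w2 > 1 gives the second
    (e.g. high-resolution perspective) view more influence."""
    def rows(P, q, w):
        # two linear constraints per view: x*(p3.X) - p1.X = 0, etc.
        return np.array([w * (q[0] * P[2] - P[0]),
                         w * (q[1] * P[2] - P[1])])
    c1 = c2 = 1.0
    for _ in range(n_iter):
        A = np.vstack([rows(P1, q1, 1.0 / c1),
                       rows(P2, q2, w2 / c2)])
        X = np.linalg.svd(A)[2][-1]          # unit vector minimizing |A X|
        X = X / X[3]
        c1, c2 = P1[2] @ X, P2[2] @ X        # reweight rows by depth
    return X[:3]
```

The depth reweighting (dividing each view's rows by p3·X) turns the algebraic residual into an approximation of the 2D reprojection error, which is what makes the iterative variant outperform the plain linear and midpoint methods.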
Adding views and bundle adjustment
To integrate additional views for multi-view SfM, we employed the approach proposed by Beardsley et al. [18]. In this approach, when a sequence of views is available, initially SfM is applied for the first two views. Then, for each new view, feature matching is performed with the previous view and the features which correspond to the already reconstructed 3D points are detected. The projection matrix of the new view is computed using these final 2D–3D matches.
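Computing the projection matrix of a new view from 2D–3D matches is a camera resection problem. For the perspective case it reduces to a Direct Linear Transform, sketched below (an illustration of the standard DLT, not the paper's sphere-model variant, whose projection function differs):

```python
import numpy as np

def dlt_resection(X3d, x2d):
    """Estimate a 3x4 projection matrix (up to scale) from n >= 6
    non-coplanar 2D-3D matches via the Direct Linear Transform."""
    A = []
    for (X, Y, Z), (u, v) in zip(X3d, x2d):
        Xh = np.array([X, Y, Z, 1.0])
        # each match contributes two linear equations in the 12 entries of P
        A.append(np.concatenate([Xh, np.zeros(4), -u * Xh]))
        A.append(np.concatenate([np.zeros(4), Xh, -v * Xh]))
    P = np.linalg.svd(np.array(A))[2][-1].reshape(3, 4)
    return P / np.linalg.norm(P)             # P is defined up to scale
```

With the new view's pose recovered, the remaining unmatched features of that view can be triangulated against the previous view and appended to the structure.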
Sparse bundle adjustment (SBA), see
Multi-view SfM experiments
We first present a multi-view SfM experiment with real images from mixed cameras; thus, we employ the entire pipeline shown in Fig. 1. All the views (two omnidirectional, three perspective) in Fig. 6 were used for this experiment.
Estimated coordinates of the points (551 points were reconstructed) and estimated camera positions are shown in Fig. 18.
We performed SBA on this structure and camera parameters. The reprojection errors before and after SBA are given in Table 8 for all five views, where we
Conclusions
We described an SfM pipeline and proposed new approaches or improved existing methods for the steps of this pipeline so that hybrid camera scenarios are covered.
It has been stated that directly applying SIFT is not sufficient to obtain good results for hybrid image pairs. In our study, we analyzed the reasons for false matches in SIFT and proposed a preprocessing algorithm that increases the matching performance considerably. After a few remaining false matches are eliminated by employing RANSAC
References (40)
- et al., Triangulation, Comput. Vision Image Understanding (1997)
- et al., Dual camera intelligent sensor for high definition 360 degrees surveillance, IEE Proc. Vis. Image Signal Process. (2005)
- et al., Detecting moving objects with an omnidirectional camera based on adaptive background subtraction, Lect. Notes Comput. Sci. (2003)
- et al., Omnidirectional Stereo Systems for Robot Navigation
- et al., Omnidirectional vision based topological navigation, Int. J. Comput. Vision (2007)
- et al., Omnidirectional 3D Modeling on a Mobile Robot using Graph Cuts
- Toward Flexible 3D Modeling using a Catadioptric Camera
- et al., Omni-directional Structure from Motion
- et al., Image Registration with Uncalibrated Cameras in Hybrid Vision Systems
- Mixing Catadioptric and Perspective Cameras
- Epipolar Geometry of Central Projection Systems using Veronese Maps
- Matching of Omnidirectional and Perspective Images using the Hybrid Fundamental Matrix
- Distinctive image features from scale invariant keypoints, Int. J. Comput. Vision
- Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM
- A Generic Structure-from-motion Algorithm for Cross-camera Scenarios
- A Unifying Theory for Central Panoramic Systems
- Robust Wide Baseline Stereo from Maximally Stable Extremal Regions
- Sequential updating of projective and affine structure from motion, Int. J. Comput. Vision
- The Design and Implementation of a Generic Sparse Bundle Adjustment Software Package based on the LM Algorithm
☆ This paper has been recommended for acceptance by Jan-Michael Frahm, Dr.-Ing.