Editor’s Choice ArticleImage-consistent patches from unstructured points with J-linkage☆
Introduction
Although the current state of the art in three-dimensional (3D) reconstruction from images addresses the recovery of dense and accurate models [1], [2], [3], there is still an unfulfilled need for compact, abstract representations of objects.
What separates unstructured cloud of points from higher-level renditions of a model is a semantic gap, which could be bridged by leveraging additional information, such as meshing, surfaces, ordering, occlusion, parallelism and orthogonality of structures. Increasing levels of abstraction can then be obtained progressing to recognition of scene elements or entire architectural scenes.
Very recent works based on multiview stereo coupled with a structure-and-motion pipeline [1], [2], [3] produce visually compelling results, but they do not address the semantic gap issue at all, since the output is a dense, non-compact representation of the scene.
Three main approaches can be recognized that aim at bridging the semantic gap: interactive, top-down and bottom-up.
Interactive approaches require user intervention to recognize higher level structures, usually basing on the 3D information previously extracted [4], [5], [6], [7].
Top-down or model-based approaches start from the prior knowledge of the set of potential parametric models and try to infer the best fitting one along with its parameters [8], [9], [10], [11]. Potentially, only one image could be employed if the prior knowledge is enough to derive the 3D model [12], [13].
Bottom-up methods start directly from 3D data points trying to aggregate them in progressively higher-level structures, possibly using the reflectance information coming from the images, namely image-consistency. This paper falls in this category: The aim is to leverage models from unorganized point clouds to an intermediate representation made of planar patches as they are a good starting point for a complete automatic reconstruction of surfaces.
Some methods try to optimize an initial triangulation using visibility [14] or image-consistency [15], [16] only. They work only with very simple convex polyhedra objects, and they assume all points visible in at least one view.
Others [17], [18] extract the planes underlying the scene using RANSAC (or MSAC [19]) with spatial or image-consistency information. However, the sequential application of an algorithm designed for single model extraction is not suitable for large, noisy datasets. In fact, the experiments reported in those papers involve extremely simple objects.
In this paper we describe a general robust technique designed to fit multiple model instances, and then specialize it to the task of fitting planar patches to large unorganized point clouds, by integrating in a seamless way image-consistency and visibility constraints. Part of the work reported here has been published in [20], [21].
A related stream of work is those referred to as “planar stereo” [22], [23], [24], where one aims at obtaining a piecewise planar representation of scene starting from dense multiview stereo. In this work, instead, we face a more difficult problem, for we start with the sparse output of structure-and-motion, thereby avoiding the multiview stereo stage.
The rest of the paper is organized as follows. Section 2 introduces the J-linkage algorithm, whereas Section 3 describes how geometrical and image constraints have been added to address the patch fitting problem. The post-processing is illustrated in Section 4. Section 5 presents the hierarchical processing. Experiments are reported in Section 6 and, finally, conclusions are drawn in Section 7.
Section snippets
J-linkage
A widespread problem in Computer Vision is fitting a model to noisy data: The RANSAC algorithm [25] is the common practice for that task. It works reliably when data contains measurements from a single structure corrupted by gross outliers.
When multiple instances of the same structure are present in the data, the problem becomes tough, as the robust estimator must tolerate both gross outliers and pseudo-outliers. The latter are defined in [26] as “outliers to the structure of interest but
Constraint integration
Fitting planes, however, does not solve the problem of exacting planar patches, for a patch is a region of the plane, and the same plane may contain more patches (see Fig. 6).
This section describes how to leverage the J-linkage algorithm to fit planar patches to a cloud of 3D points that are considered as samples of actual surfaces in the observed scene.
The planar patch associated to a set of coplanar points is the convex hull of the projection of the points onto the supporting plane.4
Post-processing
During the agglomerative clustering of J-linkage, it is sufficient that a single triangle does not satisfy a constraint to discard the entire merge, because it is inductively assumed that patches are convex. As a result, triangles that fulfill the constraints are discarded, thereby leaving gaps in the surfaces between neighboring patches. Gaps arise also between non-coplanar patches because in J-linkage a point can belong to only one patch (or plane): as a result non-coplanar triangles cannot
Hierarchical processing
In this section we leverage the hierarchical partitioning of data (camera and points) provided by Samantha [43] to obtain a hierarchical patch fitting procedure which is more computationally efficient, inherently parallel and suitable for out-of-core processing. Samantha is a structure-and-motion pipeline that first runs an agglomerative clustering on the set of images and then processes them following the dendrogram. As a result, a hierarchy of contained cluster of points is created, where the
Results
Experiments were performed on publicly available data and other lab-made datasets. The 3D points were produced by Samantha [43], together with the hierarchical partitioning of the data.
Four datasets, namely “Piazza Brà6”, “Campidoglio”, “Duomo[]” and “Castle-P307” have an architectural scale, whereas “Bas-relief” and “Small sculpture” represent smaller objects captured closer by. The latter, in particular,
Discussion
We presented a method for extracting a triangle mesh from an unstructured cloud of points, based on the detection of image-consistent planar patches with J-linkage. Our approach copes with sparse and real data, optimizes the position of the points with regard to image-consistency (photo-adjustment) and follows a hierarchical processing scheme that cuts the computing time from O (n2 log n) to O (n log n). Moreover, we proposed a new post processing step that improves the quality of the final
Acknowledgments
This research has been supported by the University of Verona and Gexcel s.r.l. under the “Joint Projects” scheme. The laser scanning of “Piazza Brà” has been conducted by Gexcel s.r.l. with the EU JRC — Ispra and the permission of the municipality of Verona. The laser data of the “Duomo di Pisa” comes from the “Cattedrale Digitale9” project, while the photo set is courtesy of the Visual Computing Lab (ISTI-CNR, Pisa).
References (43)
Scene modelling from sparse 3D data
Image Vision Comput.
(2005)A random sampling strategy for piecewise planar scene segmentation
Comput. Vision Image Understanding
(2007)- et al.
A new curve detection method: randomized Hough transform (RHT)
Pattern Recognit. Lett.
(1990) - et al.
Multiresolution hierarchies on unstructured triangle meshes
Comput. Geom.
(1999) - et al.
Multi-view stereo for community photo collections
- et al.
Towards internet-scale multi-view stereo
- et al.
Dynamic and scalable large scale image reconstruction
- et al.
Reconstructing polyhedral models of architectural scenes from photographs
Lect. Notes Comput. Sci
(1996) - et al.
PhotoBuilder—3D models of architectural scenes from uncalibrated images
- et al.
Using geometric constraints through parallelepipeds for calibration and 3D modeling
IEEE Trans. Pattern Anal. Mach. Intell.
(2005)
VideoTrace: rapid interactive scene modelling from video
A model-based method for building reconstruction
Modelling and interpretation of architecture from several images
Int. J. Comput. Vis.
Extraction of facades using rjM-CMC and constraint equations
Photogramm. Comput. Vision
Knowledge-based topological reconstruction for building façade surface patches
Bayesian reconstruction of 3D shapes and scenes from a single image
Image-based procedural modeling of facades
ACM Trans. Graph.
Image-consistent surface triangulation
Optimizing a triangular mesh for shape reconstruction from images
IEICE Trans. Inf. Syst.
Automatic augmentation and meshing of sparse 3D scene structure
MLESAC: a new robust estimator with application to estimating image geometry
Comput. Vision Image Understanding
Cited by (12)
Biclustering with dominant sets
2020, Pattern RecognitionCitation Excerpt :Multiple structure recovery (MSR) relates to the extraction of multiple models from noisy or outlier-contaminated data. MSR is a challenging and significant problem, which is a crucial element for many computer vision applications [32,33,31]. In general, an instance of an MSR problem can be represented by a preference matrix that contains the points under analysis, along one dimension, and the hypotheses/structures for the models, in the other dimension.
Clustering via binary embedding
2018, Pattern RecognitionCitation Excerpt :Typically, MSR is addressed by extracting some tentative structures (obtained by random sampling), and estimating how they fit the given points; the result is a (binary) matrix where an entry (i, j) indicates if the jth model (structure) “explains” the ith point. Instead of the classical consensus analysis, which looks for the models that explain most of the points (i.e., methods based on the Hough transform or RANSAC [33,34]), PA reverses that viewpoint, by analyzing how the points are explained by the different models [30–32]. In particular, similarly to our approach, each point is represented by a signature which indicates its fit to the pool of tentative structures; the MSR problem is then solved by clustering these signatures.
Spike and slab biclustering
2017, Pattern RecognitionCitation Excerpt :Multiple structure recovery (MSR) concerns the extraction of multiple models from noisy or outlier-contaminated data. MSR is an important and challenging problem, which emerges in many computer vision applications [57–59]. In general, an instance of an MSR problem is represented by a preference matrix containing, in one dimension, the points under analysis, and in the other, the hypotheses/structures to which points should belong.
Multiple structure recovery via robust preference analysis
2017, Image and Vision ComputingCitation Excerpt :Geometric multi-model fitting aims at extracting parametric models from unstructured data in order to organize and aggregate visual content in suitable higher-level geometric structures1. This ubiquitous task can be encountered in many Computer Vision applications, for example in 3D reconstruction, where it is employed to estimate multiple rigid moving objects in order to initialize multi-body structure from motion (e.g., [1]), or in the processing of 3D point clouds, where planar patches are fitted to produce intermediate geometric interpretations (e.g., [2]). Several challenges are afoot.
Multiple structure recovery with T-linkage
2017, Journal of Visual Communication and Image RepresentationCitation Excerpt :Finding multiple models (or structures) that fit data corrupted by noise and outliers is an ubiquitous problem in the empirical sciences, including Computer Vision, where organizing and aggregating unstructured visual content in higher level geometric structures is a necessary and basic step to derive better descriptions and understanding of a scene. A typical example of this problem can be found in 3D reconstruction, where multi-model fitting is employed either to estimate multiple rigid moving objects and hence to initialize multi-body Structure from Motion [1,2], or to produce intermediate geometric interpretations of reconstructed 3D point cloud by fitting geometric primitives [3–5]. Other scenarios in which the estimation of multiple geometric structure plays a primary role include face clustering, body-pose estimation, augmented reality, image stitching and video motion segmentation, to name just a few.
Automatic Threshold RanSaC Algorithms for Pose Estimation Tasks
2023, Communications in Computer and Information Science
- ☆
Editor's Choice Articles are invited and handled by a select rotating 12 member Editorial Board committee. This paper has been recommended for acceptance by Enrique Dunn.
- 1
R.T. is now with 3Dflow s.r.l., Verona, Italy.
- 2
A.F. is now with the University of Udine — DIEGM.