Image-consistent patches from unstructured points with J-linkage

doi:10.1016/j.imavis.2013.07.007

Image and Vision Computing

Volume 31, Issue 10, October 2013, Pages 756-770

https://doi.org/10.1016/j.imavis.2013.07.007 Get rights and content

Abstract

Going from unstructured cloud of points to surfaces is a challenging problem. However, as points are produced by a structure-and-motion pipeline, image-consistency is a powerful clue that comes to the rescue. In this paper we present a method for extracting planar patches from an unstructured cloud of points, based on the detection of image-consistent planar patches with J-linkage, a robust algorithm for multiple model fitting. The method integrates several constraints inside J-linkage, optimizes the position of the points with regard to image-consistency and deploys a hierarchical processing scheme that decreases the computational load. With respect to previous work this approach has the advantage of starting from sparse data. Several results show the effectiveness of the proposed approach.

Introduction

Although the current state of the art in three-dimensional (3D) reconstruction from images addresses the recovery of dense and accurate models [1], [2], [3], there is still an unfulfilled need for compact, abstract representations of objects.

What separates unstructured cloud of points from higher-level renditions of a model is a semantic gap, which could be bridged by leveraging additional information, such as meshing, surfaces, ordering, occlusion, parallelism and orthogonality of structures. Increasing levels of abstraction can then be obtained progressing to recognition of scene elements or entire architectural scenes.

Very recent works based on multiview stereo coupled with a structure-and-motion pipeline [1], [2], [3] produce visually compelling results, but they do not address the semantic gap issue at all, since the output is a dense, non-compact representation of the scene.

Three main approaches can be recognized that aim at bridging the semantic gap: interactive, top-down and bottom-up.

Interactive approaches require user intervention to recognize higher level structures, usually basing on the 3D information previously extracted [4], [5], [6], [7].

Top-down or model-based approaches start from the prior knowledge of the set of potential parametric models and try to infer the best fitting one along with its parameters [8], [9], [10], [11]. Potentially, only one image could be employed if the prior knowledge is enough to derive the 3D model [12], [13].

Bottom-up methods start directly from 3D data points trying to aggregate them in progressively higher-level structures, possibly using the reflectance information coming from the images, namely image-consistency. This paper falls in this category: The aim is to leverage models from unorganized point clouds to an intermediate representation made of planar patches as they are a good starting point for a complete automatic reconstruction of surfaces.

Some methods try to optimize an initial triangulation using visibility [14] or image-consistency [15], [16] only. They work only with very simple convex polyhedra objects, and they assume all points visible in at least one view.

Others [17], [18] extract the planes underlying the scene using RANSAC (or MSAC [19]) with spatial or image-consistency information. However, the sequential application of an algorithm designed for single model extraction is not suitable for large, noisy datasets. In fact, the experiments reported in those papers involve extremely simple objects.

In this paper we describe a general robust technique designed to fit multiple model instances, and then specialize it to the task of fitting planar patches to large unorganized point clouds, by integrating in a seamless way image-consistency and visibility constraints. Part of the work reported here has been published in [20], [21].

A related stream of work is those referred to as “planar stereo” [22], [23], [24], where one aims at obtaining a piecewise planar representation of scene starting from dense multiview stereo. In this work, instead, we face a more difficult problem, for we start with the sparse output of structure-and-motion, thereby avoiding the multiview stereo stage.

The rest of the paper is organized as follows. Section 2 introduces the J-linkage algorithm, whereas Section 3 describes how geometrical and image constraints have been added to address the patch fitting problem. The post-processing is illustrated in Section 4. Section 5 presents the hierarchical processing. Experiments are reported in Section 6 and, finally, conclusions are drawn in Section 7.

Section snippets

J-linkage

A widespread problem in Computer Vision is fitting a model to noisy data: The RANSAC algorithm [25] is the common practice for that task. It works reliably when data contains measurements from a single structure corrupted by gross outliers.

When multiple instances of the same structure are present in the data, the problem becomes tough, as the robust estimator must tolerate both gross outliers and pseudo-outliers. The latter are defined in [26] as “outliers to the structure of interest but

Constraint integration

Fitting planes, however, does not solve the problem of exacting planar patches, for a patch is a region of the plane, and the same plane may contain more patches (see Fig. 6).

This section describes how to leverage the J-linkage algorithm to fit planar patches to a cloud of 3D points that are considered as samples of actual surfaces in the observed scene.

The planar patch associated to a set of coplanar points is the convex hull of the projection of the points onto the supporting plane.⁴

Post-processing

During the agglomerative clustering of J-linkage, it is sufficient that a single triangle does not satisfy a constraint to discard the entire merge, because it is inductively assumed that patches are convex. As a result, triangles that fulfill the constraints are discarded, thereby leaving gaps in the surfaces between neighboring patches. Gaps arise also between non-coplanar patches because in J-linkage a point can belong to only one patch (or plane): as a result non-coplanar triangles cannot

Hierarchical processing

In this section we leverage the hierarchical partitioning of data (camera and points) provided by Samantha [43] to obtain a hierarchical patch fitting procedure which is more computationally efficient, inherently parallel and suitable for out-of-core processing. Samantha is a structure-and-motion pipeline that first runs an agglomerative clustering on the set of images and then processes them following the dendrogram. As a result, a hierarchy of contained cluster of points is created, where the

Results

Experiments were performed on publicly available data and other lab-made datasets. The 3D points were produced by Samantha [43], together with the hierarchical partitioning of the data.

Four datasets, namely “Piazza Brà⁶”, “Campidoglio”, “Duomo[]” and “Castle-P30⁷” have an architectural scale, whereas “Bas-relief” and “Small sculpture” represent smaller objects captured closer by. The latter, in particular,

Discussion

We presented a method for extracting a triangle mesh from an unstructured cloud of points, based on the detection of image-consistent planar patches with J-linkage. Our approach copes with sparse and real data, optimizes the position of the points with regard to image-consistency (photo-adjustment) and follows a hierarchical processing scheme that cuts the computing time from O (n² log n) to O (n log n). Moreover, we proposed a new post processing step that improves the quality of the final

Acknowledgments

This research has been supported by the University of Verona and Gexcel s.r.l. under the “Joint Projects” scheme. The laser scanning of “Piazza Brà” has been conducted by Gexcel s.r.l. with the EU JRC — Ispra and the permission of the municipality of Verona. The laser data of the “Duomo di Pisa” comes from the “Cattedrale Digitale⁹” project, while the photo set is courtesy of the Visual Computing Lab (ISTI-CNR, Pisa).

References (43)

A. Hilton
Scene modelling from sparse 3D data
Image Vision Comput.
(2005)
A. Bartoli
A random sampling strategy for piecewise planar scene segmentation
Comput. Vision Image Understanding
(2007)
L. Xu et al.
A new curve detection method: randomized Hough transform (RHT)
Pattern Recognit. Lett.
(1990)
L. Kobbelt et al.
Multiresolution hierarchies on unstructured triangle meshes
Comput. Geom.
(1999)
M. Goesele et al.
Multi-view stereo for community photo collections
Y. Furukawa et al.
Towards internet-scale multi-view stereo
P.F.C. Strecha et al.
Dynamic and scalable large scale image reconstruction
C. Taylor et al.
Reconstructing polyhedral models of architectural scenes from photographs
Lect. Notes Comput. Sci
(1996)
R. Cipolla et al.
PhotoBuilder—3D models of architectural scenes from uncalibrated images
M. Wilczkowiak et al.
Using geometric constraints through parallelepipeds for calibration and 3D modeling
IEEE Trans. Pattern Anal. Mach. Intell.
(2005)

A. van den Hengel et al.

VideoTrace: rapid interactive scene modelling from video

K. Schindler et al.

A model-based method for building reconstruction

A. Dick et al.

Modelling and interpretation of architecture from several images

Int. J. Comput. Vis.

(2004)

C. Brenner et al.

Extraction of facades using rjM-CMC and constraint equations

Photogramm. Comput. Vision

(2006)

Y. Tiana et al.

Knowledge-based topological reconstruction for building façade surface patches

F. Han et al.

Bayesian reconstruction of 3D shapes and scenes from a single image

P. Muller et al.

Image-based procedural modeling of facades

ACM Trans. Graph.

(2007)

D. Morris et al.

Image-consistent surface triangulation

A. Nakatsuji et al.

Optimizing a triangular mesh for shape reconstruction from images

IEICE Trans. Inf. Syst.

(2005)

O. Cooper et al.

Automatic augmentation and meshing of sparse 3D scene structure

P.H.S. Torr et al.

MLESAC: a new robust estimator with application to estimating image geometry

Comput. Vision Image Understanding

(2000)

Cited by (12)

Biclustering with dominant sets
2020, Pattern Recognition
Citation Excerpt :
Multiple structure recovery (MSR) relates to the extraction of multiple models from noisy or outlier-contaminated data. MSR is a challenging and significant problem, which is a crucial element for many computer vision applications [32,33,31]. In general, an instance of an MSR problem can be represented by a preference matrix that contains the points under analysis, along one dimension, and the hypotheses/structures for the models, in the other dimension.
Biclustering can be defined as the simultaneous clustering of rows and columns in a data matrix and it has been recently applied to many scientific scenarios such as bioinformatics, text analysis and computer vision to name a few. In this paper we propose a novel biclustering approach, that is based on the concept of dominant-set clustering and extends such algorithm to the biclustering problem. In more detail, we propose a novel encoding of the biclustering problem as a graph so to use the dominant set concept to analyse rows and columns simultaneously. Moreover, we extend the Dominant Set Biclustering approach to facilitate the insertion of prior knowledge that may be available on the domain. We evaluated the proposed approach on a synthetic benchmark and on two computer vision tasks: multiple structure recovery and region-based correspondence. The empirical evaluation shows that the method achieves promising results that are comparable to the state-of-the-art and that outperforms competitors in various cases.
Clustering via binary embedding
2018, Pattern Recognition
Citation Excerpt :
Typically, MSR is addressed by extracting some tentative structures (obtained by random sampling), and estimating how they fit the given points; the result is a (binary) matrix where an entry (i, j) indicates if the jth model (structure) “explains” the ith point. Instead of the classical consensus analysis, which looks for the models that explain most of the points (i.e., methods based on the Hough transform or RANSAC [33,34]), PA reverses that viewpoint, by analyzing how the points are explained by the different models [30–32]. In particular, similarly to our approach, each point is represented by a signature which indicates its fit to the pool of tentative structures; the MSR problem is then solved by clustering these signatures.
In this paper, we present a novel clustering scheme based on binary embeddings, which provides compact and informative binary representations of high-dimensional objects. The binary representations are obtained with a collection of one-class classifiers learned from (pseudo) randomly selected points in the dataset. To cluster the binary representations, we consider two approaches: a mixture of Bernoulli distributions and a recent biclustering approach called CRAFT. The empirical evaluation in comparison with both classic and recent clustering methods, based on 12 different datasets, provides encouraging results. The main feature of the proposed method is that it is agnostic to the shape of the clusters.
Spike and slab biclustering
2017, Pattern Recognition
Citation Excerpt :
Multiple structure recovery (MSR) concerns the extraction of multiple models from noisy or outlier-contaminated data. MSR is an important and challenging problem, which emerges in many computer vision applications [57–59]. In general, an instance of an MSR problem is represented by a preference matrix containing, in one dimension, the points under analysis, and in the other, the hypotheses/structures to which points should belong.
Biclustering refers to the problem of simultaneously clustering the rows and columns of a given data matrix, with the goal of obtaining submatrices where the selected rows present a coherent behaviour in the selected columns, and vice-versa. To face this intrinsically difficult problem, we propose a novel generative model, where biclustering is approached from a sparse low-rank matrix factorization perspective. The main idea is to design a probabilistic model describing the factorization of a given data matrix in two other matrices, from which information about rows and columns belonging to the sought for biclusters can be obtained. One crucial ingredient in the proposed model is the use of a spike and slab sparsity-inducing prior, thus we term the approach spike and slab biclustering (SSBi). To estimate the parameters of the SSBi model, we propose an expectation-maximization (EM) algorithm, termed SSBiEM, which solves a low-rank factorization problem at each iteration, using a recently proposed augmented Lagrangian algorithm. Experiments with both synthetic and real data show that the SSBi approach compares favorably with the state-of-the-art.
Multiple structure recovery via robust preference analysis
2017, Image and Vision Computing
Citation Excerpt :
Geometric multi-model fitting aims at extracting parametric models from unstructured data in order to organize and aggregate visual content in suitable higher-level geometric structures1. This ubiquitous task can be encountered in many Computer Vision applications, for example in 3D reconstruction, where it is employed to estimate multiple rigid moving objects in order to initialize multi-body structure from motion (e.g., [1]), or in the processing of 3D point clouds, where planar patches are fitted to produce intermediate geometric interpretations (e.g., [2]). Several challenges are afoot.
This paper address the extraction of multiple models from outlier-contaminated data by exploiting preference analysis and low rank approximation. First points are represented in the preference space, then Robust PCA (Principal Component Analysis) and Symmetric NMF (Non negative Matrix Factorization) are used to break the multi-model fitting problem into many single-model problems, which in turn are tackled with an approach inspired to MSAC (M-estimator SAmple Consensus) coupled with a model-specific scale estimate. Experimental validation on public, real data-sets demonstrates that our method compares favorably with the state of the art.
Multiple structure recovery with T-linkage
2017, Journal of Visual Communication and Image Representation
Citation Excerpt :
Finding multiple models (or structures) that fit data corrupted by noise and outliers is an ubiquitous problem in the empirical sciences, including Computer Vision, where organizing and aggregating unstructured visual content in higher level geometric structures is a necessary and basic step to derive better descriptions and understanding of a scene. A typical example of this problem can be found in 3D reconstruction, where multi-model fitting is employed either to estimate multiple rigid moving objects and hence to initialize multi-body Structure from Motion [1,2], or to produce intermediate geometric interpretations of reconstructed 3D point cloud by fitting geometric primitives [3–5]. Other scenarios in which the estimation of multiple geometric structure plays a primary role include face clustering, body-pose estimation, augmented reality, image stitching and video motion segmentation, to name just a few.
This work addresses the problem of robust fitting of geometric structures to noisy data corrupted by outliers. An extension of J-linkage (called T-linkage) is presented and elaborated. T-linkage improves the preference analysis implemented by J-linkage in term of performances and robustness, considering both the representation and the segmentation steps. A strategy to reject outliers and to estimate the inlier threshold is proposed, resulting in a versatile tool, suitable for multi-model fitting “in the wild”. Experiments demonstrate that our methods perform better than J-linkage on simulated data, and compare favorably with state-of-the-art methods on public domain real datasets.
Automatic Threshold RanSaC Algorithms for Pose Estimation Tasks
2023, Communications in Computer and Information Science

View all citing articles on Scopus

^☆: Editor's Choice Articles are invited and handled by a select rotating 12 member Editorial Board committee. This paper has been recommended for acceptance by Enrique Dunn.

¹: R.T. is now with 3Dflow s.r.l., Verona, Italy.

²: A.F. is now with the University of Udine — DIEGM.

View full text

Editor’s Choice ArticleImage-consistent patches from unstructured points with J-linkage☆

Abstract

Introduction

Section snippets

J-linkage

Constraint integration

Post-processing

Hierarchical processing

Results

Discussion

Acknowledgments

Image Vision Comput.

Comput. Vision Image Understanding

Pattern Recognit. Lett.

Comput. Geom.

Multi-view stereo for community photo collections

Towards internet-scale multi-view stereo

Dynamic and scalable large scale image reconstruction

Reconstructing polyhedral models of architectural scenes from photographs

Lect. Notes Comput. Sci

PhotoBuilder—3D models of architectural scenes from uncalibrated images

Using geometric constraints through parallelepipeds for calibration and 3D modeling

IEEE Trans. Pattern Anal. Mach. Intell.

VideoTrace: rapid interactive scene modelling from video

A model-based method for building reconstruction

Modelling and interpretation of architecture from several images

Int. J. Comput. Vis.

Extraction of facades using rjM-CMC and constraint equations

Photogramm. Comput. Vision

Knowledge-based topological reconstruction for building façade surface patches

Bayesian reconstruction of 3D shapes and scenes from a single image

Image-based procedural modeling of facades

ACM Trans. Graph.

Image-consistent surface triangulation

Optimizing a triangular mesh for shape reconstruction from images

IEICE Trans. Inf. Syst.

Automatic augmentation and meshing of sparse 3D scene structure

MLESAC: a new robust estimator with application to estimating image geometry

Comput. Vision Image Understanding

Editor’s Choice Article
Image-consistent patches from unstructured points with J-linkage☆