Editor’s Choice Article
Image-consistent patches from unstructured points with J-linkage

https://doi.org/10.1016/j.imavis.2013.07.007Get rights and content

Abstract

Going from unstructured cloud of points to surfaces is a challenging problem. However, as points are produced by a structure-and-motion pipeline, image-consistency is a powerful clue that comes to the rescue. In this paper we present a method for extracting planar patches from an unstructured cloud of points, based on the detection of image-consistent planar patches with J-linkage, a robust algorithm for multiple model fitting. The method integrates several constraints inside J-linkage, optimizes the position of the points with regard to image-consistency and deploys a hierarchical processing scheme that decreases the computational load. With respect to previous work this approach has the advantage of starting from sparse data. Several results show the effectiveness of the proposed approach.

Introduction

Although the current state of the art in three-dimensional (3D) reconstruction from images addresses the recovery of dense and accurate models [1], [2], [3], there is still an unfulfilled need for compact, abstract representations of objects.

What separates unstructured cloud of points from higher-level renditions of a model is a semantic gap, which could be bridged by leveraging additional information, such as meshing, surfaces, ordering, occlusion, parallelism and orthogonality of structures. Increasing levels of abstraction can then be obtained progressing to recognition of scene elements or entire architectural scenes.

Very recent works based on multiview stereo coupled with a structure-and-motion pipeline [1], [2], [3] produce visually compelling results, but they do not address the semantic gap issue at all, since the output is a dense, non-compact representation of the scene.

Three main approaches can be recognized that aim at bridging the semantic gap: interactive, top-down and bottom-up.

Interactive approaches require user intervention to recognize higher level structures, usually basing on the 3D information previously extracted [4], [5], [6], [7].

Top-down or model-based approaches start from the prior knowledge of the set of potential parametric models and try to infer the best fitting one along with its parameters [8], [9], [10], [11]. Potentially, only one image could be employed if the prior knowledge is enough to derive the 3D model [12], [13].

Bottom-up methods start directly from 3D data points trying to aggregate them in progressively higher-level structures, possibly using the reflectance information coming from the images, namely image-consistency. This paper falls in this category: The aim is to leverage models from unorganized point clouds to an intermediate representation made of planar patches as they are a good starting point for a complete automatic reconstruction of surfaces.

Some methods try to optimize an initial triangulation using visibility [14] or image-consistency [15], [16] only. They work only with very simple convex polyhedra objects, and they assume all points visible in at least one view.

Others [17], [18] extract the planes underlying the scene using RANSAC (or MSAC [19]) with spatial or image-consistency information. However, the sequential application of an algorithm designed for single model extraction is not suitable for large, noisy datasets. In fact, the experiments reported in those papers involve extremely simple objects.

In this paper we describe a general robust technique designed to fit multiple model instances, and then specialize it to the task of fitting planar patches to large unorganized point clouds, by integrating in a seamless way image-consistency and visibility constraints. Part of the work reported here has been published in [20], [21].

A related stream of work is those referred to as “planar stereo” [22], [23], [24], where one aims at obtaining a piecewise planar representation of scene starting from dense multiview stereo. In this work, instead, we face a more difficult problem, for we start with the sparse output of structure-and-motion, thereby avoiding the multiview stereo stage.

The rest of the paper is organized as follows. Section 2 introduces the J-linkage algorithm, whereas Section 3 describes how geometrical and image constraints have been added to address the patch fitting problem. The post-processing is illustrated in Section 4. Section 5 presents the hierarchical processing. Experiments are reported in Section 6 and, finally, conclusions are drawn in Section 7.

Section snippets

J-linkage

A widespread problem in Computer Vision is fitting a model to noisy data: The RANSAC algorithm [25] is the common practice for that task. It works reliably when data contains measurements from a single structure corrupted by gross outliers.

When multiple instances of the same structure are present in the data, the problem becomes tough, as the robust estimator must tolerate both gross outliers and pseudo-outliers. The latter are defined in [26] as “outliers to the structure of interest but

Constraint integration

Fitting planes, however, does not solve the problem of exacting planar patches, for a patch is a region of the plane, and the same plane may contain more patches (see Fig. 6).

This section describes how to leverage the J-linkage algorithm to fit planar patches to a cloud of 3D points that are considered as samples of actual surfaces in the observed scene.

The planar patch associated to a set of coplanar points is the convex hull of the projection of the points onto the supporting plane.4

Post-processing

During the agglomerative clustering of J-linkage, it is sufficient that a single triangle does not satisfy a constraint to discard the entire merge, because it is inductively assumed that patches are convex. As a result, triangles that fulfill the constraints are discarded, thereby leaving gaps in the surfaces between neighboring patches. Gaps arise also between non-coplanar patches because in J-linkage a point can belong to only one patch (or plane): as a result non-coplanar triangles cannot

Hierarchical processing

In this section we leverage the hierarchical partitioning of data (camera and points) provided by Samantha [43] to obtain a hierarchical patch fitting procedure which is more computationally efficient, inherently parallel and suitable for out-of-core processing. Samantha is a structure-and-motion pipeline that first runs an agglomerative clustering on the set of images and then processes them following the dendrogram. As a result, a hierarchy of contained cluster of points is created, where the

Results

Experiments were performed on publicly available data and other lab-made datasets. The 3D points were produced by Samantha [43], together with the hierarchical partitioning of the data.

Four datasets, namely “Piazza Brà6”, “Campidoglio”, “Duomo[]” and “Castle-P307” have an architectural scale, whereas “Bas-relief” and “Small sculpture” represent smaller objects captured closer by. The latter, in particular,

Discussion

We presented a method for extracting a triangle mesh from an unstructured cloud of points, based on the detection of image-consistent planar patches with J-linkage. Our approach copes with sparse and real data, optimizes the position of the points with regard to image-consistency (photo-adjustment) and follows a hierarchical processing scheme that cuts the computing time from O (n2 log n) to O (n log n). Moreover, we proposed a new post processing step that improves the quality of the final

Acknowledgments

This research has been supported by the University of Verona and Gexcel s.r.l. under the “Joint Projects” scheme. The laser scanning of “Piazza Brà” has been conducted by Gexcel s.r.l. with the EU JRC — Ispra and the permission of the municipality of Verona. The laser data of the “Duomo di Pisa” comes from the “Cattedrale Digitale9” project, while the photo set is courtesy of the Visual Computing Lab (ISTI-CNR, Pisa).

References (43)

  • A. van den Hengel et al.

    VideoTrace: rapid interactive scene modelling from video

  • K. Schindler et al.

    A model-based method for building reconstruction

  • A. Dick et al.

    Modelling and interpretation of architecture from several images

    Int. J. Comput. Vis.

    (2004)
  • C. Brenner et al.

    Extraction of facades using rjM-CMC and constraint equations

    Photogramm. Comput. Vision

    (2006)
  • Y. Tiana et al.

    Knowledge-based topological reconstruction for building façade surface patches

  • F. Han et al.

    Bayesian reconstruction of 3D shapes and scenes from a single image

  • P. Muller et al.

    Image-based procedural modeling of facades

    ACM Trans. Graph.

    (2007)
  • D. Morris et al.

    Image-consistent surface triangulation

  • A. Nakatsuji et al.

    Optimizing a triangular mesh for shape reconstruction from images

    IEICE Trans. Inf. Syst.

    (2005)
  • O. Cooper et al.

    Automatic augmentation and meshing of sparse 3D scene structure

  • P.H.S. Torr et al.

    MLESAC: a new robust estimator with application to estimating image geometry

    Comput. Vision Image Understanding

    (2000)
  • Cited by (12)

    • Biclustering with dominant sets

      2020, Pattern Recognition
      Citation Excerpt :

      Multiple structure recovery (MSR) relates to the extraction of multiple models from noisy or outlier-contaminated data. MSR is a challenging and significant problem, which is a crucial element for many computer vision applications [32,33,31]. In general, an instance of an MSR problem can be represented by a preference matrix that contains the points under analysis, along one dimension, and the hypotheses/structures for the models, in the other dimension.

    • Clustering via binary embedding

      2018, Pattern Recognition
      Citation Excerpt :

      Typically, MSR is addressed by extracting some tentative structures (obtained by random sampling), and estimating how they fit the given points; the result is a (binary) matrix where an entry (i, j) indicates if the jth model (structure) “explains” the ith point. Instead of the classical consensus analysis, which looks for the models that explain most of the points (i.e., methods based on the Hough transform or RANSAC [33,34]), PA reverses that viewpoint, by analyzing how the points are explained by the different models [30–32]. In particular, similarly to our approach, each point is represented by a signature which indicates its fit to the pool of tentative structures; the MSR problem is then solved by clustering these signatures.

    • Spike and slab biclustering

      2017, Pattern Recognition
      Citation Excerpt :

      Multiple structure recovery (MSR) concerns the extraction of multiple models from noisy or outlier-contaminated data. MSR is an important and challenging problem, which emerges in many computer vision applications [57–59]. In general, an instance of an MSR problem is represented by a preference matrix containing, in one dimension, the points under analysis, and in the other, the hypotheses/structures to which points should belong.

    • Multiple structure recovery via robust preference analysis

      2017, Image and Vision Computing
      Citation Excerpt :

      Geometric multi-model fitting aims at extracting parametric models from unstructured data in order to organize and aggregate visual content in suitable higher-level geometric structures1. This ubiquitous task can be encountered in many Computer Vision applications, for example in 3D reconstruction, where it is employed to estimate multiple rigid moving objects in order to initialize multi-body structure from motion (e.g., [1]), or in the processing of 3D point clouds, where planar patches are fitted to produce intermediate geometric interpretations (e.g., [2]). Several challenges are afoot.

    • Multiple structure recovery with T-linkage

      2017, Journal of Visual Communication and Image Representation
      Citation Excerpt :

      Finding multiple models (or structures) that fit data corrupted by noise and outliers is an ubiquitous problem in the empirical sciences, including Computer Vision, where organizing and aggregating unstructured visual content in higher level geometric structures is a necessary and basic step to derive better descriptions and understanding of a scene. A typical example of this problem can be found in 3D reconstruction, where multi-model fitting is employed either to estimate multiple rigid moving objects and hence to initialize multi-body Structure from Motion [1,2], or to produce intermediate geometric interpretations of reconstructed 3D point cloud by fitting geometric primitives [3–5]. Other scenarios in which the estimation of multiple geometric structure plays a primary role include face clustering, body-pose estimation, augmented reality, image stitching and video motion segmentation, to name just a few.

    • Automatic Threshold RanSaC Algorithms for Pose Estimation Tasks

      2023, Communications in Computer and Information Science
    View all citing articles on Scopus

    Editor's Choice Articles are invited and handled by a select rotating 12 member Editorial Board committee. This paper has been recommended for acceptance by Enrique Dunn.

    1

    R.T. is now with 3Dflow s.r.l., Verona, Italy.

    2

    A.F. is now with the University of Udine — DIEGM.

    View full text