
2008 | Book

A Theory of Shape Identification

Authors: Frédéric Cao, José-Luis Lisani, Jean-Michel Morel, Pablo Musé, Frédéric Sur

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Mathematics


About this book

Recent years have seen dramatic progress in shape recognition algorithms applied to ever-growing image databases. They have been applied to image stitching, stereo vision, image mosaics, solid object recognition and video or web image retrieval. More fundamentally, the ability of humans and animals to detect and recognize shapes is one of the enigmas of perception. The book describes a complete method that starts from a query image and an image database and yields a list of the images in the database containing shapes present in the query image. A false alarm number is associated with each detection. Many experiments will show that familiar simple shapes or images can reliably be identified with false alarm numbers ranging from 10⁻⁵ to less than 10⁻³⁰⁰. Technically speaking, there are two main issues. The first is extracting invariant shape descriptors from digital images: a shape can be seen from various angles and distances and under various lighting conditions, and its descriptors must be stable under all these changes. The second is deciding whether two shape descriptors are alike, a decision whose reliability is measured by the false alarm number.

Table of Contents

Frontmatter

Extracting Image Boundaries

1. Introduction
Digital images became an object of scientific interest in the 1970s. The emerging science dealing with digital images is called Computer Vision. Computer Vision aims to give, wherever possible, a mathematical definition of visual perception, and can therefore be viewed as part of perception theory. Images are, however, much more accessible objects than percepts. Indeed, digital images are sampled real or vector-valued functions defined on a part of the plane, usually a rectangle. They are accessible to all kinds of numerical, geometric, and statistical measurements. In addition, the results of artificial perception algorithms can be confronted with human perception. This confrontation is both advantageous and dangerous: experimental results may easily be misinterpreted during visual inspection, and they often look disappointing when matched with our perception. Obvious objects are frequently very hard to find in digital images by an algorithm.
In a recent book by Desolneux et al. [54], a general mathematical principle, the so-called Helmholtz principle, was extensively explored as a way to define all visual percepts (gestalts) as large deviations from randomness. According to these authors' main thesis, one can compute detection thresholds deciding whether a given geometric structure is present or not in each digital image. Several applications of this principle have been developed by these authors and others for the detection of alignments [50], boundaries [51, 35], clusters [53, 33], smooth curves and good continuations [30, 31], vanishing points [2] and robust point matches through the epipolar constraint [128].
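To make these detection thresholds concrete, here is a minimal sketch of the alignment test of [50, 54], under the standard assumptions of that theory: in an N×N image, a segment sampled at l independent points, k of which have a gradient direction aligned with the segment up to precision p, has a number of false alarms N⁴·B(l, k, p), where B is the binomial tail. The numeric values below are purely illustrative.

# A minimal sketch of the Helmholtz-principle alignment test [50, 54].
from scipy.stats import binom

def alignment_nfa(N, l, k, p=1.0 / 16.0):
    n_tests = float(N) ** 4       # every ordered pixel pair defines a candidate segment
    tail = binom.sf(k - 1, l, p)  # B(l, k, p) = P[Binomial(l, p) >= k]
    return n_tests * tail

# A structure is detected when its NFA falls below 1:
print(alignment_nfa(N=512, l=100, k=40))  # tiny NFA: a meaningful alignment
print(alignment_nfa(N=512, l=100, k=10))  # NFA >> 1: compatible with noise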
2. Extracting Meaningful Curves from Images
The set of level lines of an image (its isophotes), also called its topographic map, is a complete and contrast-invariant representation of the image. Level lines are ordered by inclusion in a tree structure. These two structural properties make level lines excellent candidates for shape representatives. However, some complexity issues have to be handled: the number of level lines in an eight-bit encoded image of size 512×512 is typically 10⁵, and most of them are very small lines due to noise or micro-texture. So the stable level lines must be selected, namely the ones that are likely to correspond to image contours. The starting point is the MSER method, a variant of the Monasse and Guichard Fast Level Set Transform. MSER selects a set of level lines which are local extrema of contrast. This method will be recast in the Helmholtz framework, following the a contrario boundary detection algorithm by Desolneux, Moisan and Morel [51, 54] and two powerful recent variants. The experiments in this chapter will show that selecting the most meaningful level lines reduces their number by a factor of 100 without significant loss of shape content.
A method which selects one out of a hundred level lines in the image without significant information loss is necessarily sophisticated. Sect. 2.1 briefly reviews the level line tree of a digital image. Sect. 2.2 describes a first way to extract well-contrasted level lines, the MSER method. Sect. 2.3 gives an account of the Desolneux et al. maximal meaningful boundaries, and Sect. 2.4 gives a mathematical justification which was actually missing in the original theory. Sect. 2.5 is devoted to a multiscale extension which avoids missing boundaries because of a high noise level, and Sect. 2.6 deals with the so-called "blue sky effect", which can lead to over-detections in textured parts of the image.
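As a rough sketch of the meaningful boundary test of Sect. 2.3: a level line is kept when N_ll · H(μ)^n < 1, where N_ll is the number of level lines, μ the minimal gradient norm along the line, H the empirical tail histogram of the gradient norm over the image, and n the number of roughly independent samples along the line. The rate of one sample every two pixels is an assumption borrowed from the original papers; everything below is a schematic reading of [51], not the book's exact implementation.

import numpy as np

def gradient_tail_histogram(image):
    # empirical H(t) = P(|grad u| >= t), estimated over the whole image
    gy, gx = np.gradient(image.astype(float))
    sorted_norms = np.sort(np.hypot(gx, gy).ravel())
    def H(t):
        idx = np.searchsorted(sorted_norms, t, side="left")
        return 1.0 - idx / len(sorted_norms)
    return H

def boundary_log10_nfa(H, mu, length_pixels, n_level_lines):
    # one roughly independent sample every two pixels (assumption)
    n_samples = max(1, length_pixels // 2)
    return np.log10(n_level_lines) + n_samples * np.log10(max(H(mu), 1e-300))

# the line is a meaningful boundary when boundary_log10_nfa(...) < 0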

Level Line Invariant Descriptors

3. Robust Shape Directions
This chapter deals with affine shape normalization. This method associates a single normalized shape with all shapes deduced from each other by an affine distortion. A crucial ingredient for normalization is the computation of a small affine covariant set of robust straight lines associated with a shape. The set of all tangent lines to a shape has this covariance property, but it is too large. A very successful idea is to use bitangent lines, that is, lines tangent to a shape at two different points. If the shape has a finite number of inflexion points, it also has a finite number of bitangent lines. In Sect. 3.3 a well-established affine invariant curve smoothing algorithm will be briefly described. This smoothing permits a drastic reduction of the number of bitangent lines. Yet, not all shapes can be encoded by using bitangents: convex shapes have none, and simple shapes have only a few. This explains why shape recognition algorithms compute other robust straight lines associated with the shape. Flat parts of curves are informally defined as intervals of the curve along which the direction of the tangent line does not vary too much. For instance, large enough polygons show as many reliable flat parts as they have sides. This chapter will present a simple parameterless definition of flat parts, based again on the Helmholtz principle.
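The sketch below illustrates the idea of flat parts on a polygonal curve. It uses a fixed angular tolerance for readability, whereas the definition given in this chapter replaces that tolerance by a parameterless a contrario test; both parameters here are illustrative assumptions.

import numpy as np

def flat_parts(points, tol_deg=5.0, min_len=10):
    # greedy scan for maximal runs along which the tangent direction of
    # the polygonal curve stays within tol_deg of the run's first direction
    d = np.diff(points, axis=0)
    angles = np.unwrap(np.arctan2(d[:, 1], d[:, 0]))
    tol = np.deg2rad(tol_deg)
    parts, start = [], 0
    for i in range(1, len(angles) + 1):
        if i == len(angles) or abs(angles[i] - angles[start]) > tol:
            if i - start >= min_len:
                parts.append((start, i))
            start = i
    return parts

# a right angle made of two straight strokes yields exactly two flat parts:
pts = np.array([[t, 0.0] for t in range(20)] + [[19.0, t] for t in range(1, 20)])
print(flat_parts(pts, tol_deg=5.0, min_len=8))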
4. Invariant Level Line Encoding
Chapters 2 and 3 described the level line extraction, selection and smoothing procedures, as well as the selection of a few stable, local directions on these curves. These procedures yield shape elements which cannot be directly compared or recognized, since they have undergone an unknown affine transformation. The classical way to address this problem is normalization. We call affine invariant normalization a method to build shape representations that are invariant to any planar affine transformation T(x) = Ax + b such that det(A) > 0. In other words, an affine invariant normalization transforms a planar shape F into a normalized shape such that any deformation of F by a planar affine transformation will give back the same normalized shape. Notice that shapes related by axial symmetry are not considered equivalent in this framework and will not yield the same normalized shape. Similarity invariant normalization is simpler and will be defined in the same way. Section 4.1 first presents the most classical moment method for affine normalization. We will show that this method is not efficient. In Sect. 4.1.3, a much more accurate normalization method is proposed, involving local and robust features of a level line such as bitangent lines and flat parts. This method is applied first to global level lines and then adapted in Sect. 4.2 to pieces of level lines, thus making shape recognition robust to occlusions. These normalization techniques will be used to describe, first, the MSER moment normalization method. The more sophisticated geometric affine normalization methods will be applied throughout the book to the recognition of LLDs (level line descriptors).
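As a reference point, here is a minimal sketch of the second-order moment normalization reviewed in Sect. 4.1: the shape is mapped to barycenter zero and identity covariance, after which two affinely equivalent shapes differ only by a rotation. This is the textbook construction, not the book's improved method of Sect. 4.1.3.

import numpy as np

def moment_normalize(points):
    # map the shape (an (n, 2) array of points) to barycenter 0 and
    # identity second-order moment matrix
    c = points.mean(axis=0)
    centered = points - c
    cov = centered.T @ centered / len(points)
    w, V = np.linalg.eigh(cov)                     # cov = V diag(w) V^T
    inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    return centered @ inv_sqrt.T

Since only first- and second-order moments are used, the affine part is fixed up to a rotation; the chapter shows that this classical method is not efficient, which motivates the geometric normalization of Sect. 4.1.3.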

Recognizing Level Lines

5. A Contrario Decision: the LLD Method
In this chapter we will try to answer the question "does that shape element look like this one?" and measure the confidence level of this answer. This confidence level will be computed as the probability that two observed shapes match just by chance. This requires an a contrario or background model, which will be accurately computed from the shape database itself. The goal is to reach very high recognition confidence levels and therefore very small probabilities in the background model. How can we estimate very small probabilities? This cannot be done by simple counting: the number of required samples grows as the inverse of the probability to be estimated. There is, however, a classical way to circumvent this impossibility. It is enough to use independence. The probability of a very unlikely event can be estimated accurately provided it is a conjunction of independent events whose probabilities are larger, and therefore observable.
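The following toy computation illustrates the independence argument (an illustration of the principle, not the book's actual background model): six independent marginal probabilities, each large enough to be counted on a database of 10⁴ samples, multiply into a joint probability around 10⁻¹⁰, far below anything observable by direct counting.

import numpy as np

rng = np.random.default_rng(0)

n, K = 10_000, 6                        # database size, number of features
database = rng.normal(size=(n, K))      # toy stand-in for shape features
query = np.full(K, 2.3)                 # a hypothetical query descriptor

# each marginal P(|X_i| >= |q_i|) is large enough to estimate by counting...
marginals = (np.abs(database) >= np.abs(query)).mean(axis=0)

# ...and independence turns their product into an estimate of the
# unobservably small probability of the joint event
print(marginals, marginals.prod())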
6. Meaningful Matches: Experiments on LLD and MSER
This chapter tests the shape matching method described in the previous chapter. Section 6.1 deals with the semi-local invariant recognition method. Both similarity and affine methods are considered, and a comparative study based on examples is presented. When images differ by a similarity, affine matching usually returns fewer matches because affine encoding is more demanding. Nevertheless, affine encoding proves more robust as soon as there is a slight perspective effect, and yields much smaller NFAs. We will also test an improved MSER method (namely a global affine matching algorithm for closed level lines). This algorithm works, but we will point out a problem with convex shapes, which turn out to be very hard to distinguish up to an affine transformation. Finally, the context-dependence of recognition will be illustrated by striking experiments on character recognition.
It is now time to check the applicability of the shape comparison scheme described in the previous chapters. All the experiments presented hereafter follow the same procedure: detection of meaningful boundaries (Chap. 2), affine invariant smoothing (Chap. 3, Sect. 3.3), similarity or affine normalization and encoding (Chaps. 3 and 4), and then matching (Chap. 5).

Grouping Shape Elements

7. Hierarchical Clustering and Validity Assessment
The unsupervised classification of points into groups is commonly referred to as clustering or grouping. Clustering aims at discovering structure in a point data set by dividing it into its natural groups. There are three classical problems related to the construction of the right clusters. The first is evaluating the validity of a cluster candidate: is a group of points really a cluster, i.e. a group with a large enough density? The second problem is that meaningful clusters can contain or be contained in other meaningful clusters, so a rule is needed to define locally optimal clusters by inclusion. This rule, however, is not enough to interpret the data correctly. The third problem is defining a correct merging rule between meaningful clusters, and thus being able to decide whether they should stay separate or unite. A unified a contrario method will be proposed for these problems. Next, some complexity issues and heuristics to find sound candidate clusters will be considered. In the following chapters, the clustering theory developed here will find its main application in shape recognition: the grouping of several local matches into a more global shape match.
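A sketch of the validity test for the first problem, under the simplest background model (points drawn independently and uniformly in the domain): a candidate region of relative area a that captures k of the n points is a valid cluster when its NFA, the number of tested regions times the binomial tail, falls below 1. The number of tested regions depends on the candidate-generation heuristic and is an assumption here.

from scipy.stats import binom

def cluster_nfa(n_points, k_inside, rel_area, n_regions):
    # NFA = (number of candidate regions) * P[Binomial(n, a) >= k]
    return n_regions * binom.sf(k_inside - 1, n_points, rel_area)

# 30 of 200 points in 1% of the domain (2 expected by chance): meaningful
print(cluster_nfa(n_points=200, k_inside=30, rel_area=0.01, n_regions=1e4))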
8. Grouping Spatially Coherent Meaningful Matches
This chapter is about forming coherent groups of matching shape elements. This grouping, sometimes called a generalized Hough transform, is crucial for virtually all shape recognition algorithms. It will be used for the main shape identification method treated in this book (the LLD method), for the MSER method in Chap. 9, and for the SIFT method in Chap. 11. Each pair of matching shape elements leads to a unique transformation (similarity or affine map). A natural way to group these shape elements into larger shapes is to find clusters in the transformation space. The theory of the previous chapter is immediately applicable. The main problem addressed here is the correct definition and computation of the background model π, a probability distribution on the set of similarities, or on the set of affine transformations. In order to obtain accurate shape clusters, π must be built from empirical measurements on observable shape matching transformations. As in Chap. 5, the main issue is to compute accurately a density function in high dimension (4 or 6) with relatively few samples. The solution found is analogous: determine marginal variables for which an independence assumption is sound. The density functions of these marginal laws can then be accurately estimated on the data and yield an accurate background model.
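The sketch below shows this estimation strategy in code form: each marginal of the transformation parameters (for a similarity: zoom, angle, and the two translation components) is estimated by a one-dimensional histogram, and the π-probability of a small box in transformation space is taken as the product of the marginal probabilities. Bin counts and box sizes are illustrative assumptions.

import numpy as np

def marginal_histograms(samples, bins=32):
    # samples: (n, d) array of observed transformation parameters
    hists = []
    for j in range(samples.shape[1]):
        counts, edges = np.histogram(samples[:, j], bins=bins)
        hists.append((counts / counts.sum(), edges))
    return hists

def box_probability(hists, center, half_width):
    # product of marginal probabilities over a small box (independence)
    p = 1.0
    for (probs, edges), c, h in zip(hists, center, half_width):
        lo = max(np.searchsorted(edges, c - h) - 1, 0)
        hi = min(np.searchsorted(edges, c + h) - 1, len(probs) - 1)
        p *= probs[lo:hi + 1].sum()
    return p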
9. Experimental Results
In this chapter we illustrate a complete and parameterless recognition process by presenting many experiments on a wide variety of image kinds. The whole algorithm is concisely described in Appendix B.1.

The SIFT Method

10. The SIFT Method
In this chapter and the next, we describe one of the most popular shape descriptors, Lowe's Scale-Invariant Feature Transform (SIFT) method [114]. We will then perform a structural and practical comparison of the SIFT-based matching method with the Level Line Descriptor (LLD) method developed in this book. The LLD method in fact includes the features of the recent, also popular, MSER method. Comparing SIFT and LLD is not an easy task, since they are of a different nature. On the one hand, LLD is based on geometrical shape descriptors, rigorously invariant with respect to similarity or affine transformations; moreover, the method comes with decision rules, both for matching and for grouping. On the other hand, SIFT descriptors are local patches based on key points, and they are only similarity-invariant. The comparison will be based on ad hoc experimental protocols, in the spirit of the SIFT method itself. These protocols check the robustness of local descriptors to all the perturbations listed in Sect. 1.2 (Chap. 1).
We start with a comprehensive description of the SIFT shape encoding (Sect. 10.1). Then we compare the robustness and stability of both shape descriptors (Sect. 10.2). Sect. 10.3 compares the matching performance of both algorithms on pairs of images containing similar shapes or obtained by photographing the same scene from different viewpoints. The main focus of the book is the computation of matching thresholds. In the SIFT method the thresholds are learned from the pair of images under comparison. We shall see that obvious matches can be missed when the query shape appears more than once in the searched image. In the next chapter, a fusion of SIFT with the a contrario techniques, both for matching and for grouping, will be proposed.
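For reference, here is Lowe's matching rule in a few lines (a sketch on plain arrays; the 0.8 ratio is the value reported in [114], and the descriptor arrays are stand-ins for SIFT's 128-dimensional features):

import numpy as np

def ratio_test_matches(query_desc, target_desc, ratio=0.8):
    # keep a match only when the nearest neighbour is sufficiently
    # closer than the second nearest one
    matches = []
    for i, q in enumerate(query_desc):
        d = np.linalg.norm(target_desc - q, axis=1)
        j1, j2 = np.argsort(d)[:2]
        if d[j1] < ratio * d[j2]:
            matches.append((i, j1))
    return matches

This rule makes the failure mode mentioned above visible: when the query shape appears twice in the searched image, the two nearest neighbours are almost equally close, the ratio tends to 1, and both correct matches are rejected.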
11. Securing SIFT with A Contrario Techniques
In the previous chapter, two shortcomings of Lowe's SIFT algorithm were pointed out, namely its low matching efficiency (the ratio between the number of correct matches and the total number of matches) and its inability to match several instances of the same object. The grouping stage of the method is also largely empirical and requires fixing.
In this chapter we shall examine three easy improvements of the SIFT method, all based on the a contrario techniques developed in the present book. They make it possible to treat all the issues raised. The first (Sect. 11.1) is the direct application of the theory of a contrario grouping of transformations developed in Chap. 8. The second (Sect. 11.2) is the use of a background model for SIFT matches which prevents the elimination of multiple matches. Finally, Sect. 11.4 yields an efficient a contrario technique computing an NFA for each SIFT match. In summary, the aim is to demonstrate that the whole SIFT algorithm can be secured and associated with realistic NFAs, as was done in Chaps. 5 and 8 for the LLD method.
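One way to picture such an NFA for a single match (a schematic reading, not the exact construction of Sect. 11.4): estimate the probability that a background descriptor lies as close to the query as the candidate does, and multiply by the number of tested pairs. The background set, e.g. descriptors taken from unrelated images, is an assumption of this sketch.

import numpy as np

def match_nfa(query, candidate, background_desc, n_tests):
    d = np.linalg.norm(candidate - query)
    background_d = np.linalg.norm(background_desc - query, axis=1)
    p = (background_d <= d).mean()          # empirical P(dist <= d)
    return n_tests * max(p, 1.0 / len(background_desc))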
Backmatter
Metadata
Title
A Theory of Shape Identification
Authors
Frédéric Cao
José-Luis Lisani
Jean-Michel Morel
Pablo Musé
Frédéric Sur
Copyright year
2008
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-68481-7
Print ISBN
978-3-540-68480-0
DOI
https://doi.org/10.1007/978-3-540-68481-7
