Elsevier

Pattern Recognition

Volume 45, Issue 6, June 2012, Pages 2363-2376
Pattern Recognition

Shape matching using a binary search tree structure of weak classifiers

https://doi.org/10.1016/j.patcog.2011.11.025Get rights and content

Abstract

In this paper, a novel algorithm for shape matching based on the Hausdorff distance and a binary search tree data structure is proposed. The shapes are stored in a binary search tree that can be traversed according to a Hausdorff-like similarity measure that allows us to make routing decisions at any given internal node. Each node functions as a classifier that can be trained using supervised learning. These node classifiers are very similar to perceptrons, and can be trained by formulating a probabilistic criterion for the expected performance of the classifier, then maximizing that criterion. Methods for node insertion and deletion are also available, so that a tree can be dynamically updated. While offline training is time consuming, all online training and both online and offline testing operations can be performed in O(logn) time. Experimental results on pedestrian detection indicate the efficiency of the proposed method in shape matching.

Highlights

► Fast template matching. ► Robustness to noise. ► Possibility to update the classifier online.

Introduction

The ability to locate and identify an object in an image or a video segment is a core task in visual information understanding. Out of the different object features that can be used for detection, object shape provides robustness to texture and illumination. Given an input image and a collection of shapes in a database, the objective it to detect the presence and estimate the location of any shape stored in the database. In order to use shape matching for object recognition, one must consider the following factors:

  • Shape representation

  • Shape matching

  • Shape database organization.

In this paper, our main contribution concerns the last two factors, as we propose a novel way to efficiently manage large shape databases and train shape classifiers that can be used in binary search trees for shapes.

A 2D shape can be represented in various forms. For an overview, the reader may refer to [35]. One possible representation is contour based, in which a shape is described as closed, continuous, parametric curve [12]. The statistical shape analysis regarding this representation is related mostly to clustering and hierarchical organization [27]. The dissimilarity between two shapes is proportional to the amount of effort required by the geodesic of one shape in order to morph into the geodesic of the other one.

Another way to represent a shape is by describing the shape using elements from a set of features. The work presented in [11] syntactically describes a shape as an ordered list of line segments that are characterized by their length and orientation. Matching and clustering are achieved through techniques inspired by text string matching algorithms. In similar fashion, [28] uses Shock Trees, a variation of the Shock Graph [26], to represent shapes and clusters shapes by merging individual shapes' Shock Trees into the clusters' Shock Trees in a way that minimizes the minimum description length of the entire dataset.

The simplest and most abstract way to represent a 2D contour is as a list of points. There is no requirement for closure, nor any information regarding the interconnectivity of said points. Such a list of points can be easily obtained, for example, by applying an edge detector on the image.

Object recognition through shape matching involves comparing an input shape with various template shapes and reporting the template with the highest similarity (or lowest dissimilarity) to the input shape. There are various similarity measures to be considered for this task [31]. A very useful similarity measure that can be easily computed is the Hausdorff distance [22], [23] and its many variants. The Hausdorff distance was originally introduced to measure similarity between two sets of points, so it can also be used to measure similarity between shapes, since we can consider them to be sets of points. In fact, Hausdorff distance based metrics have been widely used for locating objects in images [22], [4]. The directed Hausdorff distance formula from point set X to point set Y is given by:DDHD(X,Y)=maxxXminyY(d(x,y))where d(x,y) is the scalar L2 distance between point x of point set X to point y of point set Y. The overall Hausdorff distance is defined as:DHD=max(DDHD(X,Y),DDHD(Y,X))This measure, however, is very sensitive to outliers. In practice, several other variants of the Hausdorff distance are preferred over the original definition. Such variants include the k-th quantile Hausdorff distance, the modified Hausdorff distance and the weighted Hausdorff distance, used in [17] to match letters in scanned text. The interested reader may refer to [6] for the description of each variant and their performance comparison between the various Hausdorff distance variants. An image window that is densely populated by edge pixels will yield small Hausdorff distances (thus high similarity), when any template is matched against it. This leads to many false positive decisions. In order to overcome this problem, taking the orientation of the edges into account has been proposed in [20].

Probabilistic formulations of Hausdorff distance based metrics revolve mostly around using previous measurements to affect the number of possible future searches. When searching for a certain template in an image, the matching results of that template in a specific location in the image can be used to infer whether it is necessary to search in neighbouring spots or not through maximum likelihood estimation [19]. A probabilistic model is used in [10] to determine whether a matching result given for a certain template can warrant the further matching of similar templates. In this paper, we attempt probabilistic formulations on pixel level, studying how to choose the right template in the presence of displaced pixels in the input shape, which is a more challenging task.

As the template database becomes larger, exhaustive template search becomes impractical and the need of a better organization the template database aiming to minimize the cost of the search operations arises. Many applications, such as visual surveillance or pedestrian detection systems have real-time constraints, thus the template search speed is vital to the viability of such a system.

While binary decision trees have been employed for shape related tasks, these attempts mostly considered shape classification based on the presence or absence of features [25], or feature extraction for classification purposes [2]. A tree of weak classifiers has also been proposed in [8] for a sports pictures classification system. However, there is a key difference between the labelled classification task the above works handle and shape template matching. In the case of template matching, there is no correct answer concerning the decision of the classifier at each node. The objective is to find which template bears the closest resemblance to a target input shape. It is possible that every template of the training set is a sample from the same class and that every template belongs to its own subclass. Classical binary decision tree construction works by selecting a feature that minimizes the class entropy in the resulting split subsets. However, measuring entropy in this case is pointless, as the class entropy of the entire training set is 0, while the subclass entropy is impossible to reduce by splitting the training set. It is for this reason that such training methods are unsuitable for template matching.

An early attempt to take advantage of tree structures to match shapes can be found in [1], where a license plate reading system through template matching is presented. It uses 37 templates and employs a Coarse-to-Fine Strategy to first find rectangular areas in an input image then uses a data structure similar to a B-tree to gradually refine the matching. This tree's nodes represent clusters of templates, with the root containing every template and each child node representing a sub-cluster of its parent's cluster, until the leaf nodes, which only contain one template.

The tree construction issue is also addressed in [10], by organizing the templates in a data structure again similar to a B-tree. The templates are initially grouped in subsets in the tree leaves, based on their mutual similarity. Along the path from a subset of templates to the root, a shape exemplar is selected to represent the subset of the previous level tree nodes to the new level. These representatives (called exemplars) are again grouped by similarity and the process repeats, thus creating a hierarchical structure. A probabilistic model is used to determine whether to search further down a tree path or not, based on the matching result of the current node exemplar, in order to reduce the total number of measurements in a set of multiple templates. The weak point of this approach is that it provides no theoretical sublinear upper bound in search complexity, as it allows multiple paths to be followed at each tree node, and it also does not allow new templates to be added in an already constructed tree.

A self-balancing binary search tree (BST) is a well known data structure for the quick search, insertion and deletion of data. It was originally proposed for the efficient storage of real and integer numbers and it has been successfully adapted for storing other information as well [13]. The many useful properties of binary search trees are a clear motivation for attempting an adaptation of this data structure for the handling of shapes. The challenge lies in the fact that Hausdorff distance relationships lack several useful properties. For example, they are not transitive, they are not symmetric and they do not have to satisfy the triangular inequality [31]. Thus, there is no way to order shape similarity for a standard binary search.

Such an attempt at adapting the binary search tree data structure to be used for shapes has been proposed in [29], where the traditional BST operations for the search, insertion and deletion of nodes are presented, along with a weak classifier inside the internal nodes that facilitates these operations. By using the described data structure it is possible to search for (and even insert or delete) shape templates in logarithmic worst case computational complexity. Furthermore, whenever the number of templates doubles, a balanced BST needs only one more tree level to handle the extra data, which means that the scalability of the structure is also very good.

In this paper, we propose a novel offline node training algorithm and a novel offline tree construction method for a binary search tree data structure for shapes. Our work is also closely related to [10]. Our novelty consists of the definition of a novel weak classifier, a method to train it and the organization of said weak classifiers into a tree structure for improved shape matching results.

In order to obtain the offline node training formulae, we perform probabilistic formulations at pixel-level, thus examining the low-level elements of shape matching. So far, various probabilistic formulations of Hausdorff distance have been discussed. However, they are only based on distance measurements, a high-level element of the Hausdorff based metrics. The usefulness of this approach is more limited, due to the poor properties of Hausdorff distance relationships.

This paper is organized as follows: In Section 2 the proposed similarity measure is briefly discussed and the functionality of the proposed trees' nodes is defined. The offline construction of a binary search tree for shape templates is described in Section 3. The implementation and computational complexity details are given in Section 4. The various experiments on both artificially generated data and real data concerning pedestrian detection the were conducted to evaluate the performance of the proposed structure are detailed in Section 5. Finally, the paper is concluded in Section 6.

Section snippets

Tree structure

The proposed data structure is a binary shape tree for shapes and will henceforth be referred to as a shape tree. For our purposes, we view a shape as a set of points with 2-dimensional integer (pixel) coordinates of the form [i,j]T. A shape can either be represented as a set of points X, or as a binary NX×MX matrix X such that: X(i,j)=1ifx=[i,j]TX0otherwiseAny shape stored in a shape tree is referred to as a template Ti, while Ti will be used to denote the template in matrix form. All the

Offline node classifier training and tree construction algorithm

In this section, we describe the offline construction of a shape tree and the training of the classifiers (whose parameters are WL, WR, SL and SR) in its internal nodes. We begin by deriving a training method that maximizes the a posteriori probability that every template will be correctly classified at a given node and a tree construction algorithm that determines how the training set of each node is partitioned.

Implementation and complexity issues

In this section we provide further details on the implementation of a real-time system that performs shape matching in images using the proposed data structure. We also provide estimates for the computational complexity of the basic tree operations.

Experimental results

In this section the performance evaluation of the proposed data structure on a classification task is presented. We first tested the learning and generalization capabilities of both offline and online constructed trees. Finally, both types of shape trees have been tested using real data. We have chosen to evaluate the proposed shape matching system in the task of pedestrian detection [15], though the system is suitable for other applications as well, such as hand gesture recognition,

Conclusion

In this paper, a novel algorithm for shape matching based on the Hausdorff distance and a binary search tree data structure has been proposed. By using a variation of a well known similarity measure in a manner that allows us to make a decision on which subtree to direct a search at each node of the proposed tree, we can store and search for templates in logarithmic time. Through the use of probabilistic formulations, we came up with a way to train a weak classifier inside each node so that it

Acknowledgement

The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under Grant agreement No. 211471 (i3DPost).

Nikolaos Tsapanos received his B.Sc. in Computer Science in 2003 and his M.Sc. in Computer Science specialized in Technology and Applications in 2005 from the University of Ioannina and he has been working on his Ph.D. degree in the University of Thessaloniki since 2007. His research interests include machine learning and pattern recognition.

References (35)

  • S. Shlien

    Multiple binary decision tree classifiers

    Pattern Recognition

    (1990)
  • N. Tsapanos et al.

    Online shape learning using binary search trees

    Image and Vision Computing

    (2010)
  • D. Zhang et al.

    Review of shape representation and description techniques

    Pattern Recognition

    (2004)
  • Y. Amit et al.

    A coarse-to-fine strategy for multiclass shape detection

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2004)
  • Y. Amit et al.

    Joint induction of shape features and tree classifiers

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (1997)
  • G. Borgefors, Distance transformations in digital images, Computer Vision, Graphics, and Image Processing 34 (3) (1986)...
  • D. Huttenlocher, G.K. Rucklidge, W., Comparing images using the Hausdorff distance, IEEE Transactions on Pattern...
  • N. Dalal et al.

    Histograms of oriented gradients for human detection

  • M.-P. Dubuisson, A. Jain, A modified Hausdorff distance for object matching, in: 1994 – Conference A: Computer Vision &...
  • M. Enzweiler et al.

    Monocular pedestrian detection: survey and experiments

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2009)
  • B. Foo et al.

    Resource constrained stream mining with classifier tree topologies

    IEEE Signal Processing Letters

    (2008)
  • D. Gavrila, V. Philomin, Real-time object detection for “smart” vehicles, in: IEEE International Conference on Computer...
  • D.M. Gavrila, A Bayesian, exemplar-based approach to hierarchical shape matching, IEEE Transactions on Pattern Analysis...
  • Y. Gdalyahu et al.

    Flexible syntactic matching of curves and its application to automatic hierarchical classification of silhouettes

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (1999)
  • E. Klassen et al.

    Analysis of planar shapes using geodesic paths on shape spaces

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2004)
  • D.E. Knuth, Art of Computer Programming, Sorting and Searching, second ed., vol. 3, April...
  • B. Leibe, E. Seemann, B. Schiele, Pedestrian detection in crowded scenes, in: IEEE Computer Society Conference on...
  • Cited by (0)

    Nikolaos Tsapanos received his B.Sc. in Computer Science in 2003 and his M.Sc. in Computer Science specialized in Technology and Applications in 2005 from the University of Ioannina and he has been working on his Ph.D. degree in the University of Thessaloniki since 2007. His research interests include machine learning and pattern recognition.

    Anastasios Tefas received the B.Sc. in informatics in 1997 and the Ph.D. degree in informatics in 2002, both from the Aristotle University of Thessaloniki, Greece. Since 2008, he has been a Lecturer at the Department of Informatics, Aristotle University of Thessaloniki. From 2006 to 2008, he was an Assistant Professor at the Department of Information Management, Technological Institute of Kavala. From 2003 to 2004, he was a temporary lecturer in the Department of Informatics, University of Thessaloniki. From 1997 to 2002, he was a researcher and teaching assistant in the Department of Informatics, University of Thessaloniki. Dr. Tefas participated in 10 research projects financed by national and European funds. He has co-authored over 70 journal and conference papers and contributed two chapters in edited books. Over 400 citations by third researchers have been recorded to his publications. His current research interests include computational intelligence, pattern recognition, statistical machine learning, digital signal and image processing and computer vision.

    Nikolaos Nikolaidis received the Diploma of Electrical Engineering in 1991 and the Ph.D. degree in Electrical Engineering in 1997, both from the Aristotle University of Thessaloniki, Greece. From 1998 to 2002 he was postdoctoral researcher and teaching assistant at the Department of Informatics, Aristotle University of Thessaloniki. He is currently an Assistant Professor in the same Department. Dr. Nikolaidis participated in 19 European and national R&D projects and serves as a reviewer to a number of international scientific journals & conferences in his area of expertise. He is the co-author of the book “3-D Image Processing Algorithms” (Wiley, 2000). He has co-authored 15 book chapters, 34 journal papers and 107 conference papers and co-edited one book and two special issues. The number of citations to his work by third authors exceeds 1300. His research interests include computer graphics, image and video processing and analysis, copyright protection of multimedia, computer vision and 3-D image processing.

    Ioannis Pitas received the Diploma of Electrical Engineering in 1980 and the Ph.D. degree in Electrical Engineering in 1985, both from the University of Thessaloniki, Greece. Since 1994 he has been a Professor at the Department of Informatics, University of Thessaloniki, Greece. From 1980 to 1993 he served as Scientific Assistant, Lecturer, Assistant Professor, and Associate Professor in the Department of Electrical and Computer Engineering at the same University. He served as Visiting Professor and ASI fellow at the University of British Columbia, Canada, as Visiting Professor at Ecole Polytechnique Federale de Lausanne, at Tampere University of Technology, Finland, as Visiting Assistant Professor at the University of Toronto and as a Visiting Research Associate at the University of Toronto, Canada and at the University of Erlangen-Nuernberg, Germany. His current interests are in the areas of digital image processing,multimedia signal processing, multidimensional signal processing and computer vision. He has published over 510 papers, contributed in 20 books and authored, co-authored, edited, co-edited seven books in his area of interest.

    View full text