Elsevier

Pattern Recognition

Volume 47, Issue 6, June 2014, Pages 2116-2125
Pattern Recognition

Bag of contour fragments for robust shape classification

https://doi.org/10.1016/j.patcog.2013.12.008Get rights and content

Highlights

  • A new shape representation is proposed by encoding contour fragments in shape.

  • The proposed shape representation is compact yet informative.

  • The proposed shape representation is robust to shape deformation and conclusion.

  • We obtain the state-of-the-art shape classification performance on several bench-mark datasets.

Abstract

Shape representation is a fundamental problem in computer vision. Current approaches to shape representation mainly focus on designing low-level shape descriptors which are robust to rotation, scaling and deformation of shapes. In this paper, we focus on mid-level modeling of shape representation. We develop a new shape representation called Bag of Contour Fragments (BCF) inspired by classical Bag of Words (BoW) model. In BCF, a shape is decomposed into contour fragments each of which is then individually described using a shape descriptor, e.g., the Shape Context descriptor, and encoded into a shape code. Finally, a compact shape representation is built by pooling shape codes in the shape. Shape classification with BCF only requires an efficient linear SVM classifier. In our experiments, we fully study the characteristics of BCF, show that BCF achieves the state-of-the-art performance on several well-known shape benchmarks, and can be applied to real image classification problem.

Introduction

Shape is an intrinsic feature for image understanding, which is stable to illumination and variations in object color and texture. Because of these advantages, shape is widely considered for object recognition. In particular, with the recent advance in contour detection proposed by Arbelaez et al. in [1], shape based object recognition in natural image is becoming more practical and attracts more attention in computer vision community. Main challenges in shape based object recognition include deformation, occlusion and viewpoint variation of objects. Various shape descriptors have been proposed to address these challenges, e.g., [2], [3], [4], [5]. Shape based object recognition is usually considered as a classification problem. Given a set of training shapes and category label of each training shape, we need to determine which category a testing shape belongs to. Traditional shape classification methods are usually based on matching shape descriptors from two different shapes: for every training shape, we find correspondences between its shape descriptors and the shape descriptors in the testing shape using matching algorithms, such as Hungarian algorithm, dynamic programming algorithm; then we compute matching costs according to the matching results; finally, we rank training shapes based on the matching costs and classify the testing shape using the nearest neighbor (NN) classifier. This exemplar-based shape classification strategy has been widely used, for example, in [2], [3], [6]. However, it has its own limitations. With few training samples, it is difficult to capture the large intra-class variation using these algorithms. For large training samples, it is extremely time consuming to perform shape matching one-by-one.

Different from exemplar-based shape matching, in this paper, we propose a compact shape representation and handle the large intra-class variation by discriminative learning. Inspired by the huge progress in image classification and representation with Bag-of-Words (BoW) [7], [8], we decompose shape into contour fragments and quantize the contour fragments into shape codes. The contour fragments under different scales contain both local and global shape information which can be encoded utilizing coding strategies for local descriptors [9], [10]. Then, a statistical histogram of shape codes is used to represent each shape and similarity of shapes can be directly computed from these histograms. Matching shapes based on this new shape representation does not explicitly give correspondences between contour fragments. But using a classifier for shape classification is much more efficient than using the typical matching algorithms such as Hungarian, thin plate spline (TPS), dynamic programming, dynamic time warping, and so on. In fact, BoW model is a natural solution for finding correspondences between two sets of features and can be used efficiently for recognition tasks. However, it has seldom been successfully applied to shape analysis, since the popular image descriptors such as SIFT [11] and LBP [12] are mainly designed for describing the local texture/appearance variations. These image features are not good at capturing the intrinsic structure in shape. Toward this end, we directly work on shape contour by decomposing it into contour fragments. We name our method Bag of Contour Fragments (BCF), which can not only provide a compact and informative representation, but also achieve the state-of-the-art classification performance on several popular shape benchmarks.

Pipeline of building shape representation in BCF is shown in Fig. 1. The outer contour of each shape is decomposed into salient contour fragments using a well-known contour decomposition method named discrete contour evolution (DCE) [13]. Each contour fragment is then described by collecting the shape context features [2] on its reference points, and encoded into shape codes. Finally, the shape codes are pooled into a compact image representation with spatial pyramid. We utilize the current advances in image classification, such as local-constrained linear coding (LLC) [9] for feature coding and spatial pyramid matching (SPM) [14] in our BCF shape classification framework. Both LLC and SPM are seldom used in shape analysis. LLC utilizes the locality constraints and encodes each descriptor with its local-coordinate system in a codebook. In practice, it first performs k-nearest-neighbor search to find local-coordinates for feature to be encoded, and then solves a constrained least square fitting problem on the local-coordinates. The state-of-the-art performance on PASCAL VOC [15] image classification has shown effectiveness of LLC. SPM is a simple and computationally efficient extension of the orderless BoW model for image representation. It works by partitioning the image into increasingly fine sub-regions and computing histograms of local features found inside each sub-region. Histograms of different sub-regions are concatenated as final image representation. SPM can capture the spatial information in contour fragments which are useful for shape recognition. BCF naturally utilizes LLC and SPM to improve the accuracy of shape classification.

One of the major difficulties involved in shape classification for many shape-matching based algorithms is to directly match two shapes with large deformation since shapes are only partially similar to each other. BCF can easily solve this problem caused by large shape deformation, and is good at classifying shapes with partial similarity. As each shape contour is divided into contour fragments in BCF, the contour fragments contain partial shape information. After coding, a discriminative classifier such as SVM or Adaboost can be used to select the representative and informative contour parts for each shape category. Fig. 4 shows some contour fragments selected by linear SVM in four shape categories in our experiments. We can see that even though contour fragments are parts of the shapes, they are very informative for recognizing shape category. Thus, BCF is able to deal with partial occlusion in shape, especially, in the edge map extracted from real image. Besides, we find that BCF is also robust to noisy contour in our experiments.

In summary, the proposed BCF has several good properties:

  • 1.

    It provides a very compact shape representation which is a single vector rather than a set of feature vectors used in many other methods.

  • 2.

    It precisely preserves information of individual shape contour via LLC and spatial layout of contour fragments in one shape via SPM.

  • 3.

    For shape classification it avoids pairwise matching between local shape descriptors and significantly reduces the time cost.

  • 4.

    It is robust to the shapes with occlusions or parts missing, and can be easily applied to real image classification.

The rest of the paper is organized as follows. We review the related works in Section 2. Then, we introduce the details of our shape representation with BCF in Section 3, including extracting, encoding and pooling contour fragments, and so on. We evaluate the proposed method on several popular shape benchmarks, illustrate good properties of BCF in applications, and demonstrate its effectiveness in shape classification in various datasets in Section 4. Finally, we conclude this paper in Section 5.

Section snippets

Related work

Here, we briefly review the recent progress in shape classification. Sun and Super [16] proposed a shape classification framework for recognizing contour shapes using class contour segments as input features with Bayesian classifier. Bai et al. [6] adopted contour segments and skeleton paths as the input features for shape classification with a Gaussian mixture model. Daliri and Torre [17], [18] transformed the contour points into a symbol representation, and then used the edit distance between

Bag of contour fragments

In this section, given a shape S, we show how to build BCF shape representation f(S) for S and use f(S) for shape classification step by step.

Experiments

In this section we test our method for shape classification on a variety of shape datasets and compare results of our method with the state-of-the-art shape classification approaches, available for those datasets in the literature. We also study the robustness of our method. First of all, we give implementation details as following.1

Conclusions

In this paper, we present a novel shape representation called BCF for shape classification. To the best of our knowledge, this is the first paper that introduces the idea of BoW together with LLC and SPM for shape representation. Since BCF is a part-based model, it is intrinsically robust to occlusion and deformation of shape. In the experiments, we have extensively tested the performance of BCF; all these experimental results on shape benchmarks show that BCF is able to achieve the

Conflict of interest

None declared.

Acknowledgements

We would like to thank the anonymous reviewer for their helpful suggestions. This work was supported by National Natural Science Foundation of China (NSFC) Grants 61222308 and 61173120, the Program for New Century Excellent Talents in University in China, National Science Foundation (NSF) Grants OIA-1027897 and IIS-1302164, and Fundamental Research Funds for the Central Universities (HUST 2013TS115). We thank Mr. Ho Simon Wang for assisting to improve the language of this paper.

Xinggang Wang received the B.E. degree in electronic information engineering from Huazhong University of Technology and Science, Wuhan, China, in 2009. He is currently working toward the Ph.D. degree in Department of Electronic Information Engineering from Huazhong University of Technology and Science. From May 2010 to July 2011, he was with the Department of Computer and Information Science, Temple University, Philadelphia, PA., as a visiting scholar. From February 2013 to present, he is with

References (60)

  • E. Attalla et al.

    Robust shape similarity retrieval based on contour segmentation polygonal multiresolution and elastic matching

    Pattern Recognit.

    (2005)
  • P. Arbelaez et al.

    Contour detection and hierarchical image segmentation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2011)
  • S. Belongie et al.

    Shape matching and object recognition using shape contexts

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2002)
  • H. Ling et al.

    Shape classification using the inner-distance

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2007)
  • F. Mokhtarian et al.

    Efficient and robust retrieval by shape content through curvature scale space

    Ser. Softw. Eng. Knowl. Eng.

    (1997)
  • X. Bai, W. Liu, Z. Tu, Integrating contour and skeleton for shape classification, in: International Conference on...
  • J. Sivic, A. Zisserman, Video google: a text retrieval approach to object matching in videos, in: International...
  • G. Csurka, C. Dance, L. Fan, J. Willamowski, C. Bray, Visual categorization with bags of keypoints, in: Workshop on...
  • J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, Y. Gong, Locality-constrained linear coding for image classification, in:...
  • F. Perronnin, J. Sánchez, T. Mensink, Improving the fisher kernel for large-scale image classification, in: Europe...
  • D. Lowe

    Distinctive image features from scale-invariant keypoints

    Int. J. Comput. Vis.

    (2004)
  • T. Ahonen et al.

    Face description with local binary patternsapplication to face recognition

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2006)
  • S. Lazebnik, C. Schmid, J. Ponce, Beyond bags of features: spatial pyramid matching for recognizing natural scene...
  • M. Everingham et al.

    The Pascal visual object classes (voc) challenge

    Int. J. Comput. Vis.

    (2010)
  • K. Sun, B. Super, Classification of contour shapes using class segment sets, in: IEEE Conference on Computer Vision and...
  • B. Wang, W. Shen, W. Liu, X.-G. You, X. Bai, Shape classification using tree-unions, in: International Conference on...
  • A. Torsello et al.

    Learning shape-classes using a mixture of tree-unions

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2006)
  • K. Siddiqi et al.

    Shock graphs and shape matching

    Int. J. Comput. Vis.

    (1999)
  • X. Bai et al.

    Path similarity skeleton graph matching

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2008)
  • D. Zhang, G. Lu, Generic fourier descriptor for shape-based image retrieval, in: IEEE International Conference on...
  • Cited by (169)

    • Multi-level contour combination features for shape recognition

      2023, Computer Vision and Image Understanding
    View all citing articles on Scopus

    Xinggang Wang received the B.E. degree in electronic information engineering from Huazhong University of Technology and Science, Wuhan, China, in 2009. He is currently working toward the Ph.D. degree in Department of Electronic Information Engineering from Huazhong University of Technology and Science. From May 2010 to July 2011, he was with the Department of Computer and Information Science, Temple University, Philadelphia, PA., as a visiting scholar. From February 2013 to present, he is with the University of California, Los Angeles, as a visiting graduate researcher. He is a reviewer of Pattern Recognition. His research interests include object recognition and shape analysis in computer vision, and machine learning.

    Bin Feng received the B.S. and Ph.D. degrees from Huazhong University of Science and Technology (HUST), Wuhan, China, in 2001 and 2006, respectively, all in electronics and information engineering. He is currently an Associate Professor with the Department of Electronics and Information Engineering, HUST. From February 2009 to January 2010, he was with the Department of Electronics and Computer Science, Hongkong University of Science and Technology (HKUST) as a visiting researcher. His research interests include computer vision and intelligent video processing.

    Xiang Bai received the B.S., M.S., and Ph.D. degrees from Huazhong University of Science and Technology (HUST), Wuhan, China, in 2003, 2005, and 2009, respectively, all in electronics and information engineering. He is currently an Associate Professor with the Department of Electronics and Information Engineering, HUST. From January 2006 to May 2007, he was with the Department of Computer and Information Science, Temple University, Philadelphia, PA. From October 2007 to October 2008, he was with the University of California, Los Angeles, as a joint Ph.D. student. His research interests include computer graphics, computer vision, and pattern recognition.

    Wenyu Liu received the B.S. degree in computer science from Tsinghua University, Beijing, China, in 1986 and the M.S. and Ph.D. degrees in electronics and information engineering from Huazhong University of Science and Technology (HUST), Wuhan, China, in 1991 and 2001, respectively. He is currently a Professor and Associate Dean of the Department of Electronics and Information Engineering, HUST. His current research interests include multimedia information processing and computer vision.

    Longin Jan Latecki received the Ph.D. degree in computer science from Hamburg University, Germany, in 1992. He is a professor of computer science at Temple University, Philadelphia. His main research interests include shape representation and similarity, object detection and recognition in images, robot perception, machine learning, and digital geometry. He has published 200 research papers and books. He is an editorial board member of Pattern Recognition and the International Journal of Mathematical Imaging. He received the annual Pattern Recognition Society Award, together with Azriel Rosenfeld, for the best article published in the journal Pattern Recognition in 1998. He is the recipient of the 2000 Olympus Prize, the main annual award from the German Society for Pattern Recognition (DAGM). He is a senior member of the IEEE.

    View full text