Elsevier

Pattern Recognition Letters

Volume 35, 1 January 2014, Pages 78-90

IMISketch: An interactive method for sketch recognition

https://doi.org/10.1016/j.patrec.2013.08.011

Highlights

  • We present a new generic method for the interactive interpretation of sketches that avoids a tedious verification phase.

  • The analyzer is based on a competitive breadth-first exploration of the analysis tree.

  • Unlike well-known structural approaches, this method evaluates several recognition hypotheses in a dynamic local context.

  • The decision process is able to solicit the user.

  • Experiments are reported on handwritten architectural plans.

Abstract

In this paper, we present a new generic method for the interactive interpretation of sketches that avoids a tedious verification phase. After a preprocessing phase in which a set of primitives is extracted, the interpretation process consists of an interactive analysis. The analyzer is based on a competitive breadth-first exploration of the analysis tree. Unlike well-known structural approaches, this method evaluates several recognition hypotheses simultaneously in a dynamic local context of the document. The decision process is able to solicit the user in the case of strong ambiguity, that is, when it is not sure of making the right decision; the user then explicitly validates the right decision. Because such approaches often induce a large combinatorial cost in practice, this paper presents optimization strategies to reduce it. The goal of these optimizations is to keep the analysis time compatible with user expectations. These strategies have been integrated into both the preprocessing and analysis phases. To validate this interactive analysis method, several experiments on off-line handwritten 2D architectural floor plans are reported.

Introduction

Digital documents are becoming more and more common in everyday life. Many reasons, such as the flexibility provided by digital processing, have led to the conversion of handwritten documents into digital ones. In this context, people are working on converting technical paper documents, such as architectural floor plans, into digital ones. We aim to offer a complete and homogeneous solution that unifies paper document recognition and pen-based composition (for instance, on a Tablet PC). We present the IMISketch system: Interactive Method for Interpretation of Sketches. The input of this system is a scanned image of a handwritten architectural plan; after interpretation, the output is its digital version. This method is the result of four years of research that led to several scientific publications (Ghorbel et al., 2011a, Ghorbel et al., 2011b, Ghorbel et al., 2012a, Ghorbel et al., 2012b). This paper is a synthesis of this method. We focus on the extraction of the primitives that feed the analyzer, in order to better manage the combinatorics.

We have identified two major approaches for document analysis: syntactic and statistical approaches. Choosing one of these two approaches depends on the document type.

The syntactic approaches (Chan and Yeung, 2000, Coüasnon, 2006, Mao et al., 2003, Fitzgerald et al., 2007, Hammond et al., 2011, Hammond and Davis, 2009) rely on prior knowledge of the document structure to drive the analysis. They are often based on visual languages for describing this knowledge and generating the analyzer. However, syntactic methods have difficulty incorporating uncertainty.

The statistical approaches (Lemaitre et al., 2007, Montreuil et al., 2009, Sezgin and Davis, 2005) are better able to incorporate uncertainty but usually lack the ability to convey the hierarchical structure of the document. Statistical approaches also require extensive training on a homogeneous, labeled database. Each type of approach has advantages and drawbacks. The interpretation of handwritten structured documents needs, on the one hand, an approach that retains the document structure, i.e. a syntactic approach, and, on the other hand, an approach that is better able to incorporate uncertainty, i.e. a statistical approach.

In this work, we design a complete system for sketch interpretation: IMISketch. One of the main original features of IMISketch is that it avoids an a posteriori verification phase by soliciting the user. After a preprocessing phase in which the primitives of the structured document are extracted, the system is characterized by an interactive analysis phase. The analyzer (referred to as IMISketch) uses a new syntactic approach based on an interactive and lazy interpretation of the document. Unlike the classical syntactic approaches, IMISketch does not always select the first or the best hypothesis found. The associated analysis process is able to take uncertainty into account.

Thanks to this interactivity, the analyzer can solicit the user, if needed, to resolve recognition ambiguities (Ghorbel et al., 2011b), i.e. to choose between two or more possible hypotheses, or to enrich the a priori knowledge of the system (Ghorbel et al., 2011a). User participation greatly helps to avoid the accumulation of errors during the analysis step. To detect ambiguities, we adopt a method based on breadth-first exploration. Like all analysis methods based on breadth-first exploration, this approach can induce a large combinatorial cost. This cost mainly depends on the quality of the primitives extracted from the image and on the way they are analyzed. In this paper we propose optimizations that reduce it by addressing these two points. These optimizations are introduced from the segmentation phase through to the analysis. They lead to the new system IMISketch+.
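The idea of exploring competing hypotheses breadth-first and soliciting the user only when the best ones are too close can be illustrated with a minimal sketch. It is not the IMISketch analyzer: the names `Hypothesis`, `explore_breadth_first`, `decide`, the `ambiguity_margin` threshold and the use of `min` to combine confidences are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Hypothesis:
    labels: Tuple[str, ...]  # label chosen for each primitive analyzed so far
    score: float             # combined confidence in [0, 1]


def explore_breadth_first(
    primitives: Sequence[str],
    candidates: Callable[[str], List[Tuple[str, float]]],
    max_depth: int,
) -> List[Hypothesis]:
    """Expand every candidate label level by level, for at most max_depth primitives.

    The frontier grows multiplicatively with depth, which is the
    combinatorial cost that a bounded local context keeps in check.
    """
    frontier = [Hypothesis((), 1.0)]
    for primitive in primitives[:max_depth]:
        next_frontier = []
        for hyp in frontier:
            for label, confidence in candidates(primitive):
                # Assumption: a branch is only as good as its weakest step.
                next_frontier.append(
                    Hypothesis(hyp.labels + (label,), min(hyp.score, confidence))
                )
        frontier = next_frontier
    return sorted(frontier, key=lambda h: h.score, reverse=True)


def decide(
    ranked: List[Hypothesis],
    ambiguity_margin: float,
    ask_user: Callable[[List[Hypothesis]], Hypothesis],
) -> Hypothesis:
    """Return the best hypothesis, or let the user choose among near ties."""
    best = ranked[0]
    close = [h for h in ranked if best.score - h.score < ambiguity_margin]
    if len(close) > 1:
        return ask_user(close)  # strong ambiguity: the user validates the decision
    return best


def toy_candidates(primitive: str) -> List[Tuple[str, float]]:
    """Hypothetical per-primitive labels and confidences."""
    table = {
        "s1": [("wall", 0.9), ("door", 0.3)],
        "s2": [("wall", 0.6), ("window", 0.55)],
    }
    return table[primitive]


ranked = explore_breadth_first(["s1", "s2"], toy_candidates, max_depth=2)
chosen = decide(ranked, ambiguity_margin=0.1, ask_user=lambda options: options[1])
```

In this toy run the two best branches differ by less than the margin, so `ask_user` is called and its choice is returned; with a smaller margin the top-ranked branch would be selected automatically.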

The complete system can be applied to off-line documents (images), as illustrated in this paper (Fig. 9(a)), as well as to on-line or vectorized documents.

In the state of the art, one interesting generic approach is the LADDER system (Hammond and Davis, 2006, Hammond and Davis, 2005), proposed by Hammond and Davis for interpreting on-line handwritten documents either a posteriori or on the fly. The LADDER language has been used to design various systems for interpreting structured documents, such as UML diagrams (Hammond and Davis, 2006), electrical diagrams (Alvarado and Davis, 2007) or complex graphs (Hammond and O’Sullivan, 2007). Plimmer also proposed InkKit, a framework and toolkit for recognizing complex components (Plimmer and Freeman, 2007, Freeman and Plimmer, 2007). In addition, VR Sketchpad (Do, 2001) is a pen-based computing environment for inputting and locating 3D objects in a virtual world.

Unlike these methods, our method interprets off-line handwritten structured documents. It has been tested on 2D architectural floor plans. The specific task of floor plan analysis has been addressed for more than twenty years. Lladós et al. (1997) proposed a method for understanding hand-drawn floor plans using subgraph isomorphism and the Hough transform. Aoki et al. (1996) also proposed a method for interpreting a hand-sketched floor plan; it focuses on understanding the sketch and converting it into a CAD representation. Ahmed et al. (2011) proposed an analysis method dedicated to printed architectural floor plans.

Contrary to these methods, which can require a tedious a posteriori verification phase, the IMISketch system attempts to avoid this phase by involving the user during the analysis process.

The recognition of a structured document using a structural approach needs an a priori description. The modeling of structured documents differs from one type to another, and several techniques can be used to describe a document. Yamamoto et al. (2006) and Bunke (1990) use classical one-dimensional grammars. Other techniques are used to model two-dimensional documents: Fahmy and Blostein (1992) and Bunke (1982) propose graph grammars. These grammars have been widely used in various communities for interpreting off-line documents such as mathematical formulas. Although graph grammars offer a very expressive mechanism for pattern recognition, they have limitations. They are expensive to implement and difficult for the developer to handle, especially when the productions become numerous. They are also poorly suited to dealing with uncertainty.

Our goal is to analyze documents of different kinds, including handwritten documents. To overcome these limitations, we adopt context-driven constraint multi-set grammars (CD-CMG), designed for on-line recognition (Macé and Anquetil, 2009), associated with a scoring approach based on fuzzy logic theory. The main contribution of our work is to modify and extend this formalism to design an interactive analyzer for off-line recognition. This strategy makes it possible to solicit the user in the case of strong ambiguity and avoids the tedious a posteriori verification task of finding and correcting the remaining interpretation errors.
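To make the notion of a fuzzy-logic-based score concrete, here is a minimal sketch of how the degree to which a set of primitives satisfies a rule's constraints could be computed. It is an assumption-laden illustration, not the CD-CMG scoring itself: the trapezoidal membership function, the `min` combination, and the example constraints and thresholds (angle in degrees, gap in pixels) are all hypothetical.

```python
from typing import Iterable


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal fuzzy membership, assuming a < b <= c < d.

    Returns 0 outside (a, d), 1 on [b, c], and a linear ramp in between.
    """
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def rule_score(degrees: Iterable[float]) -> float:
    """Combine constraint satisfaction degrees with the minimum t-norm."""
    return min(degrees)


# Hypothetical constraints for "two segments form a wall corner":
# the angle should be roughly 90 degrees and the endpoints nearly touching.
perpendicular = trapezoid(70.0, 60.0, 80.0, 100.0, 120.0)  # angle of 70 degrees
touching = trapezoid(2.0, -1.0, 0.0, 5.0, 15.0)            # gap of 2 pixels
corner_score = rule_score([perpendicular, touching])
```

Graded scores of this kind, rather than a hard accept or reject, are what allow competing hypotheses to be ranked and near ties to be detected.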

The remainder of the paper is organized as follows. In Section 2, we introduce the architecture and the basic principles of the IMISketch method. The primitive extraction phase is described in Section 3. Section 4 presents the concepts linked to an interactive breadth-first analysis. In Section 5, the implementation and optimization of the IMISketch analyzer are presented. Experimental results on the interpretation of images of 2D handwritten architectural floor plans are reported in Section 6 and, finally, Section 7 concludes the paper.

Section snippets

Interactive analysis stages

In this section, we summarize the different processing steps involved in recognizing a handwritten structured document (cf. Fig. 1). The first step is the segmentation process. This step is purely off-line (i.e. without user interaction). The aim of this phase is to extract all the basic primitives that will be used to analyze the document. In the context of sketch recognition, the segmentation process consists in extracting handwritten strokes as a set of segments. This part is detailed

Related works

The recognition of architectural plans has already been studied, in particular by Dosch et al. (2000). In this kind of plan, segments, which represent the walls, are widely used primitives. In these works, most of the analyzed plans were drawn with a ruler (or CAD software); consequently, the segments are truly straight. In 2006, Hilaire proposed a method that improves on that of Dosch (Hilaire and Tombre, 2006). The originality of his work comes from the segmentation process of the skeleton.
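Freehand strokes, unlike ruler-drawn ones, are not truly straight, so they have to be approximated before they can serve as segment primitives. The sketch below uses the classical Douglas-Peucker polyline simplification to show the principle. It is a swapped-in, well-known technique and not the segmentation method of IMISketch, Dosch or Hilaire; the function names and the `tolerance` parameter are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment [a, b]."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points: Sequence[Point], tolerance: float) -> List[Point]:
    """Douglas-Peucker: keep only the vertices needed to stay within tolerance."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, worst = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > worst:
            index, worst = i, d
    if worst <= tolerance:
        return [first, last]
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # drop the split vertex duplicated in both halves


# A slightly wobbly stroke turning a corner, as a hand-drawn wall might.
stroke = [(0.0, 0.0), (1.0, 0.05), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
vertices = simplify(stroke, tolerance=0.2)
segments = list(zip(vertices, vertices[1:]))
```

Consecutive vertices of the simplified polyline give the candidate segments; the tolerance controls how much hand tremor is absorbed before a stroke is split.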

Interactive breadth-first exploration

In this section, we present the analyzer by first describing its main characteristics. Then, we detail the different steps of the internal analysis. The description of this method is followed by a concrete example in the next section (Section 5).

Implementation of IMISketch

In this section, we describe the implementation of our interactive analysis method (IMISketch) and illustrate it on 2D handwritten architectural plans.

Experimental results

In this section we report different results obtained with the complete interactive recognition system. The results focus on the impact of the presented optimizations, namely the use of the polygon primitive and the building strategy of the analysis tree, in terms of recognition rate and computing time. For this reason, we propose three versions of IMISketch (Fig. 10). The first one (referred to as IMISketch) explores all the hypotheses (branches) of the analysis tree in a local context with a set of

Conclusion

In this paper, we have presented a complete generic method to interpret sketches such as 2D architectural floor plans. This method consists of a preprocessing phase in which we extract useful primitives, which constitute the inputs of an interactive analyzer. This analyzer is based on a competitive breadth-first exploration of the analysis tree according to a dynamic local context of the document. The decision process is able to solicit the user in the case of strong ambiguity. Generally, the

Acknowledgments

The authors would like to thank all the people who took part in the experiments. This work partially benefits from the financial support of the ANR Project Mobisketch.

References (35)

  • de Brucq, D., Amara, M., Courtellemont, P., Wallon, P., Mesmin, C., 1996. A recursive estimation of parameters of...
  • Dosch, P., et al., 2000. A complete system for analysis of architectural drawings. International Journal on Document Analysis and Recognition.
  • Fahmy, H., Blostein, D., 1992. A survey of graph grammars: theory and applications. In: 11th IAPR International...
  • Fitzgerald, J., Geiselbrechtinger, F., Kechadi, T., 2007. Mathpad: a fuzzy logic-based recognition system for...
  • Freeman, I.J., et al. Connector semantics for sketched diagram recognition.

  • Ghorbel, A., Almaksour, A., Lemaitre, A., Anquetil, E., 2011a. Incremental learning for interactive sketch...
  • Ghorbel, A., Macé, S., Lemaitre, A., Anquetil, E., 2011b. Interactive competitive breadth-first exploration for sketch...