
2010 | Book

Symbol Spotting in Digital Libraries

Focused Retrieval over Graphic-rich Document Collections

Authors: Marçal Rusiñol, Josep Lladós

Publisher: Springer London


About this book

Pattern recognition basically deals with the recognition of patterns, shapes, objects, things in images. Document image analysis was one of the very first applications of pattern recognition and even of computing. But until the 1980s, research in this field was mainly dealing with text-based documents, including OCR (Optical Character Recognition) and page layout analysis. Only a few people were looking at more specific documents such as music sheets, bank cheques or forms. The community of graphics recognition became visible in the late 1980s. Their specific interest was to recognize high-level objects represented by line drawings and graphics. The specific pattern recognition problems they had to deal with were raster-to-graphics conversion (i.e., recognizing graphical primitives in a cluttered pixel image), text-graphics separation, and symbol recognition. The specific problem of symbol recognition in graphical documents has received a lot of attention. The symbols to be recognized can be musical notation, electrical symbols, architectural objects, pictograms in maps, etc. At first glance, the symbol recognition problem seems to be very similar to that of character recognition; after all, characters are basically a subset of symbols. Therefore, the large know-how in OCR has been extensively used in graphical symbol recognition: starting with segmenting the document to extract the symbols, extracting features from the symbols, and then recognizing them through classification or matching, with respect to a training/learning set.

Table of Contents

Frontmatter

Introduction

Frontmatter
Chapter 1. Introduction
Abstract
This first chapter puts the symbol spotting problem in context. By giving a general overview of the Document Image Analysis and Recognition field and, in particular, of the Graphics Recognition research topic, we present the motivations for the present study. We summarize the objectives and contributions of this book as well as the contents of each chapter.
Marçal Rusiñol, Josep Lladós
Chapter 2. State-of-the-Art in Symbol Spotting
Abstract
In this chapter, we review the related work on symbol spotting that has been done in recent years. We first present a review of the contributions from the Graphics Recognition community to the spotting problem. In the second part, we focus our attention on the different symbol description techniques and the families we can find in the literature. Then, the existing data structures which aim to store the extracted descriptors and provide efficient access to them are analyzed. We finally review the existing methods for hypotheses validation which can be used for spotting purposes.
Marçal Rusiñol, Josep Lladós

On the Use of Photometric Descriptors for Symbol Spotting

Chapter 3. Symbol Spotting for Document Categorization
Abstract
In this chapter, we present a method for spotting symbols in document images by using a photometric description of symbols. As a running example we present an application of logo spotting. The presented method uses a bag-of-words model in order to perform a categorization of document images such as invoices or receipts. The hypotheses validation is done in terms of spatial coherence by the use of a Hough-like voting scheme. Experiments which demonstrate the effectiveness of this system on a large set of real data are presented at the end of the chapter.
Marçal Rusiñol, Josep Lladós
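
The Hough-like voting mentioned in the Chapter 3 abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a translation-only model (no scale or rotation), and the names `vote_for_center`, `matches`, `query_center` and `cell` are hypothetical. Each matched local feature votes for where the centre of the query logo would fall in the document, and the most voted cell becomes the spatially coherent hypothesis.

```python
from collections import Counter


def vote_for_center(matches, query_center, cell=20):
    """Return ((cell_x, cell_y), votes) for the most voted cell, or None.

    matches: list of ((qx, qy), (dx, dy)) pairs linking a feature position
    in the query to a feature position in the document.
    Assumption: the logo appears translated only, never scaled or rotated.
    """
    votes = Counter()
    cx, cy = query_center
    for (qx, qy), (dx, dy) in matches:
        # Predicted position of the query centre in document coordinates.
        px, py = dx + (cx - qx), dy + (cy - qy)
        # Coarse accumulator cells absorb small localisation errors.
        votes[(int(px // cell), int(py // cell))] += 1
    return votes.most_common(1)[0] if votes else None
```

Matches that agree on the same centre pile up in one cell, while isolated false matches scatter across the accumulator and gather few votes.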

On the Use of Geometric and Structural Constraints for Symbol Spotting

Frontmatter
Chapter 4. Vectorial Signatures for Symbol Recognition and Spotting
Abstract
In this chapter, we present a method to determine which symbols are likely to be found in technical drawings by the use of vectorial signatures as symbol descriptors. The proposed signature model is formulated in terms of geometric and structural constraints among segments, such as parallelisms, straight angles, etc. After representing vectorized line drawings with attributed graphs, our approach works with a multi-scale representation of these graphs, retrieving the features that are expressive enough to create the signature.
Marçal Rusiñol, Josep Lladós
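
As a rough illustration of a signature built from constraints among segments (Chapter 4), the sketch below counts pairs of segments that are parallel or perpendicular within an angular tolerance. The choice of these two constraints, the tolerance, and the names `orientation` and `signature` are assumptions made for the example; the signature model in the book is richer and works on attributed graphs.

```python
import math
from itertools import combinations


def orientation(seg):
    """Undirected orientation of a segment, in radians within [0, pi)."""
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi


def signature(segments, tol=math.radians(5)):
    """Count segment pairs satisfying each (assumed) geometric constraint."""
    counts = {"parallel": 0, "perpendicular": 0}
    for a, b in combinations(segments, 2):
        diff = abs(orientation(a) - orientation(b))
        diff = min(diff, math.pi - diff)  # fold the difference into [0, pi/2]
        if diff <= tol:
            counts["parallel"] += 1
        elif abs(diff - math.pi / 2) <= tol:
            counts["perpendicular"] += 1
    return counts
```

Two drawings can then be compared through their count vectors, which is far cheaper than matching the underlying graphs directly.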
Chapter 5. Symbol Spotting Through Prototype-based Search
Abstract
In this chapter, we present a method to determine which symbols are likely to be found in technical drawings by the use of a prototype-based search. First, symbols are decomposed into primitives representing closed regions. These primitives are then encoded in terms of attributed strings. Second, the strings are organized in a lookup table so that the set median strings act as representative prototypes of the clusters of similar primitives. This indexing data structure aims at efficiently retrieving the locations from the document collection where primitives similar to the queried ones can be found. Finally, a voting scheme formulates hypotheses in the locations of the line drawing image where there is a high presence of regions similar to the queried ones, and therefore, a high probability of finding the queried graphical symbol. The proposed approach has been shown to work even in the presence of noise and distortion introduced by the scanning and raster-to-vector processes.
Marçal Rusiñol, Josep Lladós
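
The notion of a set median string used as a cluster prototype (Chapter 5) can be sketched as follows. The set median is the member of a set that minimizes the sum of distances to all other members. As a simplifying assumption, the example uses plain character strings and the Levenshtein edit distance, whereas the book works with attributed strings; the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs (an assumed string distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def set_median(strings):
    """Member of `strings` with the smallest summed distance to the others."""
    return min(strings,
               key=lambda s: sum(edit_distance(s, t) for t in strings))
```

A query string then only needs to be compared against one prototype per cluster instead of against every stored primitive.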
Chapter 6. A Relational Indexing Method for Symbol Spotting
Abstract
In this chapter, we present a method to retrieve from a collection of document images the regions of interest where a query symbol is likely to be found. In order to increase the querying speed, a hashing technique is proposed which is able to retrieve primitives by similarity very efficiently. Vectorial primitives are coarsely encoded by well-known shape description methods providing a numerical description of the primitives. A relational indexing approach is presented in order to introduce some structural information of the symbols and provide an accurate hypotheses validation. Experimental results show the performance of the proposed approach.
Marçal Rusiñol, Josep Lladós
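
One common way to retrieve numerically described primitives by similarity through hashing (Chapter 6) is to quantize each descriptor into a coarse bucket key. The sketch below assumes uniform quantization with a fixed step; the abstract does not specify the hash function, so `bucket_key`, `PrimitiveIndex` and `step` are illustrative names, not the authors' design.

```python
from collections import defaultdict


def bucket_key(descriptor, step=0.25):
    """Quantize a numeric descriptor into a hashable bucket key."""
    return tuple(int(v // step) for v in descriptor)


class PrimitiveIndex:
    """Hash table from quantized descriptors to primitive locations."""

    def __init__(self, step=0.25):
        self.step = step
        self.table = defaultdict(list)

    def add(self, descriptor, location):
        # `location` is any identifier, e.g. (document id, primitive id).
        self.table[bucket_key(descriptor, self.step)].append(location)

    def query(self, descriptor):
        # Constant-time lookup of primitives falling in the same bucket.
        return list(self.table.get(bucket_key(descriptor, self.step), []))
```

A known limitation of this simple scheme is that two similar descriptors lying on opposite sides of a bucket boundary are not retrieved together; probing neighbouring buckets is one possible remedy.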

A Performance Evaluation Protocol for Symbol Spotting Systems

Chapter 7. Performance Evaluation of Symbol Spotting Systems
Abstract
Symbol spotting systems are intended to retrieve regions of interest from a document image database where the queried symbol is likely to be found. They shall have the ability to recognize and locate graphical symbols in a single step. In this chapter, we present a set of measures to evaluate the performance of a symbol spotting system in terms of recognition abilities, location accuracy and scalability. We show that the proposed measures allow determining the weaknesses and strengths of different methods. In particular, we have evaluated in detail the spotting method presented in Chapter 6.
Marçal Rusiñol, Josep Lladós
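
To make the idea of jointly measuring recognition and location (Chapter 7) concrete, the sketch below computes precision and recall where a retrieved region only counts as correct if it overlaps a ground-truth region sufficiently. Axis-aligned boxes, the intersection-over-union criterion and the 0.5 threshold are assumptions chosen for brevity; the measures proposed in the book may be defined differently.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0


def precision_recall(retrieved, relevant, thr=0.5):
    """Precision and recall with an overlap test for location accuracy."""
    matched = set()
    hits = 0
    for r in retrieved:
        for i, g in enumerate(relevant):
            # Each ground-truth region can validate one retrieved region.
            if i not in matched and iou(r, g) >= thr:
                matched.add(i)
                hits += 1
                break
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Raising `thr` makes the evaluation stricter about location, so a method that finds the right symbols but localizes them loosely will see both values drop.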
Chapter 8. Conclusions
Abstract
In this chapter, we summarize the contributions of this book to the symbol spotting problem and, in particular, to the application of focused retrieval of graphical symbols from collections of line-drawing images. We also discuss the limitations of the presented approaches. We finally point out some possible lines of continuation in the field of symbol spotting and some improvements of the proposed methods which should be further studied.
Marçal Rusiñol, Josep Lladós
Backmatter
Metadata
Title
Symbol Spotting in Digital Libraries
Authors
Marçal Rusiñol
Josep Lladós
Copyright year
2010
Publisher
Springer London
Electronic ISBN
978-1-84996-208-7
Print ISBN
978-1-84996-207-0
DOI
https://doi.org/10.1007/978-1-84996-208-7