Skip to main content

Über dieses Buch

Document image analysis is the automatic computer interpretation of images of printed and handwritten documents, including text, drawings, maps, music scores, etc. Research in this field supports a rapidly growing international industry. This is the first book to offer a broad selection of state-of-the-art research papers, including authoritative critical surveys of the literature, and parallel studies of the architectureof complete high-performance printed-document reading systems. A unique feature is the extended section on music notation, an ideal vehicle for international sharing of basic research. Also, the collection includes important new work on line drawings, handwriting, character and symbol recognition, and basic methodological issues. The IAPR 1990 Workshop on Syntactic and Structural Pattern Recognition is summarized,including the reports of its expert working groups, whose debates provide a fascinating perspective on the field. The book is an excellent text for a first-year graduate seminar in document image analysis,and is likely to remain a standard reference in the field for years.



Printed Documents


Towards the Understanding of Printed Documents

Document analysis aims at the transformation of data presented on paper and addressed to human comprehension into a computer-revisable form. The pixel representation of a scanned document must be converted into a structured set of symbolic entities, which are appropriate for the intended kind of computerized information processing. It can be argued that the achieved symbolic description level resembles the degree of understanding acquired by a document analysis system. This interpretation of the term ‘understanding’ shall be explained a little more deeply. An attempt shall be made to clarify the important question: “Up to what level can a machine really understand a given document?” Looking at the many problems still unsolved, this is indeed questionable.

Thomas Bayer, Jürgen Franke, Ulrich Kressel, Eberhard Mandler, Matthias Oberländer, Jürgen Schürmann

An Experimental Implementation of a Document Recognition System for Papers Containing Mathematical Expressions

This paper describes the current state of an experimental document recognition system for scientific papers. A scientific paper contains not only text but also tables, pictures, graphics, and mathematical expressions. This system can convert character or symbol strings in text as well as mathematical expressions and tables into coded data. Out of all the functions required for the entire process from document scanning through recognition, these have been investigated and implemented: skew detection and correction, region (block) segmentation, and mathematical expression recognition. The algorithms have been designed for high speed as much as possible. This experimental system is implemented entirely in software on a work station under X-windows. Some experimental results on each stage of the document recognition process are presented.

Masayuki Okamoto, Akira Miyazawa

Towards a Structured-Document-Image Utility

Digitized documents are preprocessed for library archival and distribution. Pages from technical journals scanned at 300 dpi are recursively segmented with multiple alternating horizontal and vertical cuts according to publication-specific layout information. The resulting rectangular blocks are labeled concurrently in terms of functional components such as author block, title, abstract. Three specific applications for integrating paper-based information into an electronic environment are described. The first will allow library users to interactively select and view portions of a page-image, such as illustrations, subtitles, or text paragraphs, when either the screen resolution of their workstation or low transmission rates prevent viewing the bitmap of an entire page. The second application will automate the selection of suitable portions of the document ( e.g., abstracts) for subsequent optical character recognition for full-text search of journals and reports in paper form. The third application will automatically link pixel maps of illustrations with ASCII text files. All three applications are based on syntactic processing of binary document images.

George Nagy

ANASTASIL: A System for Low-Level and High-Level Geometric Analysis of Printed Documents

This paper focuses on the knowledge-based document analysis system ANASTASIL (Analysis System to Interpret Areas in Single-sided Letters). The system identifies important conceptual parts (logical objects) within business letters, like recipient, sender or company-specific printings. Thereby, the system works completely independent of text recognition. Instead, it only utilizes geometric knowledge sources. These are: global geometric knowledge about logical object arrangements, and local geometric knowledge about formal features of logical objects (e.g. extensions, typical font sizes, etc). As a result, a document image is classified by labeling area items by corresponding logical object designators after hypothesizing and testing geometric properties of the captured physical units (layout objects). Due to this strategy, ANASTASIL can be envisioned as a key for expectation-driven further analysis of logical objects by text or graphic recognition. The system has been completely implemented and has achieved some remarkable results. It is composed of a low-level geometric analysis module for image processing tasks and a high-level geometric analysis module that performs logical labeling of layout objects. The implementation was done on a SUN 3/60 workstation in C and Common-Lisp and will be soon available in the MacIntosh environment.

Andreas Dengel

A Top-Down Approach to the Analysis of Document Images

Top-down methods for document image understanding to extract information from documents are presented. A language FDL, Form Definition Language, is first introduced which can represent generic layout structures as a set of rectangular regions, each of which are recursively defined in terms of inclusive rectangular regions. To identify concrete layout structures, the generic descriptions are matched against input document images, and specific items on a page are extracted. Although this approach is powerful, it is rather complex and a simplification is possible. A language SFDL, a simplified version of FDL, can be used for rather regular document forms to describe characteristic patterns of the document form in terms of templates. Templates are rectangular regions among which spatial constraints are defined without recursions. By matching such templates, input document images can be classified, and bibliographic items such as a title and author’s name can be extracted. Experiments have shown promising results and it can be applied to an automatic document filing system.

Hiromichi Fujisawa, Yasuaki Nakano

Analysis of Scanned Documents — a Syntactic Approach

Pages from technical journals are analyzed with a syntactic approach to hierarchically identify the spatial structure of a scanned document image and to assign logical labels to its component blocks. These blocks may be fields of text, tables and figures. Publication-specific knowledge is used in the segmentation and labeling of the blocks. The information is meticulously coded in the form of block-grammars used to describe the relationships between the various entity classes or blocks. Lexical and syntax analysis tools, Lex and Yacc, are used to segment and label the blocks. These block-grammars can be generated manually or from a parameter table. Further extensions involving multiple grammars and backtracking algorithms are also discussed.

Mahesh Viswanathan

Character and Symbol Recognition


Structural Analysis and Description of Curves by Quasi-Topological Features and Singular Points

We propose a compact and concise description method of curves in terms of the quasi-topological features and the structure of each singular point. By quasi-topological features, we mean the convexity, loop, and connectivity. The quasi-topological structure is analyzed in a hierarchical way, and algebraic structure is presented explicitly on each representation level. The lower-level representations are integrated into the higher-level one in the systematic way. When a curve has singular points (branch points), the curve is decomposed into components each of which is a simple arc or a simple closed curve by decomposing each singular point. The description scheme is applied to character recognition.

Hirobumi Nishida, Shunji Mori

Combination of Decisions by Multiple Classifiers

A technique for combining the results of classifier decisions in a multi-classifier recognition system is presented. Each classifier produces a ranking of a set of classes. The combination technique uses these rankings to determine a small subset of the set of classes that contains the correct class. A group consensus function is then applied to re-rank the elements in the subset. This methodology is especially suited for recognition systems with large numbers of classes where it is valuable to reduce the decision problem to a manageable size before making a final determination about the identity of the image. Experimentation is discussed in which the proposed method is used with a word recognition problem where 40 classifiers are applied to degraded machine-printed word images and where a typical lexicon contains 235 words. A 96.6% correct rate is achieved within the 10 best decisions for 817 test images.

Tin Kam Ho, Jonathan J. Hull, Sargur N. Srihari

Resolving Ambiguity in Segmenting Touching Characters

This paper presents an efficient and powerful character segmentation method which enables touching characters in a document to be read accurately at high speed. The character segmentation phase extracts characters from a text line. Connected components in a text line image may have to be segmented or combined to form recognizable characters. For example, the character ‘i’ is formed by combining two components. Touching characters are segmented to identify each character. Segmenting touching characters is an open problem, whose solution would advance the field. Touching characters have several candidates for their break position, which are then confirmed by recursive segmentation and recognition, and finally by the determination of the linguistic context. There are several possible candidates at each stage. For example, several candidates for the break position of touching characters are nominated. Any segmented area might possibly fit several alternative characters. Therefore, an efficient resolution of ambiguity at each stage is significantly critical and indispensable for practical text reading. The authors’ approach is based on knowledge about character composition (e.g. an ‘m’ is like a combination of an ‘r’ and an ‘n’), as well as knowledge about omni-fonts. Knowledge about character composition compresses the number of recursive segmentation and recognition.

Shuichi Tsujimoto, Haruo Asada



Off-line Identification With Handwritten Signature Images: Survey and Perspectives

The first part of this paper presents a survey of the literature on automatic handwritten signature verification systems using binary or gray-level images, and focuses primarily on preprocessing techniques, feature or primitive extraction methods, comparison processes, and performance evaluation. With these previous studies in mind, we propose, in the second part of this paper, an image-understanding system based on the extraction of a novel representation of handwritten signature images. This approach is text insensitive. A structural match between a reference primitive set Pr and a test primitive set Pt takes into account the geometric shape and spatial relations between primitives. Finally, the local comparison of gray levels between pairs of primitives next to each node of the static solution path N results in a pseudodynamic similarity measure ϑ d (Pr, Pt). This scheme allows the elimination, with a certain degree of success, of skilled forgeries such as tracings and photocopies, showing marked gray-level dissimilarity along the signature line.

Robert Sabourin, Réjean Plamondon, Guy Lorette

Difficult Cases in Handwritten Numeral Recognition

One avenue to improve the performance of computer systems for the recognition of handwritten data is to develop a better grasp of human expertise in this area. Through an experiment, we investigate the clues and cues used by humans to recognize the most confusing samples of our database. Four algorithms are also applied, individually and in a combined manner, to the same difficult data set. Machine and human performances are compared. Such knowledge should be used to refine the existing algorithms which are proved to be clearly inferior to the human experts in recognition of unconstrained handwritten numerals.

Raymond Legault, Ching Y. Suen, Christine Nadal

Recognition of Hand-Printed Chinese Characters and the Japanese Cursive Syllabary

We describe a method for the recognition of hand-printed Chinese characters (Kanji) and Japanese cursive syllabary characters (Hiragana). The system preclassifies using cellular features and then classifies by a combination of feature extraction of line segments, an extreme-point method, and a relaxation matching method. Relaxation is a technique using contextual information to reduce local ambiguities: initial probabilities are assigned to matches between pairs of the line segments in the dictionary and the input, and then iteration finds acceptable combinations of the matches. The system was trained and tested on the ETL8 database; recognition results were 99% correct on the “good quality” data set, and 94.6% on the “poor quality” data set.

Kazuhiko Yamamoto, Hiromitsu Yamada

Regularities and Singularities in Line Pictures

The main difficulty of Pattern Recognition is the interpretation of the iconic picture, the pixels, into primitive features. The interest of a feature classification as “regular” and “singular” is recalled and justified. These ideas are applied to line images, and particularly to handwritten words. On the graph representation of a word, a regular feature, the “axis,” is found; it allows to segment the word representation and to obtain a descriptive chain. Examples of such symbolic descriptions are given, with their interpretation as a word among a list of 25 words. The results are of the order of 87% success, 3% substitution on one handwriting. Preliminary results show that it may be extended to large categories of handwriting.

Jean-Claude Simon, Olivier Baret

Graphics, Maps, and Technical Drawings


An Overview of Techniques for Graphics Recognition

An overview is presented of algorithms and techniques for document image analysis with an emphasis on those for graphics recognition and interpretation. Topics covered are data capture, segmentation into text and graphics regions, vectorization, identification of graphical primitives, and generation of succinct image interpretations. This paper is primarily survey in nature, but an effort is made to provide information to evaluate and compare techniques, both through reference to more focused articles, as well as through our own experience.

Rangachar Kasturi, Rajesh Raman, Chakravarthy Chennubhotla, Lawrence O’Gorman

High Quality Vectorization Based on a Generic Object Model

We propose a new method for high quality vectorization based on a generic object model. In conventional methods, the quality of vectorization has been regarded as the same thing as digitizing accuracy, which is evaluated in terms of the error between the original line-drawing and the resultant vector sequence. However, this accuracy does not always correspond to perceptual quality. Based on the discussion about possible distortions in a vectorization process which strongly affects perceptual quality, we introduce a generic model of an object. The generic properties are described in the object model, for example, “The object is a polygon whose corners all have a right angle” or “The object is composed of several pairs of almost parallel lines”. The method consists of three processes: pre-vectorization, object recognition and shape modification based on an object model. The approximation line-figure which is first defined in the pre-vectorization process, is modified so as to meet the properties described in the object model. A cost function is derived from the model so that shape modification can be formalized as an optimization problem. A relaxation method is employed to save on computation time. Object recognition is needed for object model selection. The method was applied to geographic maps for urban planning to extract building polygons. Subjective evaluation on the extracted building shapes showed that 93.6% of the buildings were vectorized with high quality and that the number of buildings with insufficient quality was reduced to one twelfth.

Osamu Hori, Akio Okazaki

Recognizing Hand-Drawn Electrical Circuit Symbols with Attributed Graph Matching

We propose a model-based approach to the recognition of hand-drawn electrical circuit symbols under arbitrary unknown similarity transformations (translation, rotation, and scale). A hybrid representation, called an attributed graph (AG), incorporating structural and statistical characteristics of image patterns, is used for matching an unknown symbol to models. We describe experimental results that suggest that AG matching is effective and efficient for this problem.

Seong-Whan Lee

Self-Structural Syntax-Directed Pattern Recognition of Dimensioning Components in Engineering Drawings

A mechanical engineering drawing is a set of orthogonal 2D views of a 3D object. To establish vision-based communication among CAD/CAM systems and to implement 3D reconstruction, drawings need to be preprocessed to exploit the information conveyed by the annotation before separating it from the geometry. A rationale for a machine drawing understanding system is first argued. A context-free dimensioning grammar provides a basis for a self-structural, syntax-directed pattern recognition scheme. It employs learning the characteristic parameters of arrowheads and text from a detected sample, and using them to detect the entire population. Syntactic considerations that significantly decrease the search space help predict the spatial location and orientation of potential components. The underlying ideas of the scheme may be applicable to a broad scope of tasks involving intelligent recognition.

Dov Dori

Analysis of Technical Documents: The REDRAW System

The automated analysis of technical documents is a problem which has been worked on for several years [19, 37, 16, 18], in many kinds of applications, including flow charts [1], mechanical engineering [8, 23], city maps (plats) [17, 27, 31], electrical circuitry [22, 13], hand-drawn figures [30] and geographical maps [38].

Dominique Antoine, Suzanne Collin, Karl Tombre

Music Notation


A Critical Survey of Music Image Analysis

The research literature concerning the automatic analysis of images of printed and handwritten music notation, for the period 1966 through 1990, is surveyed and critically examined.

Dorothea Blostein, Henry S. Baird

A Recognition System for Printed Piano Music Using Musical Knowledge and Constraints

We describe a recognition system for printed piano music, which presents challenging problems in both image pattern matching and semantic analysis. In music notation, the shape of symbols is simple, but confusing connections and overlaps among symbols occur. In order to deal with these difficulties, proper knowledge is required, so our system adopts a top-down approach based on bar-unit recognition to use musical knowledge and constraints effectively. Recognition results, described with a symbolic playable data format, exceed 90% correct on beginner’s piano music.

Hirokazu Kato, Seiji Inokuchi

Automatic Recognition of Printed Music

There is a need for an automatic recognition system for printed music scores. The work presented here forms the basis of an omnifont, size-independent system with significant tolerance of noise and rotation of the original image. A structural decomposition technique is used based on an original transformation of the line adjacency graph. An example of output is given in the form of a data file and its score reconstruction.

Nicholas P. Carter, Richard A. Bacon

Automatic Recognition of Several Types of Musical Notation

This paper describes recent progress towards systems for automatic recognition of several different types of musical notation, including printed sheet music, Braille music, and dance notation.

Takebumi Itagaki, Masayuki Isogai, Shuji Hashimoto, Sadamu Ohteru



Syntactic and Structural Methods in Document Image Analysis

We survey applications of syntactic and structural pattern recognition (SSPR) methods to the field of document image analysis, analyze the strengths and weakness of these methods for document image analysis, and discuss other complementary techniques.

Alberto Sanfeliu

Syntactic Analysis of Context—Free Plex Languages for Pattern Recognition

Plex languages are a generalization of string languages into two dimensions. In this paper we describe an algorithm for the syntactic analysis of plex languages. The algorithm is an extension of Earley’s parser which was originally developed for context free string languages. Our algorithm is able to recognize not only complete two—dimensional structures generated by a plex grammar but also partial ones.

Horst Bunke, Bernhard Haller

Document Image Analysis Using Logic-Grammar-Based Syntactic Pattern Recognition

Logic grammars such as Definite Clause Grammars — linguistic formalisms closely associated with the Prolog language — are powerful and highly versatile systems for specifying and detecting patterns not only in natural language, but in any suitably organized string, signal, or image. This paper reviews several such prototypes we have developed, which demonstrate the characteristics that make logic grammars a good vehicle for many Syntactic Pattern Recognition applications.

David B. Searls, Suzanne Liebowitz Taylor

Document Image Defect Models

A lack of explicit quantitative models of imaging defects due to printing, optics, and digitization has retarded progress in some areas of document image analysis, including syntactic and structural approaches. Establishing the essential properties of such models, such as completeness (expressive power) and calibration (closeness of fit to actual image populations) remain open research problems. Work-in-progress towards a parameterized model of local imaging defects is described, together with a variety of motivating theoretical arguments and empirical evidence. A pseudo-random image generator implementing the model has been built. Applications of the generator are described, including a polyfont classifier for ASCII and a single-font classifier for a large alphabet (Tibetan U-Chen), both of which which were constructed with a minimum of manual effort. Image defect models and their associated generators permit a new kind of image database which is explicitly parameterized and indefinitely extensible, alleviating some drawbacks of existing databases.

Henry S. Baird

IAPR 1990 Workshop on SSPR


SSPR’90 Workshop Report

The International Association for Pattern Recognition (IAPR) Workshop on Syntactic and Structural Pattern Recognition (SSPR), was held in Murray Hill, New Jersey, USA, during 13–15 June 1990. This was the most recent in a bi-yearly series of workshops sponsored by IAPR Technical Committee No. 2 (TC-2) on SSPR, that historically has attracted theoreticians from Europe, Japan, and the U.S. SSPR methods emphasize the use of formal a priori models of the symbolic content of images or signals. For many types of complex printed and handwritten documents, SSPR methods are natural and helpful. For this reason, the organizers decided to place special emphasis on applications to document image analysis, and invited IAPR TC-11 on Text Processing to co-sponsor the workshop.

Henry S. Baird, Lawrence O’Gorman

Document Analysis: SSPR’90 Working Group Report

This report summarizes the discussions of the Working Group on the Document Analysis of the IAPR 1990 Workshop on Syntactic and Structural Pattern Recognition, Murray Hill, NJ, 13–15 June 1990. Thirteen researchers from three countries participated: T. Bayer, S. C. Chennubhotla, A. Dengel, H. Fujisawa, J. Kanai, R. Kasturi, G. Nagy, L. O’Gorman, M. Okamoto, L. Spitz, S. Tsujimoto, M. Viswanathan, and M. Yamada. Andreas Dengel moderated the discussion, and Junichi Kanai served as scribe.

Junichi Kanai, Andreas Dengel

Character Recognition: SSPR’90 Working Group Report

This report summarizes the discussions of the Working Group on Character Recognition of the IAPR 1990 Workshop on Syntactic and Structural Pattern Recognition, Murray Hill, NJ, 13–15 June 1990. The participants were: H. Baird, T. Bayer, H. Fujisawa, T. K. Ho, J. Hull, T. Itagaki, D. Lee, S. Liebowitz, O. Matan, G. Nagy, T. Pavlidis, and S. Srihari. George Nagy moderated the discussion and Thomas Bayer prepared this report based on notes by Nagy, Jonathan Hull, and himself.

Thomas Bayer, Jonathan Hull, George Nagy

Line Drawings, Feature Extraction, and Symbol Recognition: SSPR’90 Working Group Report

This report summarizes the discussions of the Working Group on the Line Drawing, Feature Extraction, and Symbol Recognition, of the IAPR 1990 Workshop on Syntactic and Structural Pattern Recognition, Murray Hill, NJ, 13–15 June 1990. The somewhat awkward mix of topics was, nevertheless, very popular: nineteen people attended, six from Japan, three from the UK, two from Germany, two from France, one from Korea, and the rest from the USA. The discussion moderator was Dov Dori, and Patrick Wang served as scribe.

Patrick S. P. Wang, Dov Dori

Technical Drawing Analysis: SSPR’90 Working Group Report

This report summarizes the discussions of the Working Group on Technical Drawing Analysis of the IAPR 1990 Workshop on Syntactic and Structural Pattern Recognition, Murray Hill, NJ, 13–15 June 1990. Approximately twelve persons attended this session. We were charged with the task of identifying sound methods and open problems in computer-based or computer-assisted analysis of technical drawings. The discussion was moderated by Stephen Joseph and recorded by Eric Saund.

Eric Saund, Steven H. Joseph

Recognition of Music Notation: SSPR’90 Working Group Report

This report summarizes the discussions of the Working Group on the Recognition of Music Notation, of the IAPR 1990 Workshop on Syntactic and Structural Pattern Recognition, Murray Hill, NJ, 13–15 June 1990. The participants were: D. Blostein, N. Carter, R. Haralick, T. Itagaki, H. Kato, H. Nishida, and R. Siromoney. The discussion was moderated by Nicholas Carter and recorded by Dorothea Blostein.

Dorothea Blostein, Nicholas P. Carter

Syntactic Methods: SSPR’90 Working Group Report

This report summarizes the discussions of the Working Group on Syntactic Methods of the IAPR 1990 Workshop on Syntactic and Structural Pattern Recognition, Murray Hill, NJ, 13–15 June 1990. The group broadened its discussion somewhat to embrace structural as well as syntactic models, and placed particular emphasis on the inference of models. The members of the group were M. Chandrasekaran, S. Collin, A. Huq, O. V. Larsen, N. S. V. Rao, A. Sanfeliu, R. Siromoney, and E. Tanaka. The discussion moderator was A. Sanfeliu, and R. Siromoney served as scribe.

Rani Siromoney, Alberto Sanfeliu


Weitere Informationen