Chinese Handwriting Recognition: An Algorithmic Perspective

Author: Tonghua Su

Publisher: Springer Berlin Heidelberg

Book series: SpringerBriefs in Electrical and Computer Engineering

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"

About this book

Designing machines that can read handwriting as humans do has been an ambitious goal for more than half a century, driving talented researchers to explore diverse approaches. Obstacles that at first appeared insurmountable have often been overcome before long, yet some open issues remain. As an indispensable branch of the field, Chinese handwriting recognition has been called one of the most difficult pattern recognition tasks. It poses its own unique challenges, such as huge variations in strokes, diversity of writing styles, and a large set of confusable categories. With ever-increasing training data, researchers have pursued elaborate algorithms to discern characters from different categories and to compensate for sample variations within the same category. As a result, Chinese handwriting recognition has evolved substantially and impressive achievements have been made. This book introduces the integral algorithms used in Chinese handwriting recognition and the applications of Chinese handwriting recognizers. The first part of the book covers both the widely used canonical algorithms for building a reliable recognizer and newly developed scalable methods in Chinese handwriting recognition. The recognition of Chinese handwritten text is presented systematically, including instructive guidelines for collecting samples, novel recognition paradigms, distributed discriminative learning of appearance models, and distributed estimation of contextual models for large category sets, in addition to celebrated methods such as gradient features, MQDF, and HMMs. The second part of the book describes efforts to create a friendlier human-machine interface through applications of Chinese handwriting recognition. Four scenarios are exemplified: grid-assisted input, shortest moving input, handwritten micro-blog, and instant handwriting messenger.
Throughout, the book moves from basic to more complex approaches and provides a further-reading list with comments on the literature.

Table of Contents

Frontmatter

Chapter 1. Introduction

Abstract

Chinese character recognition, as an attractive toolset for Chinese digital library projects, has been drawing intense attention. The Chinese character system is used for communication and has served various political purposes in China, having played an important role in the development of Chinese civilization for over 3000 years. Thus, there are a large number of invaluable archives and documents recorded in Chinese, awaiting conversion into readable text for worldwide sharing. As an indispensable branch, Chinese handwriting recognition has been viewed as one of the most difficult pattern recognition tasks. It poses its own unique challenges, such as huge variations in strokes, diversity of writing styles, and a large set of confusable categories. With ever-increasing amounts of training data, researchers have been developing effective algorithms to discern characters from different categories and to compensate for sample variations within the same category. Thanks to these efforts, substantial achievements have been made in the field of Chinese handwriting recognition. In this book, essential algorithms for effective Chinese handwriting recognition are presented.

Tonghua Su

Chapter 2. HIT-MW Database

Abstract

Standard databases play a fundamental part in handwriting recognition research. This chapter presents a Chinese handwriting database named HIT-MW, designed to facilitate Chinese handwritten text recognition. Both the writers and the texts for handcopying are carefully sampled using a systematic approach. To collect naturally written handwriting, the forms were distributed by postal mail or through intermediaries instead of face to face. The current version of HIT-MW includes 853 forms and 186,444 characters that were produced under natural and unconstrained conditions without preprinted character boxes. The statistics show that the database provides an excellent representation of realistic Chinese handwriting. Many new applications concerning realistic handwriting recognition can be supported by the database. Hundreds of institutes and universities around the world have begun using the HIT-MW database in their experiments.

Tonghua Su

Chapter 3. Integrated Segmentation-Recognition Strategy

Abstract

Reliable recognition of realistic Chinese handwriting is of overwhelming interest yet remains a challenging undertaking. Both sufficient training samples and advanced learning methods are critical to identifying the underlying symbols of a string image. This chapter first outlines two integrated segmentation-recognition reference systems. Then five sophisticated techniques are explored to improve the recognition accuracy or reduce the training costs. Two of them, PL-MQDF and active set, deal with isolated character classification; the others address natural handwriting recognition. String-level training is applied to Chinese handwriting recognition in order to provide robust training. To expand the training data, a perturbation model is used to synthesize string samples; linguistic constraints are also incorporated. Experiments are conducted to verify the techniques, and steady improvements are demonstrated.

Tonghua Su

Chapter 4. Segmentation-Free Strategy: Basic Algorithms

Abstract

The off-line recognition of realistic Chinese handwriting poses significant challenges. This chapter presents a baseline system for an HMM-based, segmentation-free strategy to address this problem, in which no character segmentation stage is needed prior to recognition. Handwritten text lines are first converted to observation sequences using sliding windows. Then an embedded Baum-Welch algorithm is used to train character HMMs. Finally, the character string that maximizes the posterior probability is found with the help of the Viterbi algorithm. Experiments are conducted on the HIT-MW database, which includes data from more than 780 writers. The results show the feasibility of such systems and reveal apparent complementary capacities between the segmentation-free systems and the segmentation-based ones.
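The pipeline described in this abstract can be illustrated with a minimal sketch. It is not the book's implementation: the function names `sliding_windows` and `viterbi`, the window width and step, and the row-mean feature are all assumptions chosen for brevity, and the embedded Baum-Welch training step is omitted. The sketch only shows how a text-line image might become an observation sequence and how Viterbi decoding picks the best state path through a small HMM.

```python
import numpy as np

def sliding_windows(line_image, width=4, step=2):
    """Cut a text-line image (height x length) into overlapping vertical
    strips and return one feature vector per strip.
    The feature here (mean ink per row) is a placeholder assumption."""
    height, length = line_image.shape
    frames = []
    for start in range(0, length - width + 1, step):
        strip = line_image[:, start:start + width]
        frames.append(strip.mean(axis=1))
    return np.array(frames)

def viterbi(log_init, log_trans, log_emit):
    """Return the most likely state path of an HMM.
    log_init: (N,) log initial probabilities
    log_trans: (N, N) log transition probabilities, row = from, column = to
    log_emit: (T, N) log emission score of each frame under each state"""
    T, N = log_emit.shape
    score = log_init + log_emit[0]
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans   # cand[i, j]: arrive in j from i
        back[t] = cand.argmax(axis=0)       # best predecessor of each state
        score = cand.max(axis=0) + log_emit[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):           # follow the back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy usage: a blank 8 x 10 "image" and a hand-made two-state HMM.
frames = sliding_windows(np.zeros((8, 10)))
log_init = np.log([0.5, 0.5])
log_trans = np.log([[0.5, 0.5], [0.5, 0.5]])
log_emit = np.log([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])
best_path = viterbi(log_init, log_trans, log_emit)
```

In a real segmentation-free recognizer the states would belong to concatenated character HMMs, so the decoded path yields the character string and its boundaries at the same time.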

Tonghua Su

Chapter 5. Segmentation-Free Strategy: Advanced Algorithms

Abstract

The hidden Markov model (HMM), a powerful tool, has been widely applied to sequence analysis tasks such as speech recognition and handwriting recognition. However, the recognition performance achievable within the general framework is limited. In this chapter, we investigate sophisticated techniques for improving it. First, a method for synthesizing string samples from isolated character images is presented. Second, enhanced features are derived considering the uniqueness of Chinese characters, and text line normalization is added to improve feature discrimination. Third, discriminative training based on the MPE criterion is explored in the context of Chinese handwriting recognition for the first time. Fourth, a bridge is built between a segmentation-free and a segmentation-based system. This chapter also discusses the distributed training of the bigram language model.
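As a rough illustration of what distributed training of a bigram language model can look like, the sketch below counts character bigrams independently on each data shard and then merges the counts. The abstract does not describe the book's actual scheme, so everything here is an assumption: the map/merge split, the helper names `count_shard`, `merge`, and `bigram_prob`, the `<s>` and `</s>` boundary markers, the add-one smoothing, and the toy corpus.

```python
from collections import Counter
from functools import reduce

def count_shard(lines):
    """Map step: count histories and bigrams on one shard of text lines."""
    history, bigrams = Counter(), Counter()
    for line in lines:
        chars = ["<s>"] + list(line) + ["</s>"]
        history.update(chars[:-1])             # every symbol that has a successor
        bigrams.update(zip(chars, chars[1:]))  # (previous, current) pairs
    return history, bigrams

def merge(a, b):
    """Reduce step: counts from two shards simply add up."""
    return a[0] + b[0], a[1] + b[1]

def bigram_prob(prev, cur, history, bigrams, vocab_size):
    """Add-one smoothed estimate of P(cur | prev)."""
    return (bigrams[(prev, cur)] + 1) / (history[prev] + vocab_size)

# Toy corpus split into two shards; each shard could live on its own worker.
shards = [["中国人", "中国"], ["人民", "中文"]]
history, bigrams = reduce(merge, map(count_shard, shards))
vocab = {cur for _, cur in bigrams}            # symbols seen as a successor
p = bigram_prob("中", "国", history, bigrams, len(vocab))
```

Because the counts are additive, the map step can run on separate machines with no communication until the final merge, which is what makes this kind of estimation easy to distribute.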

Tonghua Su

Title: Chinese Handwriting Recognition: An Algorithmic Perspective
Author: Tonghua Su
Publisher: Springer Berlin Heidelberg
Electronic ISBN: 978-3-642-31812-2
Print ISBN: 978-3-642-31811-5
DOI: https://doi.org/10.1007/978-3-642-31812-2