Combining character-based bigrams with word-based bigrams in contextual postprocessing for Chinese script recognition

Authors: Yuanxiang Li (National University of Singapore), Xiaoqing Ding (Tsinghua University), Chew Lim Tan (National University of Singapore)

ACM Transactions on Asian Language Information Processing, Volume 1, Issue 4, pp. 297–309. https://doi.org/10.1145/795458.795461

Published: 01 December 2002

Abstract

Contextual information is crucial for improving the recognition accuracy of Chinese script in an offline handwritten Chinese character recognition system. However, as the number of candidates produced by the character recognizer grows, contextual postprocessing with a word-based bigram becomes time-consuming. This article presents a novel contextual postprocessing method that integrates character-based bigram postprocessing with word-based bigram postprocessing (WBP), exploiting the complementary roles of Chinese characters and Chinese words. Starting from isolated character recognition, character-based bigram postprocessing using a forward-backward search is first executed on a large candidate set, which improves both the accuracy and the efficiency of the candidate set (the cumulative accuracy of the top ten candidates is greatly boosted). Then, to further improve accuracy, WBP is executed on a small candidate set. The method thus obtains high accuracy while keeping postprocessing fast. Experimental results on three Chinese scripts (about 66,000 characters in total) demonstrate its effectiveness: character-based bigram postprocessing improves accuracy from 81.58% to 94.50%, and the cumulative accuracy of the top ten candidates rises from 94.33% to 98.25%. After WBP, accuracy reaches 95.75%, which is equivalent to the accuracy of WBP executed on a large candidate set. However, our method is more than 100 times faster than WBP executed on the large candidate set.
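The first stage described in the abstract can be illustrated with a minimal sketch. It assumes each character position carries a list of recognizer candidates with log-scores, and that a character bigram table of log-probabilities is available. The function name `forward_backward_rerank`, the lattice layout, the `floor` back-off value, and the toy data are all hypothetical choices made for illustration; this is not the authors' implementation, and the word-based stage that would follow on the pruned candidate set is not shown.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[str, float]      # (character, recognizer log-score)
Lattice = List[List[Candidate]]    # one candidate list per character position


def forward_backward_rerank(
    lattice: Lattice,
    bigram: Dict[Tuple[str, str], float],
    keep: int = 10,
    floor: float = -10.0,          # assumed log-probability for unseen bigrams
) -> Lattice:
    """Rescore each candidate by the best full path passing through it,
    then keep the top `keep` candidates per position."""
    n = len(lattice)

    def lp(prev: str, cur: str) -> float:
        return bigram.get((prev, cur), floor)

    # fwd[t][i]: best log-score of a path from position 0 ending at candidate i of position t
    fwd = [[score for _, score in col] for col in lattice]
    for t in range(1, n):
        for i, (ch, score) in enumerate(lattice[t]):
            fwd[t][i] = score + max(
                fwd[t - 1][j] + lp(prev_ch, ch)
                for j, (prev_ch, _) in enumerate(lattice[t - 1])
            )

    # bwd[t][i]: best log-score of the continuation after candidate i of position t
    bwd = [[0.0] * len(col) for col in lattice]
    for t in range(n - 2, -1, -1):
        for i, (ch, _) in enumerate(lattice[t]):
            bwd[t][i] = max(
                lp(ch, next_ch) + next_score + bwd[t + 1][j]
                for j, (next_ch, next_score) in enumerate(lattice[t + 1])
            )

    # Combine both directions, sort best-first, and prune each position.
    reranked: Lattice = []
    for t in range(n):
        scored = sorted(
            ((fwd[t][i] + bwd[t][i], ch) for i, (ch, _) in enumerate(lattice[t])),
            reverse=True,
        )
        reranked.append([(ch, total) for total, ch in scored[:keep]])
    return reranked


# Toy data (invented): the recognizer alone prefers the wrong character at both positions.
lattice = [
    [("大", -0.5), ("天", -1.0)],
    [("汽", -0.6), ("气", -0.7)],
]
bigram = {("天", "气"): -0.5}
reranked = forward_backward_rerank(lattice, bigram, keep=1)
top_characters = [col[0][0] for col in reranked]
```

In this toy lattice the recognizer's own top choices are 大 and 汽, while the bigram context promotes 天 and 气 to the top. Pruning each position to a short list after reranking is what would let a slower word-based pass run on a small candidate set.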

Index Terms

1. Applied computing
  1. Document management and text processing
    1. Document capture
      1. Optical character recognition
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
    2. Search methodologies
      1. Discrete space search
      2. Game tree search

Published in

ACM Transactions on Asian Language Information Processing, Volume 1, Issue 4
December 2002
29 pages
ISSN: 1530-0226
EISSN: 1558-3430
DOI: 10.1145/795458
Copyright © 2002 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

Author Tags
Chinese character recognition
contextual post-processing
efficiency of candidate set
forward-backward search
statistical language model