Abstract
Contextual information is crucial for improving recognition accuracy in an offline handwritten Chinese character recognition system. However, as the number of candidates produced by the character recognizer grows, contextual postprocessing with a word-based bigram becomes time-consuming. This article presents a novel contextual postprocessing method that integrates character-based bigram postprocessing with word-based bigram postprocessing (WBP), exploiting the complementary roles of Chinese characters and Chinese words. Starting from the output of isolated character recognition, character-based bigram postprocessing using a forward-backward search is first executed on a large candidate set. This step improves both the accuracy and the efficiency of the candidate set, greatly boosting the cumulative accuracy of the top ten candidates. Then, to further improve accuracy, WBP is executed on a small candidate set. The method thus achieves high accuracy while keeping postprocessing fast. Experimental results on three Chinese scripts (about 66,000 characters in total) demonstrate its effectiveness: character-based bigram postprocessing improves accuracy from 81.58% to 94.50%, and the cumulative accuracy of the top ten candidates rises from 94.33% to 98.25%. After WBP, accuracy reaches 95.75%, which is equivalent to the accuracy of WBP executed on a large candidate set, while our method is more than 100 times faster.
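The character-based bigram step can be illustrated with a small sketch. The abstract does not give the details of the authors' forward-backward search, so the code below is only one plausible reading: a standard forward-backward pass over the candidate lattice that rescores each candidate by its posterior under a character bigram model and keeps the top few. The function name `rerank_with_char_bigram`, the `floor` probability for unseen bigrams, and the `keep` parameter are all assumptions made for illustration, not taken from the article.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[str, float]  # (character, recognizer confidence)


def rerank_with_char_bigram(
    lattice: List[List[Candidate]],
    bigram: Dict[Tuple[str, str], float],
    floor: float = 1e-4,   # assumed probability for unseen bigrams
    keep: int = 10,        # assumed size of the pruned candidate set
) -> List[List[Candidate]]:
    """Rerank the candidates at each position by forward-backward posterior."""
    n = len(lattice)
    if n == 0:
        return []

    def p(prev: str, cur: str) -> float:
        return bigram.get((prev, cur), floor)

    # Forward pass: alpha[t][i] sums over all prefixes ending in candidate i.
    alpha = [[score for _, score in lattice[0]]]
    for t in range(1, n):
        row = []
        for ch, score in lattice[t]:
            total = sum(
                a * p(prev_ch, ch)
                for a, (prev_ch, _) in zip(alpha[t - 1], lattice[t - 1])
            )
            row.append(score * total)
        alpha.append(row)

    # Backward pass: beta[t][i] sums over all suffixes following candidate i.
    beta = [[1.0] * len(lattice[t]) for t in range(n)]
    for t in range(n - 2, -1, -1):
        for i, (ch, _) in enumerate(lattice[t]):
            beta[t][i] = sum(
                p(ch, next_ch) * next_score * b
                for (next_ch, next_score), b in zip(lattice[t + 1], beta[t + 1])
            )

    # Combine, normalize per position, sort, and prune to the top `keep`.
    reranked = []
    for t in range(n):
        gamma = [a * b for a, b in zip(alpha[t], beta[t])]
        z = sum(gamma) or 1.0
        scored = [(ch, g / z) for (ch, _), g in zip(lattice[t], gamma)]
        scored.sort(key=lambda item: item[1], reverse=True)
        reranked.append(scored[:keep])
    return reranked
```

With a toy two-position lattice in which the recognizer prefers 申 over 中, a bigram table that favors 中国 moves 中 to the top. The pruned lists could then be passed to a slower word-based bigram stage. For long texts, a practical version would work in log space or rescale each row to avoid underflow.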
Index Terms
- Combining character-based bigrams with word-based bigrams in contextual postprocessing for Chinese script recognition