Combining character-based bigrams with word-based bigrams in contextual postprocessing for Chinese script recognition

Published: 01 December 2002

Abstract

Contextual information is crucial for improving the recognition accuracy of Chinese script in an offline, handwritten Chinese character-recognition system. However, as the number of candidates given by a character recognizer increases, contextual postprocessing using a word-based bigram becomes time-consuming. This article presents a novel contextual postprocessing method that integrates character-based bigram postprocessing with word-based bigram postprocessing, exploiting the complementary relationship between Chinese characters and Chinese words. Starting from isolated character recognition, character-based bigram postprocessing using a forward-backward search is first executed on a big candidate set, which improves both the accuracy and the efficiency of the candidate set (the cumulative accuracy of the top ten candidates is greatly boosted). Then, to further improve accuracy, word-based bigram postprocessing (WBP) is executed on a small candidate set. The method thus obtains high accuracy while keeping postprocessing fast. Experimental results for three Chinese scripts (about 66,000 characters in total) demonstrate the effectiveness of the method: character-based bigram postprocessing improves accuracy from 81.58% to 94.50%, and the cumulative accuracy of the top ten candidates rises from 94.33% to 98.25%. After WBP, 95.75% accuracy is achieved, which is equivalent to the accuracy of WBP executed on a big candidate set. However, our method is more than 100 times faster than WBP executed on a big candidate set.
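
The first stage described in the abstract (character-based bigram rescoring of a large candidate set, followed by pruning to a small one) can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything here is an assumption: recognizer scores and bigram probabilities are treated as log-probabilities and summed, the forward and backward passes take the best (max) path through each candidate, and the names `rerank_candidates`, `toy_bigram_logp`, and `toy_lattice` are hypothetical. The word-based second stage is not shown.

```python
import math


def rerank_candidates(lattice, bigram_logp, top_k):
    """Rerank each position's candidates by the best full path through them.

    lattice: list of positions; each position is a list of
             (character, recognizer_log_prob) pairs.
    bigram_logp: function (prev_char, cur_char) -> log P(cur_char | prev_char).
    top_k: number of candidates to keep per position after reranking.
    Assumption: path score = sum of recognizer and bigram log-probabilities.
    """
    n = len(lattice)

    # forward[t][i]: best score of a path from position 0 ending at candidate i.
    forward = [[lp for _, lp in lattice[0]]]
    for t in range(1, n):
        row = []
        for ch, lp in lattice[t]:
            best_prev = max(
                forward[t - 1][j] + bigram_logp(prev, ch)
                for j, (prev, _) in enumerate(lattice[t - 1])
            )
            row.append(best_prev + lp)
        forward.append(row)

    # backward[t][i]: best score of a path continuing after candidate i
    # (candidate i's own recognizer score is already counted in forward).
    backward = [[0.0] * len(pos) for pos in lattice]
    for t in range(n - 2, -1, -1):
        for i, (ch, _) in enumerate(lattice[t]):
            backward[t][i] = max(
                bigram_logp(ch, nxt) + lp + backward[t + 1][j]
                for j, (nxt, lp) in enumerate(lattice[t + 1])
            )

    # Sort candidates by forward + backward score and keep the top_k.
    reranked = []
    for t in range(n):
        scored = [
            (forward[t][i] + backward[t][i], ch)
            for i, (ch, _) in enumerate(lattice[t])
        ]
        scored.sort(key=lambda s: s[0], reverse=True)
        reranked.append([ch for _, ch in scored[:top_k]])
    return reranked


# Toy data (invented for illustration, not from the article).
TOY_BIGRAMS = {("中", "国"): 0.5}


def toy_bigram_logp(prev, cur):
    # Unseen pairs get a small floor probability instead of real smoothing.
    return math.log(TOY_BIGRAMS.get((prev, cur), 0.01))


toy_lattice = [
    [("申", math.log(0.6)), ("中", math.log(0.4))],  # recognizer prefers 申
    [("国", math.log(0.7)), ("囯", math.log(0.3))],
]

print(rerank_candidates(toy_lattice, toy_bigram_logp, top_k=1))
# [['中'], ['国']]
```

In this toy lattice the recognizer's first choice at position 0 is wrong, and the bigram context promotes the correct character to the top. Keeping only the top few reranked candidates per position is what would let a more expensive word-based pass run on a much smaller search space.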
