Published in: Pattern Analysis and Applications 3/2005

1 December 2005 | Theoretical Advances

A hybrid post-processing system for offline handwritten Chinese script recognition

Authors: Yuan-Xiang Li, Chew Lim Tan, Xiaoqing Ding

Abstract

In offline handwritten Chinese script recognition, contextual post-processing plays a vital role in improving accuracy. In this paper, we systematically analyze the key factors that affect the performance of contextual post-processing: statistical language models (LMs), candidate confidence, candidate set size, and search strategy. We then present a hybrid post-processing system that integrates the various kinds of available information. Next, we investigate seven LMs, four methods of estimating candidate confidence, and different candidate set sizes, and illustrate in detail their influence on the performance of contextual post-processing. Experimental results show that the performance of the LMs is affected by training corpus size, smoothing method, and model pruning, and that lower perplexity correlates with higher accuracy. A comparison of the different estimation methods shows that candidate confidence is vital to contextual post-processing. We also show that capturing the correct characters within a limited number of candidates is extremely important for good post-processing performance. By adopting hybrid post-processing, we obtain high accuracy while also taking post-processing speed and memory use into account. The average recognition accuracy over three Chinese scripts (about 66,000 characters in total) reaches 97.65%, which corresponds to an error correction rate of 87% relative to the 81.58% average accuracy before post-processing. Finally, we give some recommendations for choosing a suitable post-processing method for real script recognition tasks.

Footnotes
1
x denotes a character image and c_k is the k-th recognition candidate of x.

2
Cumulative recognition accuracy = (number of correct characters in the top K candidates / total characters) × 100%

3
Recognition accuracy = (1.0 − number of incorrect characters / total characters) × 100%

4
Error correction rate = (1.0 − number of errors after post-processing / number of errors before post-processing) × 100%
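
The three measures defined in the footnotes can be sketched in a few lines of Python. This is only an illustration of the formulas as stated; the function and parameter names are hypothetical and do not come from the paper. The last helper assumes that the accuracies before and after post-processing are measured on the same set of characters, so that the total character count cancels out; under that assumption, the figures quoted in the abstract (81.58% before, 97.65% after) give an error correction rate of roughly 87%.

```python
def cumulative_recognition_accuracy(n_correct_in_top_k, n_total):
    """Footnote 2: share of characters whose correct label is among the top K candidates."""
    return n_correct_in_top_k / n_total * 100.0


def recognition_accuracy(n_incorrect, n_total):
    """Footnote 3: share of characters recognized correctly."""
    return (1.0 - n_incorrect / n_total) * 100.0


def error_correction_rate(errors_before, errors_after):
    """Footnote 4: share of the original errors removed by post-processing."""
    return (1.0 - errors_after / errors_before) * 100.0


def error_correction_rate_from_accuracy(acc_before, acc_after):
    """Same as footnote 4, but starting from accuracies given in percent.

    Assumption (not stated in the source): both accuracies refer to the same
    characters, so errors are proportional to (100 - accuracy).
    """
    return error_correction_rate(100.0 - acc_before, 100.0 - acc_after)


# Figures from the abstract: 81.58% before and 97.65% after post-processing.
rate = error_correction_rate_from_accuracy(81.58, 97.65)
print(f"error correction rate: {rate:.1f}%")
```

Working from percentages rather than raw error counts is a convenience here, since the abstract reports only averaged accuracies and not the per-script error counts.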
 
Metadata
Title: A hybrid post-processing system for offline handwritten Chinese script recognition
Authors: Yuan-Xiang Li, Chew Lim Tan, Xiaoqing Ding
Publication date: 1 December 2005
Publisher: Springer-Verlag
Published in: Pattern Analysis and Applications, Issue 3/2005
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI: https://doi.org/10.1007/s10044-005-0009-3
