
International Journal on Document Analysis and Recognition (IJDAR)

Published in:

28-07-2020 | Original Paper

A benchmark for unconstrained online handwritten Uyghur word recognition

Authors: Wujiahemaiti Simayi, Mayire Ibrahim, Xu-Yao Zhang, Cheng-Lin Liu, Askar Hamdulla

Published in: International Journal on Document Analysis and Recognition (IJDAR) | Issue 3/2020


Abstract

Although several research groups have reported interesting results, no public database or baseline study has been available for comparing Uyghur online handwriting recognition methods. To fill this gap, we present a database of Uyghur online handwritten words and carry out the first benchmark experiments on it. The database contains 125,020 samples of 2030 words collected from 393 writers. In view of the characteristics of the Uyghur lexicon, two out-of-vocabulary datasets are provided specifically for evaluation. We carry out unconstrained handwritten word recognition experiments on the database using recurrent neural networks as the base model. Recognition results are obtained with connectionist temporal classification, without lexicon search or an external language model. Concatenated and averaged bidirectional recurrent layers are compared in terms of generalization. Based on the Unicode representation of Uyghur, we also compare models that use different alphabets, one built on character types and one on character forms. To improve generalization, we propose a 1D convolutional model that uses 1D convolutional layers for sequence feature extraction. In our experiments, the proposed 1D convolutional model and its variations surpassed the recurrent base model on out-of-vocabulary words by a clear margin. A character accuracy rate (CAR) of 83.23% was obtained when out-of-vocabulary samples were used for testing. The highest recognition rate, 94.95% CAR, was achieved when the test set shares the same lexicon as the training set. The experiments in this paper can serve as baseline references for future studies using this database.
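As an illustration of lexicon-free recognition and character-level scoring, the following minimal Python sketch shows greedy (best-path) decoding of connectionist temporal classification output and an edit-distance-based character accuracy rate. It is not the authors' implementation. The blank index `BLANK = 0`, the function names, and the definition CAR = (N - errors) / N, with errors counted as substitutions, deletions, and insertions, are assumptions made for this example; the abstract does not state the exact formula.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label (hypothetical choice)


def ctc_greedy_decode(frame_labels: Sequence[int], blank: int = BLANK) -> List[int]:
    """Best-path decoding: collapse repeated labels, then drop blanks."""
    decoded: List[int] = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


def edit_distance(ref: Sequence[int], hyp: Sequence[int]) -> int:
    """Levenshtein distance: substitutions + deletions + insertions."""
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        # `diagonal` holds the value from the previous row, previous column.
        diagonal, row[0] = row[0], i
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            diagonal, row[j] = row[j], min(row[j] + 1, row[j - 1] + 1, diagonal + cost)
    return row[-1]


def character_accuracy_rate(refs: Sequence[Sequence[int]],
                            hyps: Sequence[Sequence[int]]) -> float:
    """Assumed definition: (total reference characters - errors) / total."""
    total = sum(len(r) for r in refs)
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return (total - errors) / total


# Per-frame argmax labels for two hypothetical word samples.
frames_a = [0, 3, 3, 0, 3, 5, 5, 0]
frames_b = [3, 3, 0, 5, 5]
hyps = [ctc_greedy_decode(frames_a), ctc_greedy_decode(frames_b)]
refs = [[3, 3, 5], [3, 3, 5]]
car = character_accuracy_rate(refs, hyps)
```

Because decoding uses only the per-frame label sequence, no lexicon or language model is consulted, which is what allows evaluation on out-of-vocabulary words.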



Title: A benchmark for unconstrained online handwritten Uyghur word recognition
Authors: Wujiahemaiti Simayi, Mayire Ibrahim, Xu-Yao Zhang, Cheng-Lin Liu, Askar Hamdulla
Publication date: 28-07-2020
Publisher: Springer Berlin Heidelberg
Published in: International Journal on Document Analysis and Recognition (IJDAR) / Issue 3/2020
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI: https://doi.org/10.1007/s10032-020-00354-0
