
2021 | OriginalPaper | Book chapter

Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer

Authors: Wenqi Zhao, Liangcai Gao, Zuoyu Yan, Shuai Peng, Lin Du, Ziyin Zhang

Published in: Document Analysis and Recognition – ICDAR 2021

Publisher: Springer International Publishing


Abstract

Encoder-decoder models have recently made great progress in handwritten mathematical expression recognition. However, existing methods still struggle to assign attention to image features accurately. Moreover, these encoder-decoder models usually adopt RNN-based decoders, which makes them inefficient at processing long LaTeX sequences. In this paper, we replace the RNN-based decoder with a transformer-based one, which makes the whole model architecture very concise. Furthermore, we introduce a novel training strategy to fully exploit the potential of the transformer in bidirectional language modeling. Compared with several methods that do not use data augmentation, experiments demonstrate that our model improves the ExpRate of current state-of-the-art methods by 2.23% on CROHME 2014, and by 1.92% and 2.28% on CROHME 2016 and CROHME 2019, respectively.
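The abstract does not spell out how bidirectional training works. The following is a minimal sketch that assumes a common target-bidirectional setup, not the authors' code. Each ground-truth LaTeX token sequence becomes two teacher-forcing pairs, one left-to-right (L2R) and one right-to-left (R2L), and both pairs train the same causally masked transformer decoder. The `<sos>`/`<eos>` markers and the helper names (`bidirectional_pairs`, `causal_mask`, `exp_rate`) are hypothetical. Here ExpRate is taken to be the fraction of expressions whose predicted token sequence exactly matches the ground truth.

```python
from typing import List, Sequence, Tuple

SOS = "<sos>"  # hypothetical start-of-sequence marker
EOS = "<eos>"  # hypothetical end-of-sequence marker

Pair = Tuple[List[str], List[str]]  # (decoder_input, decoder_target)


def bidirectional_pairs(tokens: Sequence[str]) -> Tuple[Pair, Pair]:
    """Build teacher-forcing pairs for both decoding directions.

    L2R: <sos> x1 .. xn  ->  x1 .. xn <eos>
    R2L: <eos> xn .. x1  ->  xn .. x1 <sos>
    The two pairs are assumed to go through one shared decoder.
    """
    l2r = list(tokens)
    r2l = l2r[::-1]  # reversed copy of the target sequence
    l2r_pair = ([SOS] + l2r, l2r + [EOS])
    r2l_pair = ([EOS] + r2l, r2l + [SOS])
    return l2r_pair, r2l_pair


def causal_mask(n: int) -> List[List[bool]]:
    """True where query position i may attend to key position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def exp_rate(predictions: Sequence[Sequence[str]],
             references: Sequence[Sequence[str]]) -> float:
    """Fraction of expressions whose predicted tokens match the reference exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(list(p) == list(r) for p, r in zip(predictions, references))
    return hits / len(references)


if __name__ == "__main__":
    (l2r_in, l2r_out), (r2l_in, r2l_out) = bidirectional_pairs(["x", "^", "{", "2", "}"])
    print(r2l_in)   # ['<eos>', '}', '2', '{', '^', 'x']
    print(r2l_out)  # ['}', '2', '{', '^', 'x', '<sos>']
    print(exp_rate([["a"], ["b"]], [["a"], ["c"]]))  # 0.5
```

With this construction, each direction remains an ordinary causal sequence. The same decoder and the same mask therefore serve both, and the only extra cost is a reversed copy of every target.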


Metadata

Copyright year: 2021

DOI: https://doi.org/10.1007/978-3-030-86331-9_37
