
08 November 2020 | Original Paper

Translating math formula images to LaTeX sequences using deep neural networks with sequence-level training

Authors: Zelun Wang, Jyh-Charn Liu

Published in: International Journal on Document Analysis and Recognition (IJDAR) | Issue 1-2/2021

Abstract

In this paper, we propose a deep neural network model with an encoder–decoder architecture that translates images of math formulas into their LaTeX markup sequences. The encoder is a convolutional neural network that transforms images into a group of feature maps. To better capture the spatial relationships of math symbols, the feature maps are augmented with 2D positional encoding before being unfolded into a vector. The decoder is a stacked bidirectional long short-term memory model integrated with a soft attention mechanism, which works as a language model to translate the encoder output into a sequence of LaTeX tokens. The neural network is trained in two steps. The first step is token-level training, which uses maximum likelihood estimation as the objective function. After token-level training completes, a sequence-level training objective is employed to optimize the overall model based on the policy gradient algorithm from reinforcement learning. Our design also overcomes the exposure bias problem by closing the feedback loop in the decoder during sequence-level training, i.e., feeding in the predicted token instead of the ground truth token at every time step. The model is trained and evaluated on the IM2LATEX-100K dataset and shows state-of-the-art performance on both sequence-based and image-based evaluation metrics.
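The 2D positional encoding mentioned in the abstract can be made concrete with a short sketch. The exact formulation used in the paper is not reproduced on this page, so the Python/PyTorch code below only illustrates one common choice, a Transformer-style sinusoidal encoding computed separately over the row and column axes and added to the CNN feature maps before they are unfolded for attention; the feature-map dimensions are hypothetical.

import math
import torch

def positional_encoding_2d(d_model: int, height: int, width: int) -> torch.Tensor:
    # Returns a (d_model, height, width) tensor. The first half of the channels
    # encodes the column (width) index, the second half the row (height) index,
    # each using the standard sinusoidal scheme.
    assert d_model % 4 == 0, "d_model must be divisible by 4"
    pe = torch.zeros(d_model, height, width)
    d_half = d_model // 2
    div_term = torch.exp(torch.arange(0.0, d_half, 2) * -(math.log(10000.0) / d_half))
    pos_w = torch.arange(0.0, width).unsqueeze(1)    # (width, 1)
    pos_h = torch.arange(0.0, height).unsqueeze(1)   # (height, 1)
    pe[0:d_half:2] = torch.sin(pos_w * div_term).t().unsqueeze(1).expand(-1, height, -1)
    pe[1:d_half:2] = torch.cos(pos_w * div_term).t().unsqueeze(1).expand(-1, height, -1)
    pe[d_half::2] = torch.sin(pos_h * div_term).t().unsqueeze(2).expand(-1, -1, width)
    pe[d_half + 1::2] = torch.cos(pos_h * div_term).t().unsqueeze(2).expand(-1, -1, width)
    return pe

# Hypothetical encoder output: batch of 8 feature maps with 512 channels on a 16x50 grid.
feature_maps = torch.randn(8, 512, 16, 50)
pe = positional_encoding_2d(512, 16, 50)
augmented = feature_maps + pe                       # broadcast over the batch dimension
encoder_out = augmented.flatten(2).transpose(1, 2)  # (8, 800, 512): one vector per grid cell

Splitting the channels between the two axes keeps the encoding additive and parameter-free, so the attention-based decoder can distinguish grid positions without learned position embeddings.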

Footnotes
1
LaTeX (version 3.1415926–2.5–1.40.14).
 
2
Different sizes of width–height buckets (in pixels): (320, 40), (360, 60), (360, 50), (200, 50), (280, 50), (240, 40), (360, 100), (500, 100), (320, 50), (280, 40), (200, 40), (400, 160), (600, 100), (400, 50), (160, 40), (800, 100), (240, 50), (120, 50), (360, 40), (500, 200).
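The bucket list above is the only batching detail given on this page; the grouping procedure itself is not described. As an illustration only, the following Python sketch (with a hypothetical assign_bucket helper) shows one plausible use of such buckets: each formula image is padded up to the smallest listed bucket that fits it, so that images sharing a bucket can be batched at a uniform size.

from typing import List, Optional, Tuple

# Width-height buckets from footnote 2, in pixels.
BUCKETS: List[Tuple[int, int]] = [
    (320, 40), (360, 60), (360, 50), (200, 50), (280, 50),
    (240, 40), (360, 100), (500, 100), (320, 50), (280, 40),
    (200, 40), (400, 160), (600, 100), (400, 50), (160, 40),
    (800, 100), (240, 50), (120, 50), (360, 40), (500, 200),
]

def assign_bucket(width: int, height: int) -> Optional[Tuple[int, int]]:
    # Smallest-area bucket that can contain the image; None if the image is too large.
    fits = [(w, h) for (w, h) in BUCKETS if w >= width and h >= height]
    return min(fits, key=lambda wh: wh[0] * wh[1]) if fits else None

print(assign_bucket(300, 45))  # -> (320, 50): pad a 300x45 image to this bucket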
 
Metadata
Title
Translating math formula images to LaTeX sequences using deep neural networks with sequence-level training
Authors
Zelun Wang
Jyh-Charn Liu
Publication date
08 November 2020
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 1-2/2021
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-020-00360-2
