Published in: International Journal of Computer Vision

Publication date: 21.01.2020

Handwritten Mathematical Expression Recognition via Paired Adversarial Learning

Authors: Jin-Wen Wu, Fei Yin, Yan-Ming Zhang, Xu-Yao Zhang, Cheng-Lin Liu

Published in: International Journal of Computer Vision | Issue 10-11/2020

Abstract

Recognition of handwritten mathematical expressions (MEs) is an important problem with wide practical applications. Handwritten ME recognition is challenging because of the variety of writing styles and ME formats. As a result, recognizers trained by optimizing the traditional supervision loss do not perform satisfactorily. To improve the robustness of the recognizer with respect to writing styles, we propose a novel paired adversarial learning method to learn semantic-invariant features. Specifically, our proposed model, named PAL-v2, consists of an attention-based recognizer and a discriminator. During training, handwritten MEs and their printed templates are fed into PAL-v2 simultaneously. The attention-based recognizer is trained to learn semantic-invariant features under the guidance of the discriminator. Moreover, we adopt a convolutional decoder to alleviate the vanishing and exploding gradient problems of RNN-based decoders, and we further improve the coverage of decoding with a novel attention method. We conducted extensive experiments on the CROHME dataset to demonstrate the effectiveness of each part of the method and achieved state-of-the-art performance.
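To make the training setup in the abstract more concrete, the following is a minimal sketch of how a paired adversarial objective could be written. It assumes a generic GAN-style formulation: a toy linear discriminator scores whether a feature vector comes from a printed template, and the recognizer combines its recognition loss with a term that rewards handwritten features the discriminator mistakes for printed ones. The function names, the linear discriminator, and the weighting factor `adv_weight` are illustrative assumptions, not the exact losses or architecture of PAL-v2.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def discriminator_prob(features, weights, bias):
    """Hypothetical linear discriminator.

    Returns the estimated probability that `features` were extracted
    from a printed template rather than from a handwritten ME.
    """
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid(score)


def discriminator_loss(p_printed, p_handwritten):
    """Binary cross-entropy for one (printed, handwritten) pair.

    The printed sample has target 1 and the handwritten sample target 0.
    """
    return -math.log(p_printed) - math.log(1.0 - p_handwritten)


def recognizer_loss(token_probs, p_handwritten, adv_weight=0.5):
    """Recognition loss plus an adversarial term (assumed formulation).

    token_probs: probabilities the recognizer assigns to the correct
        symbols of the target markup sequence.
    p_handwritten: discriminator output on the handwritten features.
    The adversarial term is small when the discriminator believes the
    handwritten features are printed, i.e. when style cues are removed.
    """
    cls = -sum(math.log(p) for p in token_probs) / len(token_probs)
    adv = -math.log(p_handwritten)
    return cls + adv_weight * adv


if __name__ == "__main__":
    # Made-up feature vectors for one handwritten/printed pair.
    w, b = [0.8, -0.3], 0.1
    p_print = discriminator_prob([1.2, 0.4], w, b)
    p_hand = discriminator_prob([-0.5, 0.9], w, b)
    print("discriminator loss:", discriminator_loss(p_print, p_hand))
    print("recognizer loss:", recognizer_loss([0.9, 0.7, 0.8], p_hand))
```

In an actual training loop the two losses would be minimized alternately, so that the discriminator learns to separate the two styles while the recognizer learns features on which that separation fails.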

Title: Handwritten Mathematical Expression Recognition via Paired Adversarial Learning
Authors: Jin-Wen Wu, Fei Yin, Yan-Ming Zhang, Xu-Yao Zhang, Cheng-Lin Liu
Publication date: 21.01.2020
Publisher: Springer US
Published in: International Journal of Computer Vision / Issue 10-11/2020
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-020-01291-5
