2022 | OriginalPaper | Chapter

Making Equations Accessible in Scientific Documents

Authors: Shivansh Juyal, Sanjeev Sharma, Neha Jadhav, Volker Sorge, M. Balakrishnan

Published in: Computers Helping People with Special Needs

Publisher: Springer International Publishing

Abstract

Unlike a standard text document, a STEM document consists not only of text but also of other components such as tables, figures, captions, and mathematical equations. This paper presents a novel technique to detect mathematical equations in PDF documents and convert those equations into a more accessible format such as LaTeX. We use visual features of the document to detect the mathematical equations using object detection and subsequently apply heuristics to the generated bounding boxes so that they precisely cover the complete equation. These detections are passed to a tool called Maxtract, which rewrites the equations in LaTeX.
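The abstract does not say which heuristics are applied to the detected bounding boxes. As a purely illustrative sketch, one plausible heuristic of this kind merges boxes that overlap or lie within a small gap of each other, so that fragments of one equation end up under a single box. The names `merge_boxes`, `should_merge`, and `gap`, the pixel threshold, and the `(x_min, y_min, x_max, y_max)` box layout are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max) in page pixels.
Box = Tuple[float, float, float, float]


def should_merge(a: Box, b: Box, gap: float) -> bool:
    """Return True if the boxes overlap or lie within `gap` pixels on both axes."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    horizontally_close = ax0 - gap <= bx1 and bx0 - gap <= ax1
    vertically_close = ay0 - gap <= by1 and by0 - gap <= ay1
    return horizontally_close and vertically_close


def union(a: Box, b: Box) -> Box:
    """Smallest box that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_boxes(boxes: List[Box], gap: float = 5.0) -> List[Box]:
    """Repeatedly merge nearby boxes until no pair qualifies any more.

    Illustrative heuristic only; the paper's actual rules are not given
    in the abstract.
    """
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        result: List[Box] = []
        for box in merged:
            for i, kept in enumerate(result):
                if should_merge(box, kept, gap):
                    # Grow the kept box; each merge removes one box, so the loop ends.
                    result[i] = union(box, kept)
                    changed = True
                    break
            else:
                result.append(box)
        merged = result
    return merged


if __name__ == "__main__":
    detections = [(10, 10, 50, 30), (52, 12, 90, 28), (200, 200, 240, 220)]
    print(merge_boxes(detections, gap=5.0))
```

In this sketch the first two detections sit 2 pixels apart horizontally and are merged into one box, while the third is far away and stays separate. A real pipeline would likely tune the gap to the page resolution and treat horizontal and vertical distances differently.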


Metadata
Title
Making Equations Accessible in Scientific Documents
Authors
Shivansh Juyal
Sanjeev Sharma
Neha Jadhav
Volker Sorge
M. Balakrishnan
Copyright Year
2022
DOI
https://doi.org/10.1007/978-3-031-08648-9_4