
International Journal on Document Analysis and Recognition (IJDAR)

27 October 2022 | Original Paper

Tables to LaTeX: structure and content extraction from scientific tables

Authors: Pratik Kayal, Mrinal Anand, Harsh Desai, Mayank Singh

Published in: International Journal on Document Analysis and Recognition (IJDAR) | Issue 2/2023

Abstract

Scientific documents contain tables that list important information in a concise fashion. Extracting structure and content from tables embedded in PDF research documents is challenging because of visual features such as spanning cells and content features such as mathematical symbols and equations. Most existing table structure identification methods tend to ignore these academic writing features. In this paper, we adapt the transformer-based language modeling paradigm for scientific table structure and content extraction. Specifically, the proposed model converts a tabular image to its corresponding LaTeX source code. Overall, we outperform the current state-of-the-art baselines and achieve exact match accuracies of 70.35% and 49.69% on table structure and content extraction, respectively. Further analysis demonstrates that the proposed models efficiently identify the number of rows and columns, the alphanumeric characters, the LaTeX tokens, and symbols.
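To make the exact match metric in the abstract concrete, the following is a minimal sketch of how such a score could be computed over predicted and reference LaTeX strings. It is an illustration under stated assumptions, not the authors' evaluation code: the whitespace tokenizer, the function names `tokenize` and `exact_match_accuracy`, and the two toy tables are all hypothetical.

```python
from typing import List, Sequence


def tokenize(latex: str) -> List[str]:
    """Split on whitespace. This is an assumption, not the paper's tokenizer."""
    return latex.split()


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Return the percentage of predictions whose token sequence equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(tokenize(p) == tokenize(r) for p, r in zip(predictions, references))
    return 100.0 * matches / len(references)


# Toy data (hypothetical): the second prediction gets one symbol wrong.
references = [
    r"\begin{tabular}{ c c } a & b \\ c & d \\ \end{tabular}",
    r"\begin{tabular}{ l r } x & $\alpha$ \\ \end{tabular}",
]
predictions = [
    r"\begin{tabular}{ c c } a & b \\ c & d \\ \end{tabular}",
    r"\begin{tabular}{ l r } x & $\beta$ \\ \end{tabular}",
]

score = exact_match_accuracy(predictions, references)
print(f"{score:.2f}%")  # prints "50.00%"
```

Because a single wrong token makes the whole table count as a miss, exact match is a strict measure, which is consistent with content extraction scoring lower than structure extraction.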

http://arxiv.org/.

https://github.com/emcconville/wand.

Defining the tabular structure requires fewer tokens. Thus, we experimented only with the 250 variant.

Title: Tables to LaTeX: structure and content extraction from scientific tables
Authors: Pratik Kayal, Mrinal Anand, Harsh Desai, Mayank Singh
Publication date: 27 October 2022
Publisher: Springer Berlin Heidelberg
Published in: International Journal on Document Analysis and Recognition (IJDAR) / Issue 2/2023
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI: https://doi.org/10.1007/s10032-022-00420-9
