Published in: International Journal on Document Analysis and Recognition (IJDAR) 3/2022

22 April 2022 | Original Paper

Fusion of visual representations for multimodal information extraction from unstructured transactional documents

Authors: Berke Oral, Gülşen Eryiğit

Abstract

The importance of automated document understanding for the speed, efficiency, and cost reduction of today’s businesses is indisputable. Although structured and semi-structured business documents have been studied intensively in the literature, information extraction from unstructured ones remains an open and challenging research topic because of their difficulty and the scarcity of available datasets. Transactional documents occupy a special place among the various types of business documents because they serve to track financial flows, and they are accordingly the most studied type. Processing unstructured transactional documents requires the extraction of complex relations (i.e., n-ary, document-level, overlapping, and nested relations). Studies focusing on unstructured transactional documents rely mostly on textual information, whereas the impact of their visual composition remains unexplored and may be valuable for their automatic understanding. For the first time in the literature, this article investigates the impact of using different visual representations and their fusion on information extraction from unstructured transactional documents (specifically, complex relation extraction from money transfer order documents). It introduces and experiments with five visual representation approaches (word bounding box, grid embedding, grid convolutional neural network, layout embedding, and layout graph convolutional neural network) and their possible fusion with five strategies (three basic vector operations, weighted fusion, and attention-based fusion). The results show that fusion strategies are a valuable way to combine diverse visual information, from which unstructured transactional document understanding benefits differently depending on the context.
While different visual representations have little effect when added individually to a pure textual baseline, their fusion provides a relative error reduction of up to 33%.
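
To make the fusion idea and the reported metric concrete, the following is a minimal sketch in plain Python and not the authors' implementation. The abstract does not name the three basic vector operations, so concatenation and element-wise sum are used here only as plausible stand-ins. All function names, the `query` vector used as attention context, and the example scores are hypothetical.

```python
import math


def concat_fusion(vectors):
    """Assumed 'basic' fusion: join all vectors into one longer vector."""
    return [x for v in vectors for x in v]


def sum_fusion(vectors):
    """Assumed 'basic' fusion: element-wise sum of equal-length vectors."""
    return [sum(xs) for xs in zip(*vectors)]


def weighted_fusion(vectors, weights):
    """Element-wise weighted sum; in a real model the weights would be learned."""
    return [sum(w * x for w, x in zip(weights, xs)) for xs in zip(*vectors)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fusion(vectors, query):
    """Dot-product attention: score each vector against a (hypothetical)
    query, normalise the scores, then take the weighted sum."""
    scores = [sum(q * x for q, x in zip(query, v)) for v in vectors]
    return weighted_fusion(vectors, softmax(scores))


def relative_error_reduction(baseline_score, new_score):
    """Share of the baseline error (1 - score) removed by the new model."""
    baseline_error = 1.0 - baseline_score
    new_error = 1.0 - new_score
    return (baseline_error - new_error) / baseline_error


# Two toy 2-dimensional "visual representations" of the same word.
bbox_vec = [1.0, 2.0]
grid_vec = [3.0, 4.0]

fused = attention_fusion([bbox_vec, grid_vec], query=[0.0, 0.0])
rer = relative_error_reduction(0.70, 0.80)  # hypothetical scores
```

With a zero query every representation receives the same attention weight, so attention-based fusion reduces to a plain average. The last line illustrates how a relative error reduction of about one third can arise: moving from a hypothetical score of 0.70 to 0.80 removes 0.10 of a 0.30 baseline error.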

Footnotes
1
There also exist semi-structured money transfer orders, which are processed with table-detection algorithms and are beyond the scope of this article.

2
In the original study, Oral et al. [2] also use character embeddings alongside pretrained textual word embeddings and report that this improves NER performance only marginally (0.2 percentage points). To reduce complexity, we dropped the character BiLSTM layer from the textual representations so that the effects of the visual representations can be observed more clearly.

3
The term Layout embedding is also used by Xu et al. [22], but the approach that we introduce here should not be confused with theirs, which is more similar to our Grid Embedding approach.

4
We also tested an attention-based fusion approach that uses textual features as the attention context, but it did not yield good results.

5
Although semi-structured documents containing well-formed forms and tables may appear in this domain, the authors state that these are not included in this dataset.

References
1.
Graliński, F., Stanisławek, T., Wróblewska, A., Lipiński, D., Kaliska, A., Rosalska, P., Topolski, B., Biecek, P.: Kleister: A novel task for information extraction involving long documents with complex layout. arXiv preprint arXiv:2003.02356 (2020)
4.
Chalkidis, I., Androutsopoulos, I., Michos, A.: Extracting contract elements. In: Proceedings of the 16th Edition of the International Conference on Artificial Intelligence and Law. ICAIL ’17, pp. 19–28. Association for Computing Machinery, New York, NY, USA (2017). https://doi.org/10.1145/3086512.3086515
7.
Harley, A.W., Ufkes, A., Derpanis, K.G.: Evaluation of deep convolutional nets for document image classification and retrieval. In: International Conference on Document Analysis and Recognition (ICDAR) (2015)
8.
Park, S., Shin, S., Lee, B., Lee, J., Surh, J., Seo, M., Lee, H.: Cord: A consolidated receipt dataset for post-ocr parsing. In: Workshop on Document Intelligence at NeurIPS 2019 (2019)
9.
Huang, Z., Chen, K., He, J., Bai, X., Karatzas, D., Lu, S., Jawahar, C.V.: Icdar2019 competition on scanned receipt ocr and information extraction. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1516–1520 (2019). https://doi.org/10.1109/ICDAR.2019.00244
10.
11.
Palm, R.B., Winther, O., Laws, F.: Cloudscan—A configuration-free invoice analysis system using recurrent neural networks. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 01, pp. 406–413 (2017). https://doi.org/10.1109/ICDAR.2017.74
12.
Sage, C., Aussem, A., Elghazel, H., Eglin, V., Espinas, J.: Recurrent neural network approach for table field extraction in business documents. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1308–1313 (2019). https://doi.org/10.1109/ICDAR.2019.00211
13.
Sage, C., Aussem, A., Eglin, V., Elghazel, H., Espinas, J.: End-to-end extraction of structured information from business documents with pointer-generator networks. In: Proceedings of the Fourth Workshop on Structured Prediction for NLP, pp. 43–52. Association for Computational Linguistics, Online (2020). https://doi.org/10.18653/v1/2020.spnlp-1.6
14.
Santosh, K., Belaid, A.: Document information extraction and its evaluation based on client’s relevance. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 35–39 (2013). IEEE
16.
Katti, A.R., Reisswig, C., Guder, C., Brarda, S., Bickel, S., Höhne, J., Faddoul, J.B.: Chargrid: Towards understanding 2D documents. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 4459–4469. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/D18-1476
19.
Zhao, X., Niu, E., Wu, Z., Wang, X.: Cutie: Learning to understand documents with convolutional universal text information extractor. arXiv preprint arXiv:1903.12363 (2019)
20.
Liu, X., Gao, F., Zhang, Q., Zhao, H.: Graph convolution for multimodal information extraction from visually rich documents. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers), pp. 32–39. Association for Computational Linguistics, Minneapolis, Minnesota (2019). https://doi.org/10.18653/v1/N19-2005
22.
Xu, Y., Xu, Y., Lv, T., Cui, L., Wei, F., Wang, G., Lu, Y., Florencio, D., Zhang, C., Che, W., Zhang, M., Zhou, L.: LayoutLMv2: Multi-modal pre-training for visually-rich document understanding. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 2579–2591. Association for Computational Linguistics, Online (2021). https://doi.org/10.18653/v1/2021.acl-long.201
23.
Zhang, P., Xu, Y., Cheng, Z., Pu, S., Lu, J., Qiao, L., Niu, Y., Wu, F.: Trie: End-to-end text reading and information extraction for document understanding. In: Proceedings of the 28th ACM International Conference on Multimedia. MM ’20, pp. 1413–1422. Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3394171.3413900
24.
Yadav, V., Bethard, S.: A survey on recent advances in named entity recognition from deep learning models. In: Proceedings of the 27th International Conference on Computational Linguistics, pp. 2145–2158. Association for Computational Linguistics, Santa Fe, New Mexico, USA (2018). https://www.aclweb.org/anthology/C18-1182
26.
Weld, H., Huang, X., Long, S., Poon, J., Han, S.C.: A survey of joint intent detection and slot-filling models in natural language understanding. arXiv preprint arXiv:2101.08091 (2021)
27.
Subramani, N., Matton, A., Greaves, M., Lam, A.: A Survey of Deep Learning Approaches for OCR and Document Understanding (2021)
28.
Jiang, H., Bao, Q., Cheng, Q., Yang, D., Wang, L., Xiao, Y.: Complex relation extraction: Challenges and opportunities. arXiv preprint arXiv:2012.04821 (2020)
29.
Sahin, G.G., Emekligil, E., Arslan, S., Ağın, O., Eryiğit, G.: Relation extraction via one-shot dependency parsing on intersentential, higher-order, and nested relations. Turk. J. Electr. Eng. Comput. Sci. 26(2), 830–843 (2018)
30.
Oral, B., Emekligil, E., Arslan, S., Eryiğit, G.: Extracting complex relations from banking documents. In: Proceedings of the Second Workshop on Economics and Natural Language Processing, pp. 1–9. Association for Computational Linguistics, Hong Kong (2019). https://doi.org/10.18653/v1/D19-5101
32.
Yu, W., Lu, N., Qi, X., Gong, P., Xiao, R.: PICK: Processing key information extraction from documents using improved graph learning-convolutional networks. In: 2020 25th International Conference on Pattern Recognition (ICPR) (2020)
33.
Bach, N., Badaskar, S.: A review of relation extraction. Lit. Rev. Lang. Stat. II(2), 1–15 (2007)
34.
Riedel, S., Yao, L., McCallum, A.: Modeling relations and their mentions without labeled text. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) Machine Learning and Knowledge Discovery in Databases, pp. 148–163. Springer, Berlin, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15939-8_10
35.
McDonald, R., Pereira, F., Kulick, S., Winters, S., Jin, Y., White, P.: Simple algorithms for complex relation extraction with applications to biomedical IE. In: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05), pp. 491–498. Association for Computational Linguistics, Ann Arbor, MI (2005). https://doi.org/10.3115/1219840.1219901
36.
Peng, N., Poon, H., Quirk, C., Toutanova, K., Yih, W.T.: Cross-sentence n-ary relation extraction with graph lstms. Trans. Assoc. Comput. Linguist. 5, 101–115 (2017)
37.
Jia, R., Wong, C., Poon, H.: Document-level n-ary relation extraction with multiscale representation learning. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 3693–3704. Association for Computational Linguistics, Minneapolis, MN (2019). https://doi.org/10.18653/v1/N19-1370
38.
Song, L., Zhang, Y., Wang, Z., Gildea, D.: N-ary relation extraction using graph-state LSTM. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2226–2235. Association for Computational Linguistics, Brussels, Belgium (2018). https://doi.org/10.18653/v1/D18-1246
39.
Prasojo, R.E., Kacimi, M., Nutt, W.: Stuffie: Semantic tagging of unlabeled facets using fine-grained information extraction. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management. CIKM ’18, pp. 467–476. Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3269206.3271812
41.
Zeng, X., Zeng, D., He, S., Liu, K., Zhao, J.: Extracting relational facts by an end-to-end neural model with copy mechanism. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 506–514. Association for Computational Linguistics, Melbourne, Australia (2018). https://doi.org/10.18653/v1/P18-1047
42.
Sahu, S.K., Christopoulou, F., Miwa, M., Ananiadou, S.: Inter-sentence relation extraction with document-level graph convolutional neural network. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4309–4316. Association for Computational Linguistics, Florence, Italy (2019). https://doi.org/10.18653/v1/P19-1423
43.
Xiong, L., Hu, C., Xiong, C., Campos, D., Overwijk, A.: Open domain web keyphrase extraction beyond language modeling. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 5175–5184. Association for Computational Linguistics, Hong Kong, China (2019). https://doi.org/10.18653/v1/D19-1521
45.
Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. CoRR arXiv:1802.05365 (2018)
46.
Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. CoRR arXiv:1810.04805 (2018)
48.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 3111–3119. Curran Associates, Inc. (2013)
Metadata
Title
Fusion of visual representations for multimodal information extraction from unstructured transactional documents
Authors
Berke Oral
Gülşen Eryiğit
Publication date
22 April 2022
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 3/2022
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-022-00399-3