
International Journal on Document Analysis and Recognition (IJDAR)

12-07-2022 | Original Paper

Sequence-aware multimodal page classification of Brazilian legal documents

Authors: Pedro H. Luz de Araujo, Ana Paula G. S. de Almeida, Fabricio Ataides Braz, Nilton Correia da Silva, Flavio de Barros Vidal, Teofilo E. de Campos

Published in: International Journal on Document Analysis and Recognition (IJDAR) | Issue 1/2023

Abstract

The Brazilian Supreme Court receives tens of thousands of cases each semester. Court employees spend thousands of hours performing the initial analysis and classification of those cases, which takes effort away from later, more complex stages of the case management workflow. In this paper, we explore multimodal classification of documents from Brazil’s Supreme Court. We train and evaluate our methods on a novel multimodal dataset of 6,510 lawsuits (339,478 pages), in which each page is manually annotated with one of six classes. Each lawsuit is an ordered sequence of pages, and each page is stored both as an image and as the corresponding text extracted through optical character recognition (OCR). We first train two unimodal classifiers. A ResNet pre-trained on ImageNet is fine-tuned on the page images, and a convolutional network with filters of multiple kernel sizes is trained from scratch on the page texts. We use them as extractors of visual and textual features, which are then combined through our proposed fusion module. The fusion module handles missing textual or visual input by using learned embeddings in place of the missing data. We also experiment with bidirectional long short-term memory (biLSTM) networks and linear-chain conditional random fields (CRFs) to model the sequential nature of the pages. The multimodal approaches outperform both the textual and the visual classifiers, especially when they leverage the sequential nature of the pages.
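The following minimal, dependency-free Python sketch makes two ideas from the abstract concrete. The first is substituting a placeholder embedding for a missing modality. The second is decoding a whole lawsuit's page sequence with a linear-chain CRF. It rests on several assumptions that are not the authors' architecture:

* Fusion is plain concatenation.
* A single linear layer produces per-page class scores.
* Hand-picked toy numbers stand in for the learned ResNet/CNN features, the missing-data embeddings, and the CRF transition weights.

The names `fuse`, `class_scores`, `MISSING_TEXT`, and `MISSING_IMAGE` are hypothetical. The biLSTM variant and CRF training are not shown, only Viterbi decoding.

```python
from typing import List, Optional, Sequence

Vector = List[float]

# Hypothetical stand-ins for the learned "missing modality" embeddings.
# In a trained model these would be parameters updated during training;
# here they are fixed toy values with the same size as each feature vector.
MISSING_TEXT: Vector = [0.0, 0.0]
MISSING_IMAGE: Vector = [0.0, 0.0]


def fuse(text_feat: Optional[Vector], image_feat: Optional[Vector]) -> Vector:
    """Concatenate textual and visual features (concatenation is an assumption),
    substituting the placeholder embedding for a missing modality."""
    text = text_feat if text_feat is not None else MISSING_TEXT
    image = image_feat if image_feat is not None else MISSING_IMAGE
    return text + image  # list concatenation; the placeholders are not mutated


def class_scores(x: Vector, weights: Sequence[Vector], bias: Vector) -> Vector:
    """One linear layer: one weight row and one bias per page class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def viterbi(emissions: Sequence[Vector], transitions: Sequence[Vector]) -> List[int]:
    """Return the highest-scoring label sequence of a linear-chain CRF.

    emissions[t][k]   -- score of class k for page t
    transitions[j][k] -- score of class j on one page followed by class k on the next
    """
    n = len(emissions[0])
    score = list(emissions[0])  # best path score ending in each class so far
    backpointers: List[List[int]] = []
    for em in emissions[1:]:
        prev = score
        # For each current class k, the previous class j that maximizes the path score.
        best_prev = [max(range(n), key=lambda j: prev[j] + transitions[j][k]) for k in range(n)]
        score = [prev[j] + transitions[j][k] + em[k] for k, j in enumerate(best_prev)]
        backpointers.append(best_prev)
    # Follow the backpointers from the best final class back to the first page.
    label = max(range(n), key=lambda k: score[k])
    path = [label]
    for ptrs in reversed(backpointers):
        label = ptrs[label]
        path.append(label)
    return path[::-1]


# Toy example: two classes, three pages; the middle page has no OCR text.
weights = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
bias = [0.0, 0.0]
pages = [([2.0, 0.0], [1.0, 0.0]), (None, [0.4, 0.6]), ([1.5, 0.0], [1.0, 0.0])]
emissions = [class_scores(fuse(t, v), weights, bias) for t, v in pages]
transitions = [[1.0, -1.0], [-1.0, 1.0]]  # keeping the same class on consecutive pages is rewarded

independent = [max(range(2), key=lambda k: e[k]) for e in emissions]  # [0, 1, 0]
decoded = viterbi(emissions, transitions)  # [0, 0, 0]
```

Scored in isolation, the middle page (which lacks OCR text) leans slightly toward class 1. The toy transition scores, however, reward consecutive pages sharing a class, so joint decoding relabels it as class 0. This illustrates how sequence modeling can let neighboring pages inform an ambiguous one.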

Footnotes

To the best of our knowledge.

http://ailab.unb.br/victor/lrec2020/.

Publisher: Springer Berlin Heidelberg
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI: https://doi.org/10.1007/s10032-022-00406-7
