
International Journal on Document Analysis and Recognition (IJDAR)


2 May 2022 | Original Paper

YOLO-table: disclosure document table detection with involution

Authors: Daqian Zhang, Ruibin Mao, Runting Guo, Yang Jiang, Jing Zhu

Published in: International Journal on Document Analysis and Recognition (IJDAR) | Issue 1/2023

Abstract

As financial document automation becomes more widespread, table detection is receiving increasing attention as an important part of the pipeline. Disclosure documents contain both bordered and borderless tables of varying lengths, and no existing model performs well on documents of this type. To solve this problem, we propose a table detection model, YOLO-table. We introduce involution into the backbone of the network to improve its ability to learn the spatial layout features of tables, and we design a simple Feature Pyramid Network to improve model effectiveness. In addition, this paper proposes a table-based augmentation method. We experiment on a disclosure document dataset, and the results show that the F1-measure of YOLO-table reaches 97.3%. Compared with YOLOv3, our method improves the accuracy by 2.8% and the speed by 1.25 times. We also evaluate the model on the ICDAR2013 and ICDAR2019 Table Competition datasets, where it achieves state-of-the-art performance.
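The abstract reports results as an F1-measure but does not state the matching protocol behind it. The following is a minimal sketch of how such a score is commonly computed for table detection, assuming axis-aligned boxes given as `(x1, y1, x2, y2)`, greedy one-to-one matching, and an IoU threshold of 0.5. The function names `iou` and `f1_measure`, the box format, and the threshold are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Tuple

# Assumed box format (not from the paper): (x1, y1, x2, y2), with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_measure(preds: List[Box], truths: List[Box], iou_threshold: float = 0.5) -> float:
    """F1 over detections; each ground-truth table may be matched at most once.

    The 0.5 default threshold is an assumption for illustration.
    """
    matched = set()  # indices of ground-truth boxes already claimed
    true_positives = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, t in enumerate(truths):
            if j in matched:
                continue
            overlap = iou(p, t)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= iou_threshold:
            matched.add(best_j)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With one correct and one spurious prediction against two ground-truth tables, precision and recall are both 0.5, so the F1-measure is also 0.5. Competition protocols may differ, for example by averaging over several IoU thresholds, so this sketch should not be read as the evaluation used in the paper.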

Title: YOLO-table: disclosure document table detection with involution
Authors: Daqian Zhang, Ruibin Mao, Runting Guo, Yang Jiang, Jing Zhu
Publication date: 2 May 2022
Publisher: Springer Berlin Heidelberg
Published in: International Journal on Document Analysis and Recognition (IJDAR) / Issue 1/2023
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI: https://doi.org/10.1007/s10032-022-00400-z