Published in: International Journal on Document Analysis and Recognition (IJDAR) 1/2023

2 May 2022 | Original Paper

YOLO-table: disclosure document table detection with involution

Authors: Daqian Zhang, Ruibin Mao, Runting Guo, Yang Jiang, Jing Zhu

Abstract

As financial document automation becomes more widespread, table detection, an important part of document automation, is receiving increasing attention. Disclosure documents contain both bordered and borderless tables of varying lengths, and no existing model currently performs well on this type of document. To address this problem, we propose YOLO-table, a table detection model. We introduce involution into the backbone of the network to improve its ability to learn the spatial layout features of tables, and we design a simple Feature Pyramid Network to improve the model's effectiveness. In addition, this paper proposes a table-based data augmentation method. Experiments on a disclosure document dataset show that YOLO-table reaches an F1-measure of 97.3%. Compared with YOLOv3, our method improves accuracy by 2.8% and increases speed 1.25-fold. The model is also evaluated on the ICDAR2013 and ICDAR2019 Table Competition datasets, where it achieves state-of-the-art performance.
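
The abstract names involution (Li et al., CVPR 2021) as the key change to the backbone. Convolution shares one kernel across all spatial positions and varies it per channel; involution does the opposite, generating a kernel at each pixel from that pixel's own feature vector and sharing it across the channels of a group. Below is a minimal NumPy sketch of that generic operator, assuming stride 1, a single image in (C, H, W) layout, zero padding, and a ReLU between the two kernel-generation projections (the original operator also applies batch normalization, which is omitted here). The names `involution`, `w_reduce`, and `w_span` are hypothetical. This is not the YOLO-table authors' code, and the abstract does not say where in the backbone the operator is placed.

```python
import numpy as np


def involution(x, w_reduce, w_span, kernel_size=3, groups=1):
    """Sketch of a stride-1 involution on one feature map (hypothetical helper).

    x:        feature map, shape (C, H, W)
    w_reduce: channel-reduction weights, shape (C // r, C)
    w_span:   kernel-generation weights, shape (K * K * G, C // r)
    """
    c, h, w = x.shape
    k = kernel_size
    pad = k // 2
    assert c % groups == 0, "channels must split evenly into groups"

    # Step 1: generate a (G, K, K) kernel at every pixel from that pixel's
    # own feature vector (reduce -> ReLU -> expand; batch norm omitted).
    feats = x.reshape(c, h * w)                         # (C, H*W)
    hidden = np.maximum(w_reduce @ feats, 0.0)          # (C/r, H*W)
    kernels = (w_span @ hidden).reshape(groups, k, k, h, w)

    # Step 2: multiply-add each K x K neighbourhood with the pixel's own
    # kernel; all channels in a group share that group's kernel.
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))    # zero padding
    out = np.zeros((c, h, w), dtype=float)
    ch_per_group = c // groups
    for g in range(groups):
        sl = slice(g * ch_per_group, (g + 1) * ch_per_group)
        for u in range(k):
            for v in range(k):
                # Neighbour at offset (u - pad, v - pad) for every pixel at once.
                patch = xp[sl, u:u + h, v:v + w]        # (C/G, H, W)
                out[sl] += kernels[g, u, v][None] * patch
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))                    # C = 8 channels
y = involution(x, rng.standard_normal((2, 8)),          # reduction ratio r = 4
               rng.standard_normal((9 * 4, 2)),         # K = 3, G = 4
               kernel_size=3, groups=4)
print(y.shape)  # (8, 16, 16)
```

Because each kernel is conditioned on local content, an involution layer can, in principle, weight neighbours differently at ruling lines, whitespace, and text blocks, which is the kind of spatial layout cue the abstract says the backbone change targets.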

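The headline result is an F1-measure. As a rough illustration of how such a score is typically computed for table detection, the sketch below matches predicted boxes to ground-truth tables one-to-one (greedily, highest confidence first) at an assumed IoU threshold of 0.5, then derives precision, recall, and F1. The threshold, the matching rule, and the helper names `iou` and `detection_f1` are assumptions; the evaluation protocols used in the paper for the disclosure dataset, ICDAR2013, and ICDAR2019 may differ.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_f1(preds, gts, iou_thr=0.5):
    """preds: list of (score, box); gts: list of boxes. Returns (P, R, F1)."""
    matched = set()
    tp = 0
    # Visit predictions from most to least confident; each ground-truth
    # table can be matched at most once.
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # unmatched predictions
    fn = len(gts) - tp     # missed tables
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gts = [(0, 0, 100, 50), (0, 100, 100, 200)]
preds = [(0.9, (0, 0, 100, 50)), (0.8, (0, 110, 100, 200)), (0.3, (200, 200, 250, 250))]
print(detection_f1(preds, gts))  # precision 2/3, recall 1.0, F1 0.8
```

Here the second prediction overlaps its table with IoU 0.9 and counts as a hit, while the third matches nothing and is a false positive.
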
Metadata
Title: YOLO-table: disclosure document table detection with involution
Authors: Daqian Zhang, Ruibin Mao, Runting Guo, Yang Jiang, Jing Zhu
Publication date: 2 May 2022
Publisher: Springer Berlin Heidelberg
Published in: International Journal on Document Analysis and Recognition (IJDAR), Issue 1/2023
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI: https://doi.org/10.1007/s10032-022-00400-z
