2021 | OriginalPaper | Book chapter

Multi-Type-TD-TSR – Extracting Tables from Document Images Using a Multi-stage Pipeline for Table Detection and Table Structure Recognition: From OCR to Structured Table Representations

Authors: Pascal Fischer, Alen Smajic, Giuseppe Abrami, Alexander Mehler

Published in: KI 2021: Advances in Artificial Intelligence

Publisher: Springer International Publishing

Abstract

As global trends shift towards data-driven industries, the demand for automated algorithms that convert images of scanned documents into machine-readable information is growing rapidly. Beyond digitization, there is a push to automate processes that used to require manual inspection of documents. Although optical character recognition (OCR) technologies have largely solved the task of converting human-readable characters in images into machine-readable text, the task of extracting tables has received less attention. Table recognition consists of two sub-tasks: table detection and table structure recognition. Most prior work focuses on one of these sub-tasks without offering an end-to-end solution or accounting for real-world conditions such as rotated images or noise artefacts. Recent work shows a clear trend towards deep learning, with transfer learning used for table structure recognition because sufficiently large datasets are lacking. We present a multi-stage pipeline named Multi-Type-TD-TSR, which offers an end-to-end solution for table recognition. It uses state-of-the-art deep learning models and differentiates between three types of tables based on their borders. For table structure recognition we use a deterministic, non-data-driven algorithm that works on all three types. In addition, we present an algorithm for non-bordered tables and one for bordered ones as the basis of our table structure detection algorithm. We evaluate Multi-Type-TD-TSR on a self-annotated subset of the ICDAR 2019 table structure recognition dataset [5] and achieve a new state of the art. Source code is available at https://github.com/Psarpei/Multi-Type-TD-TSR.
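
The abstract mentions a deterministic, non-data-driven approach to table structure recognition but gives no algorithmic detail. The following is only an illustrative sketch of what such an approach can look like for a non-bordered table, and it is not the authors' method: it assumes that OCR has already produced axis-aligned word boxes, and it places row and column separators in the whitespace gaps between them. The names `find_gaps`, `grid_from_boxes`, `cell_index` and the `min_gap` threshold are hypothetical.

```python
from typing import List, Tuple

# Hypothetical word box from an OCR step: (x0, y0, x1, y1) in pixels.
Box = Tuple[int, int, int, int]


def find_gaps(intervals: List[Tuple[int, int]], min_gap: int) -> List[int]:
    """Merge overlapping 1-D intervals and return the midpoint of every
    gap between merged intervals that is at least min_gap wide."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [
        (a[1] + b[0]) // 2
        for a, b in zip(merged, merged[1:])
        if b[0] - a[1] >= min_gap
    ]


def grid_from_boxes(boxes: List[Box], min_gap: int = 5) -> Tuple[List[int], List[int]]:
    """Assumed heuristic: separators lie in whitespace gaps between word boxes.
    Returns (row separators as y values, column separators as x values)."""
    col_seps = find_gaps([(x0, x1) for x0, _, x1, _ in boxes], min_gap)
    row_seps = find_gaps([(y0, y1) for _, y0, _, y1 in boxes], min_gap)
    return row_seps, col_seps


def cell_index(box: Box, row_seps: List[int], col_seps: List[int]) -> Tuple[int, int]:
    """Map a word box to (row, column) by counting the separators that lie
    before its centre point."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    row = sum(cy > s for s in row_seps)
    col = sum(cx > s for s in col_seps)
    return row, col


if __name__ == "__main__":
    # Four words laid out as a 2 x 2 table without ruling lines.
    words = [(0, 0, 20, 10), (40, 0, 60, 10), (0, 30, 20, 40), (40, 30, 60, 40)]
    rows, cols = grid_from_boxes(words)
    for w in words:
        print(w, "->", cell_index(w, rows, cols))
```

A gap-based heuristic like this fails on cells that span several rows or columns and on skewed scans, which is one reason a full pipeline would need preprocessing and separate handling of bordered tables.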
References
1. Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017)
3. Cohen, W.W., Hurst, M., Jensen, L.S.: A flexible learning system for wrapping tables and lists in HTML documents. In: Proceedings of the 11th International Conference on World Wide Web, WWW 2002, pp. 232–241. Association for Computing Machinery, New York (2002). https://doi.org/10.1145/511446.511477
4. Cortes, C., Vapnik, V.: Support vector machine. Mach. Learn. 20(3), 273–297 (1995)
6. Gatterbauer, W., Bohunsky, P., Herzog, M., Krüpl, B., Pollak, B.: Towards domain-independent information extraction from web tables. In: Proceedings of the 16th International Conference on World Wide Web, WWW 2007, pp. 71–80. Association for Computing Machinery, New York (2007). https://doi.org/10.1145/1242572.1242583
7. Gilani, A., Qasim, S.R., Malik, I., Shafait, F.: Table detection using deep learning. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 771–776. IEEE (2017)
10. Kasar, T., Barlas, P., Adam, S., Chatelain, C., Paquet, T.: Learning to detect tables in scanned document images using line information. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 1185–1189. IEEE (2013)
14. Prasad, D., Gadpal, A., Kapadni, K., Visave, M., Sultanpure, K.: CascadeTabNet: an approach for end to end table detection and structure recognition from image-based documents. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 572–573 (2020)
15. Pyreddi, P., Croft, W.B.: A system for retrieval in text tables. In: ACM DL (1997)
16. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497 (2015)
17. Reza, M.M., Bukhari, S.S., Jenckel, M., Dengel, A.: Table localization and segmentation using GAN and CNN. In: 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), vol. 5, pp. 152–157. IEEE (2019)
20. Schreiber, S., Agne, S., Wolf, I., Dengel, A., Ahmed, S.: DeepDeSRT: deep learning for detection and structure recognition of tables in document images. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 1162–1167. IEEE (2017)
23. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492–1500 (2017)
Metadata
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-87626-5_8
