
International Journal on Document Analysis and Recognition (IJDAR)

25 August 2023 | Original Paper

A deep learning-based solution for digitization of invoice images with automatic invoice generation and labelling

Authors: Halil Arslan, Yunus Emre Işık, Yasin Görmez

Published in: International Journal on Document Analysis and Recognition (IJDAR) | Issue 1/2024


Abstract

Nowadays, the volume of invoice traffic between companies has reached enormous levels. Invoices are crucial financial documents, and companies need to extract the information they contain so that it can be accessed and checked quickly when necessary. While electronic invoices can easily be transferred to a company's ERP system with the help of integrators, information from printed invoices must be entered into the ERP system. This entry is generally performed manually by company employees, so the probability of error is high. Automatically recognizing the information in printed invoices reduces the possibility of error and also saves time and money by reducing workforce requirements. This study proposes a deep learning-based solution for detecting fields in invoice images, a capability in high demand among businesses. The system offers an end-to-end solution that includes a novel method for generating synthetic invoices and labeling them automatically. Three invoice templates were used to evaluate the usability of the system, and an adaptive fine-tuning-based solution is proposed for newly encountered invoice templates. Furthermore, six different object detection models were compared to find the most suitable one for the problem. The system was also tested on 1022 manually labeled real invoice images to assess real-world usage. The results indicated that the fine-tuned model achieved an accuracy 8.4% higher than the baseline models. In tests performed on a CPU, the TOOD and Cascade-RCNN models achieved the best detection performance, while YOLOv5 ran the fastest. Depending on whether accuracy or speed takes priority, either type of model can be preferred for real-time detection of invoice fields. The synthetic invoice generation code is available at https://github.com/SCU-CENG/Invoice-Generation.
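The abstract pairs synthetic invoice generation with automatic labeling. The underlying idea, as we read it, is that a generator that places each field on the page itself already knows where that field lies, so bounding-box labels come for free instead of being drawn by hand. The minimal Python sketch that follows illustrates this under loudly stated assumptions: the template, field names, coordinates, page size, fixed-width font metrics and random value formats are all hypothetical, the output uses the normalized `class cx cy w h` label format of YOLOv5-style training pipelines, and the actual rendering of text into an image is omitted. It is not the authors' implementation; their code is in the linked repository.

```python
import random
from dataclasses import dataclass

# Hypothetical template: field name -> (class id, top-left x, top-left y) in pixels.
# Names and coordinates are illustrative only, not taken from the paper.
TEMPLATE = {
    "invoice_no": (0, 600, 80),
    "date": (1, 600, 120),
    "total": (2, 560, 1000),
}
PAGE_W, PAGE_H = 827, 1169  # roughly A4 at 100 dpi (assumption)
CHAR_W, LINE_H = 9, 16      # assumed fixed-width font metrics in pixels


@dataclass
class Field:
    name: str
    class_id: int
    text: str
    x: int
    y: int

    def box(self):
        """Pixel box (x0, y0, x1, y1) covering the text if rendered at (x, y)."""
        return (self.x, self.y, self.x + CHAR_W * len(self.text), self.y + LINE_H)


def random_value(name, rng):
    """Produce a random but plausible value for a field (hypothetical formats)."""
    if name == "invoice_no":
        return f"INV-{rng.randint(0, 999999):06d}"
    if name == "date":
        return f"{rng.randint(1, 28):02d}.{rng.randint(1, 12):02d}.{rng.randint(2018, 2023)}"
    return f"{rng.uniform(10, 10000):.2f}"


def generate_invoice(rng):
    """Fill every template field with a random value at its fixed position."""
    return [Field(name, cid, random_value(name, rng), x, y)
            for name, (cid, x, y) in TEMPLATE.items()]


def to_yolo(field):
    """Label line 'class cx cy w h', with coordinates normalized to [0, 1]."""
    x0, y0, x1, y1 = field.box()
    cx = (x0 + x1) / 2 / PAGE_W
    cy = (y0 + y1) / 2 / PAGE_H
    w = (x1 - x0) / PAGE_W
    h = (y1 - y0) / PAGE_H
    return f"{field.class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


if __name__ == "__main__":
    rng = random.Random(42)
    for field in generate_invoice(rng):
        print(field.name, field.text, "->", to_yolo(field))
```

Because the label is derived from the same coordinates used to place the text, every synthetic image comes with exact annotations. In a real pipeline the box would more likely be measured from the rendered text with an imaging library rather than from a fixed character width.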



Publisher: Springer Berlin Heidelberg
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI: https://doi.org/10.1007/s10032-023-00449-4
