Pattern Recognition and Image Analysis

December 1, 2022 | APPLIED PROBLEMS

Hough Encoder for Machine Readable Zone Localization

Authors: S. Ilyuhin, A. Sheshkus, V. Arlazarov, D. Nikolaev

Published in: Pattern Recognition and Image Analysis | Issue 4/2022

Abstract

In this paper, we address the detection of the document machine-readable zone (MRZ) in images taken by smartphones. Such images may be of low quality and contain projective distortions. Current solutions to this task are mostly image processing algorithms, all of which have precision problems in various use cases. Known neural network methods could address this issue, but they impose strict requirements on computational power, which is critical for on-device inference. We therefore propose a method that combines a neural network with image processing and is both lightweight and accurate. The network is trained to process the input image and produce a heatmap of MRZ characters. We then use connected components analysis to merge the characters into lines and estimate the MRZ bounding box. The proposed neural network is a light version of the Hough encoder, an architecture designed to work with projectively distorted images. Our network is 1.7 times smaller than the original Hough encoder and more than 100 times smaller than typical autoencoders, which makes it possible to use the proposed method on embedded devices. Experiments were conducted on an open synthetic dataset of documents with three types of MRZ. The results show that our method achieves high quality on the test dataset and has fewer trainable parameters than the common solution, U-Net.
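The post-processing step described in the abstract (character heatmap, then connected components, then text lines, then a bounding box) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the 0.5 threshold, the 4-connectivity, the vertical-overlap grouping rule, and the `min_chars` filter are all assumptions chosen for illustration. The sketch also returns an axis-aligned box, so it does not handle the projective distortions the paper targets.

```python
from collections import deque

import numpy as np


def connected_components(mask):
    """Return one (x_min, y_min, x_max, y_max) box per 4-connected blob."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0, x0] or seen[y0, x0]:
                continue
            # Breadth-first flood fill starting from an unvisited pixel.
            seen[y0, x0] = True
            queue = deque([(y0, x0)])
            x_min = x_max = x0
            y_min = y_max = y0
            while queue:
                y, x = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes


def group_into_lines(boxes):
    """Greedily merge boxes whose vertical extents overlap (assumed rule).

    Each line is stored as [x_min, y_min, x_max, y_max, blob_count].
    """
    lines = []
    for box in sorted(boxes, key=lambda b: b[1]):
        for line in lines:
            if box[1] <= line[3] and box[3] >= line[1]:
                line[0] = min(line[0], box[0])
                line[1] = min(line[1], box[1])
                line[2] = max(line[2], box[2])
                line[3] = max(line[3], box[3])
                line[4] += 1
                break
        else:
            lines.append([box[0], box[1], box[2], box[3], 1])
    return lines


def mrz_bounding_box(heatmap, threshold=0.5, min_chars=3):
    """Estimate an axis-aligned MRZ box from a character heatmap.

    Lines with fewer than `min_chars` blobs are treated as noise
    (a hypothetical filter). Returns None if no line survives.
    """
    mask = np.asarray(heatmap) >= threshold
    lines = [ln for ln in group_into_lines(connected_components(mask))
             if ln[4] >= min_chars]
    if not lines:
        return None
    return (min(ln[0] for ln in lines), min(ln[1] for ln in lines),
            max(ln[2] for ln in lines), max(ln[3] for ln in lines))
```

Calling `mrz_bounding_box(heatmap)` on a 2D array of per-pixel character scores returns `(x_min, y_min, x_max, y_max)` in pixel coordinates, or `None` when no line has enough blobs. A production version would more likely fit a quadrilateral to the grouped lines than take their axis-aligned extent.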

Title: Hough Encoder for Machine Readable Zone Localization
Authors: S. Ilyuhin, A. Sheshkus, V. Arlazarov, D. Nikolaev
Publication date: December 1, 2022
Publisher: Pleiades Publishing
Published in: Pattern Recognition and Image Analysis, Issue 4/2022
Print ISSN: 1054-6618
Electronic ISSN: 1555-6212
DOI: https://doi.org/10.1134/S1054661822040150
