Scientific and Technical Information Processing

01-12-2023

Problems of Combining Multiple Text Recognition Results

Author: V. V. Arlazarov

Published in: Scientific and Technical Information Processing | Issue 5/2023

Abstract

This paper considers the task of combining text recognition results obtained from multiple images. Systems in which this problem arises are analyzed, and several known approaches are described. At present there is no unified approach to the combination problem that improves text recognition accuracy across multiple images or in a video stream. As an example, a comparative study of three approaches to combining per-frame recognition results for identity document fields is presented. It shows that different approaches may be advantageous for different parts of a data set, while selecting the potentially best single result still significantly outperforms all of the analyzed methods. Per-frame combination of recognition results is an important component of video stream recognition systems; it requires careful consideration and the formulation of general approaches applicable to various domains.
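The gap between a combination rule and the best single per-frame result can be illustrated with a small sketch. The abstract does not name the three approaches that were compared, so the code below is not the paper's method: it assumes plain majority voting as a stand-in combination rule and uses Levenshtein distance to pick an oracle "best single result". The function names (`levenshtein`, `majority_vote`, `best_single`) and the sample strings are hypothetical, chosen only for illustration.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def majority_vote(frames: list[str]) -> str:
    """Assumed combination rule: the most frequent per-frame string wins.

    Ties go to the string that appeared first.
    """
    return Counter(frames).most_common(1)[0][0]


def best_single(frames: list[str], truth: str) -> str:
    """Oracle selection: the per-frame result closest to the ground truth.

    This needs the ground truth, so it is an upper bound for evaluation,
    not something a deployed system can compute.
    """
    return min(frames, key=lambda s: levenshtein(s, truth))


# Invented per-frame readings of one document field.
frames = ["J0HN", "J0HN", "JOHN", "JOHM"]
truth = "JOHN"

combined = majority_vote(frames)
oracle = best_single(frames, truth)
print(combined, levenshtein(combined, truth))
print(oracle, levenshtein(oracle, truth))
```

In this invented sample the same misreading repeats across frames, so voting settles on an erroneous string even though one frame was read correctly. That is the kind of gap the abstract describes between the analyzed methods and the potentially best single result.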

Title: Problems of Combining Multiple Text Recognition Results
Author: V. V. Arlazarov
Publication date: 01-12-2023
Publisher: Pleiades Publishing
Published in: Scientific and Technical Information Processing / Issue 5/2023
Print ISSN: 0147-6882
Electronic ISSN: 1934-8118
DOI: https://doi.org/10.3103/S0147688223050027
