Published in: Scientific and Technical Information Processing 5/2023

01-12-2023

Problems of Combining Multiple Text Recognition Results

Author: V. V. Arlazarov


Abstract

This paper considers the task of combining text recognition results obtained from multiple images. Systems in which this problem arises are analyzed, and several known approaches are described. At present there is no unified approach to the combination problem that would increase text recognition accuracy when multiple images or a video stream are available. As an example, a comparative study of three approaches to combining per-frame recognition results of identity document fields is presented. It shows that different approaches may be advantageous for different parts of a data set, while selecting the potential best single result still significantly outperforms all of the analyzed methods. Per-frame combination of recognition results is an important component of video stream recognition systems; it requires careful consideration and the formulation of general approaches applicable to various domains.
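
To make the idea of per-frame combination concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's method: the abstract does not name the three approaches compared, so the baselines below (whole-string voting, picking the most confident frame, and a simplified position-wise weighted vote) are hypothetical stand-ins, as are all function names and the sample data. The `select_oracle` helper mimics "selection of the potential best single result" by choosing the frame closest to a known ground truth, which is possible only in an evaluation setting.

```python
from collections import Counter
from typing import List, Tuple

# One per-frame result: (recognized string, confidence in [0, 1]).
# This representation is an assumption made for the sketch.
Frame = Tuple[str, float]


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


def combine_by_string_vote(frames: List[Frame]) -> str:
    """Return the whole string that occurs most often across frames."""
    return Counter(text for text, _ in frames).most_common(1)[0][0]


def select_most_confident(frames: List[Frame]) -> str:
    """Return the string from the frame with the highest confidence."""
    return max(frames, key=lambda frame: frame[1])[0]


def combine_by_char_vote(frames: List[Frame]) -> str:
    """Confidence-weighted vote per character position.

    Simplification: only strings of the most common length take part,
    so no alignment step is needed. Real systems generally have to align
    results of different lengths first.
    """
    common_len = Counter(len(text) for text, _ in frames).most_common(1)[0][0]
    same_len = [(text, conf) for text, conf in frames if len(text) == common_len]
    combined = []
    for pos in range(common_len):
        weights = Counter()
        for text, conf in same_len:
            weights[text[pos]] += conf
        combined.append(weights.most_common(1)[0][0])
    return "".join(combined)


def select_oracle(frames: List[Frame], truth: str) -> str:
    """Return the single frame result closest to the ground truth."""
    return min(frames, key=lambda frame: levenshtein(frame[0], truth))[0]


# Hypothetical per-frame readings of one document field.
frames = [("AB1234", 0.6), ("A81234", 0.9), ("AB1234", 0.5), ("AB12B4", 0.4)]
truth = "AB1234"

for name, result in [
    ("string vote", combine_by_string_vote(frames)),
    ("most confident", select_most_confident(frames)),
    ("char vote", combine_by_char_vote(frames)),
    ("oracle single", select_oracle(frames, truth)),
]:
    print(f"{name:15s} {result}  (errors: {levenshtein(result, truth)})")
```

In this toy data the most confident frame contains an error, which shows why confidence alone can mislead. The oracle is an upper-bound reference for single-frame selection rather than a usable strategy, since it needs the true answer.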


Metadata
Title: Problems of Combining Multiple Text Recognition Results
Author: V. V. Arlazarov
Publication date: 01-12-2023
Publisher: Pleiades Publishing
Published in: Scientific and Technical Information Processing, Issue 5/2023
Print ISSN: 0147-6882
Electronic ISSN: 1934-8118
DOI: https://doi.org/10.3103/S0147688223050027
