Published in: Scientific and Technical Information Processing 5/2023

1 December 2023

Problems of Combining Multiple Text Recognition Results

Author: V. V. Arlazarov

Abstract

In this paper, the task of combining recognition results from multiple images is considered. Systems in which such problems occur are analyzed, and some known approaches are described. It should be noted that currently there is no unified approach that could be used to solve the combination problem for increasing text recognition accuracy using multiple images or in a video stream. As an example, a comparative study of three different approaches to the combination of per-frame recognition results of identity document fields is presented, and it is demonstrated that different approaches may be advantageous for different parts of a data set, while a selection of the potential best single result still significantly outperforms all of the analyzed methods. The task of the per-frame combination of recognition results is an important component in video stream recognition systems and requires careful consideration and the formulation of general approaches that would be applicable to various domains.
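
The abstract does not name the three combination approaches it compares, so the following is only an illustrative sketch and not the author's method. It assumes one very simple combination rule (whole-string majority voting over per-frame results) and contrasts it with the "potential best single result", read here as an oracle that picks the frame result closest to the ground truth. The function names, the use of edit distance as the quality measure, and the sample strings are all hypothetical.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def combine_by_majority(frame_results):
    """Assumed combination rule: return the most frequent whole string.

    Ties are resolved in favour of the string seen first.
    """
    if not frame_results:
        raise ValueError("no per-frame results to combine")
    return Counter(frame_results).most_common(1)[0][0]


def best_single_result(frame_results, ground_truth):
    """Oracle baseline: the per-frame result closest to the ground truth.

    This needs the true value, so it is an upper bound for evaluation,
    not something a deployed system could compute.
    """
    if not frame_results:
        raise ValueError("no per-frame results to select from")
    return min(frame_results, key=lambda r: levenshtein(r, ground_truth))


# Hypothetical per-frame readings of one document field.
frames = ["SM1TH", "SM1TH", "SMITH"]
combined = combine_by_majority(frames)          # the repeated misreading wins
oracle = best_single_result(frames, "SMITH")    # the one correct frame wins
```

In this toy case the same misreading occurs in two of three frames, so voting returns the wrong string while the oracle selection returns the correct one. That mirrors, on a very small scale, the gap the abstract reports between the analyzed combination methods and the best single result.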

Metadata
Title
Problems of Combining Multiple Text Recognition Results
Author
V. V. Arlazarov
Publication date
1 December 2023
Publisher
Pleiades Publishing
Published in
Scientific and Technical Information Processing / Issue 5/2023
Print ISSN: 0147-6882
Electronic ISSN: 1934-8118
DOI
https://doi.org/10.3103/S0147688223050027
