2015 | OriginalPaper | Chapter

A Machine Learning Approach to Hypothesis Decoding in Scene Text Recognition

Authors: Jindřich Libovický, Lukáš Neumann, Pavel Pecina, Jiří Matas

Published in: Computer Vision - ACCV 2014 Workshops

Publisher: Springer International Publishing

Abstract

Scene Text Recognition (STR) is the task of localizing and transcribing textual information captured in real-world images. As its accuracy increases, it is becoming a new source of textual data for standard Natural Language Processing tasks, and it poses new problems because of the specific nature of scene text. In this paper, we learn a string hypothesis decoding procedure in an STR pipeline using structured prediction methods that have proved useful in Automatic Speech Recognition and Machine Translation. The model allows a wide range of typographical and language features to be employed in the decoding process. The proposed method is evaluated on a standard dataset and improves both character and word recognition performance over the baseline.
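To make the idea of learned hypothesis decoding more concrete, the following is a minimal sketch of a structured perceptron that picks one string from a set of candidate hypotheses using a linear model over typographical and language features. It is an illustration only and not the authors' implementation: the feature set (`mean_conf`, `case_switches`, character bigram indicators), the function names, and the toy data are all assumptions introduced here, and the abstract does not say which structured prediction method or features the chapter actually uses.

```python
from collections import defaultdict


def features(hyp):
    """Map a hypothesis (text, per-character confidences) to sparse features.

    All features are hypothetical examples; confidences must be non-empty.
    """
    text, confs = hyp
    feats = defaultdict(float)
    # Recognition cue: average character classifier confidence.
    feats["mean_conf"] = sum(confs) / len(confs)
    # Typographical cue: lower-case letter directly followed by upper-case.
    feats["case_switches"] = float(
        sum(1 for a, b in zip(text, text[1:]) if a.islower() and b.isupper())
    )
    # Language cue: indicator counts of case-folded character bigrams.
    for a, b in zip(text, text[1:]):
        feats["bigram=" + a.lower() + b.lower()] += 1.0
    return feats


def score(weights, feats):
    """Linear model: dot product of the weight vector and the features."""
    return sum(weights[name] * value for name, value in feats.items())


def decode(weights, candidates):
    """Return the highest-scoring hypothesis (ties go to the first one)."""
    return max(candidates, key=lambda h: score(weights, features(h)))


def train(data, epochs=5):
    """Structured perceptron: move weights toward the gold hypothesis
    and away from the wrongly predicted one."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for candidates, gold_text in data:
            pred = decode(weights, candidates)
            if pred[0] == gold_text:
                continue
            # Assumes the gold string is among the candidates.
            gold = next(h for h in candidates if h[0] == gold_text)
            for name, value in features(gold).items():
                weights[name] += value
            for name, value in features(pred).items():
                weights[name] -= value
    return weights


# Toy example: the visually confident but typographically odd "EXlT"
# competes with the correct "EXIT".
toy_candidates = [("EXlT", [0.9, 0.9, 0.9, 0.9]), ("EXIT", [0.6, 0.6, 0.6, 0.6])]
toy_weights = train([(toy_candidates, "EXIT")])
best = decode(toy_weights, toy_candidates)
```

In this toy setting the model learns a negative weight for mixed-case switches inside a word, so the decoder can overrule a higher raw classifier confidence. A real pipeline would need many training words and a search over a hypothesis lattice rather than a short candidate list.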

Footnotes
1
We used the current version of TextSpotter available at http://www.textspotter.org.
Metadata
Title
A Machine Learning Approach to Hypothesis Decoding in Scene Text Recognition
Authors
Jindřich Libovický
Lukáš Neumann
Pavel Pecina
Jiří Matas
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-16631-5_13
