Published in: International Journal on Document Analysis and Recognition (IJDAR) 2/2015

1 June 2015 | Special Issue Paper

Exploiting colour information for better scene text detection and recognition

Authors: Muhammad Fraz, M. Saquib Sarfraz, Eran A. Edirisinghe


Abstract

This paper presents an approach to text detection and recognition in scene images. Its main contribution is to demonstrate that the colour information within an image, if exploited efficiently, is sufficient to separate text regions from the surrounding noise. Likewise, the colour information present in character and word images can be used to achieve a significant improvement in the recognition of characters and words. The proposed pipeline uses colour information and low-level image processing operations to enhance text information, which improves the overall performance of text detection and recognition in the wild. The method offers two main advantages. First, it enhances text regions to a level of clarity at which a simple off-the-shelf feature representation and classification method achieves state-of-the-art recognition performance. Second, the framework is computationally fast compared with other text detection and recognition techniques that offer good accuracy at the cost of high latency. We evaluated the method extensively on challenging benchmark datasets (Chars74K, ICDAR03, ICDAR11 and SVT), and the results show a considerable performance improvement.
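
The abstract does not spell out the individual pipeline steps, so the following is only an illustrative sketch of the general idea that colour alone can separate text from background. It is not the authors' method. It assumes a tiny two-colour scene, clusters the RGB pixels into two groups with a minimal 2-means loop, and treats the smaller cluster as the text layer. The function names (`two_colour_split`, `text_mask`) and the "text is the minority colour" rule are assumptions made for this example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def sq_dist(a, b) -> float:
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_colour_split(pixels: List[Pixel], iterations: int = 10) -> List[int]:
    """Label each pixel 0 or 1 using a minimal 2-means clustering in RGB space."""
    # Deterministic seeding (an assumption of this sketch): the first pixel,
    # then the pixel farthest from it in colour.
    c0 = tuple(float(v) for v in pixels[0])
    far = max(pixels, key=lambda p: sq_dist(p, c0))
    centres = [c0, tuple(float(v) for v in far)]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: nearest centre wins.
        labels = [
            0 if sq_dist(p, centres[0]) <= sq_dist(p, centres[1]) else 1
            for p in pixels
        ]
        # Update step: each centre moves to the mean colour of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(pixels, labels) if lab == k]
            if members:
                centres[k] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return labels


def text_mask(image: List[List[Pixel]]) -> List[List[int]]:
    """Return a binary mask in which 1 marks the assumed text colour layer."""
    flat = [p for row in image for p in row]
    labels = two_colour_split(flat)
    # Assumption: text covers fewer pixels than the background.
    minority = 1 if sum(labels) * 2 < len(labels) else 0
    width = len(image[0])
    mask = [1 if lab == minority else 0 for lab in labels]
    return [mask[i:i + width] for i in range(0, len(mask), width)]


if __name__ == "__main__":
    BG, BG2 = (30, 60, 200), (35, 65, 190)   # slightly varying blue background
    FG, FG2 = (220, 40, 40), (210, 50, 45)   # slightly varying red stroke
    image = [
        [BG, BG2, FG, BG, BG2],
        [BG2, BG, FG2, BG, BG],
        [BG, BG, FG, BG2, BG],
        [BG, BG2, FG2, BG, BG],
    ]
    for row in text_mask(image):
        print(row)
```

On the synthetic image, the mask marks the red vertical stroke in the middle column and nothing else. Real scene images have many colours, uneven illumination and noise, so a practical system would need more clusters and further filtering before any recognition step.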


Metadata
Title
Exploiting colour information for better scene text detection and recognition
Authors
Muhammad Fraz
M. Saquib Sarfraz
Eran A. Edirisinghe
Publication date
1 June 2015
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 2/2015
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-015-0239-x
