International Journal on Document Analysis and Recognition (IJDAR)

1 June 2015 | Special Issue Paper

Exploiting colour information for better scene text detection and recognition

Authors: Muhammad Fraz, M. Saquib Sarfraz, Eran A. Edirisinghe

Published in: International Journal on Document Analysis and Recognition (IJDAR) | Issue 2/2015

Abstract

This paper presents an approach for text detection and recognition in scene images. Its main contribution is to demonstrate that the colour information within an image, if efficiently exploited, is sufficient to separate text regions from the surrounding noise. In the same way, the colour information present in character and word images can be used to achieve a significant improvement in character and word recognition. The proposed pipeline uses colour information and low-level image processing operations to enhance text, which improves the overall performance of text detection and recognition in the wild. The method offers two main advantages. First, it enhances text regions to a level of clarity at which a simple off-the-shelf feature representation and classification method achieves state-of-the-art recognition performance. Second, the framework is computationally fast compared with other text detection and recognition techniques that offer good accuracy at the cost of significantly higher latency. We performed extensive experiments to evaluate our method on challenging benchmark datasets (Chars74K, ICDAR03, ICDAR11 and SVT), and the results show a considerable performance improvement.
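
The abstract does not spell out which colour operations the pipeline uses. As a rough illustration of the general idea only, the sketch below groups pixels into colour layers with plain k-means and treats the smallest layer as candidate text. Everything here is an assumption made for illustration: the function names `kmeans_colour_layers` and `text_layer_mask` are hypothetical, k-means is a stand-in and not the authors' method, and the "text covers less area than background" rule is a simplification.

```python
import numpy as np


def kmeans_colour_layers(image, k=2, iters=10):
    """Cluster the pixels of an H x W x 3 image into k colour layers.

    Returns (labels, centres): labels is H x W with values in [0, k),
    centres is a k x 3 array of mean colours.
    Illustrative stand-in, not the method described in the paper.
    """
    if iters < 1:
        raise ValueError("iters must be at least 1")
    pixels = image.reshape(-1, 3).astype(np.float64)

    # Deterministic initialisation: take k pixels spread evenly
    # along the ordering by summed channel intensity.
    order = np.argsort(pixels.sum(axis=1), kind="stable")
    picks = order[np.linspace(0, len(order) - 1, k).astype(int)]
    centres = pixels[picks].copy()

    for _ in range(iters):
        # Squared distance from every pixel to every centre (N x k).
        dists = ((pixels[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centre to the mean colour of its members.
        for j in range(k):
            members = pixels[labels == j]
            if len(members):
                centres[j] = members.mean(axis=0)

    return labels.reshape(image.shape[:2]), centres


def text_layer_mask(image, k=2):
    """Boolean mask of the smallest non-empty colour layer.

    Assumption (for illustration): text occupies less area than background.
    """
    labels, _ = kmeans_colour_layers(image, k)
    counts = np.bincount(labels.ravel(), minlength=k)
    # Ignore empty clusters when looking for the smallest layer.
    counts = np.where(counts == 0, counts.max() + 1, counts)
    return labels == counts.argmin()
```

On a real scene image the mask would still contain non-text clutter of a similar colour, so a sketch like this could at most serve as a pre-processing step before a detector or classifier.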


Title: Exploiting colour information for better scene text detection and recognition
Authors: Muhammad Fraz
M. Saquib Sarfraz
Eran A. Edirisinghe
Publication date: 1 June 2015
Publisher: Springer Berlin Heidelberg
Published in: International Journal on Document Analysis and Recognition (IJDAR) / Issue 2/2015
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI: https://doi.org/10.1007/s10032-015-0239-x
