Published in: Pattern Analysis and Applications 4/2013

01.11.2013 | Theoretical Advances

Text detection in street level images

Authors: Jonathan Fabrizio, Beatriz Marcotegui, Matthieu Cord

Abstract

Detecting text in natural images is a very challenging task in computer vision. Image acquisition introduces perspective distortion, blur, and illumination changes, and characters may vary widely in shape, size, and color. In this article we introduce a complete text detection scheme. Our architecture combines a hypothesis generation step, which produces candidate text boxes, with a hypothesis validation step, which filters out false detections. Hypothesis generation relies on a new, efficient segmentation method based on a morphological operator. Regions are then filtered and classified using shape descriptors based on Fourier and pseudo-Zernike moments and on an original polar descriptor that is invariant to rotation. Classification relies on three SVM classifiers combined in a late fusion scheme. Detected characters are finally grouped to generate the text box hypotheses. The validation step is a global SVM classification of the box content using dedicated descriptors adapted from the HOG approach. Results on the well-known ICDAR database show that our method is competitive. The evaluation protocol and metrics are discussed in depth, and results on a very challenging street-level database are also reported.
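The late fusion step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each of the three descriptor-specific classifiers returns a signed decision score and that the scores are fused by a weighted average. The sample scores, the weights, and the names `fuse_scores` and `is_character` are hypothetical, and the article's actual fusion rule may differ (for instance, a further classifier trained on the three scores).

```python
from typing import Sequence


def fuse_scores(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Late fusion: combine per-classifier decision scores into one score.

    Assumption: a weighted average stands in for the fusion rule.
    """
    if len(scores) != len(weights):
        raise ValueError("one weight per classifier score is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total


def is_character(scores: Sequence[float],
                 weights: Sequence[float] = (1.0, 1.0, 1.0),
                 threshold: float = 0.0) -> bool:
    """Accept a region as a character when the fused score exceeds the threshold."""
    return fuse_scores(scores, weights) > threshold


# Hypothetical signed scores from the Fourier, pseudo-Zernike and polar classifiers.
candidate = (0.8, -0.1, 0.4)
print(is_character(candidate))
```

Fusing at the score level lets each descriptor keep its own classifier, so a weak vote from one descriptor can be outweighed by the other two.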

Footnotes
1
f is anti-extensive ⇔ \(f(X) \subset X\).
 
2
f is extensive ⇔ \(X \subset f(X)\).
 
3
The smaller p is, the more likely pixels are to be assigned to the high-value areas (Fig. 5).
 
4
French national geographic institute IGN [20].
 
5
Image-based Town On-line Web Navigation and Search engine [22], a project funded by the ANR (French National Research Agency [1]) and the French consortium Cap Digital. Its first goal is to let users navigate freely within the image flow of a city; its second is to automatically enhance cartographic databases by extracting features from this image flow.
 
6
A solution would be to annotate with three classes instead of two: one for text, one for non-text, and one for unreadable text. The system would then not be penalized whether or not it detects unreadable text.
 
7
Following the protocol is important. Using the ICDAR database but changing the protocol can have a significant impact on the performance evaluation.
 
8
Our run collects results on both the original and the sub-sampled images in order to capture text of all sizes.
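The properties in footnotes 1 and 2 can be checked on a small example. The sketch below uses binary morphology on a one-dimensional set of integer positions and assumes a structuring element that contains the origin; under that assumption erosion is anti-extensive and dilation is extensive. The helper names and the sample set are illustrative and are not taken from the article.

```python
def dilate(X, B):
    """Binary dilation: every point of X shifted by every offset in B."""
    return {x + b for x in X for b in B}


def erode(X, B):
    """Binary erosion: points whose translated structuring element fits inside X.

    Because B is assumed to contain the origin (offset 0), any point that
    survives must itself belong to X, so only points of X need to be scanned.
    """
    return {x for x in X if all(x + b in X for b in B)}


X = {2, 3, 4, 5, 9}   # sample set of pixel positions (illustrative)
B = {-1, 0, 1}        # structuring element containing the origin

print(erode(X, B) <= X)    # anti-extensive: f(X) is a subset of X
print(X <= dilate(X, B))   # extensive: X is a subset of f(X)
```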
 
References
2.
Arth C, Limberger F, Bischof H (2007) Real-time license plate recognition on an embedded DSP-platform. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR ’07), pp 1–8
4.
Breen EJ, Jones R (1996) Attribute openings, thinnings, and granulometries. Comput Vis Image Underst 64(3):377–389
5.
Chehdi K, Coquin D (1991) Binarisation d’images par seuillage local optimal maximisant un critère d’homogénéité. GRETSI
6.
Chen D, Odobez J, Thiran J (2004) A localization/verification scheme for finding text in images and video frames based on contrast independent features and machine learning method. Image Commun 19(3):205–217
7.
Chen X, Yuille AL (2004) Detecting and reading text in natural scenes. IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2:366–373. doi:10.1109/CVPR.2004.77
8.
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
9.
Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: IEEE CVPR. IEEE Computer Society, pp 886–893
10.
Ezaki N, Bulacu M, Schomaker L (2004) Text detection from natural scene images: towards a system for visually impaired persons. In: 17th International conference on pattern recognition, vol 2, pp 683–686
11.
Fabrizio J, Cord M, Marcotegui B (2009) Text extraction from street level images. ISPRS Workshop CMRT
12.
Fabrizio J, Marcotegui B (2009) Fast implementation of the ultimate opening. International symposium on mathematical morphology, pp 272–281
13.
Fabrizio J, Marcotegui B, Cord M (2009) Text segmentation in natural scenes using toggle-mapping. 2009 IEEE International Conference on Image Processing
14.
Garcia WC, Apostolidis X (2000) Text detection and segmentation in complex color images. In: IEEE International Conference on Acoustic Speech, Signal Processing, pp 2326–2329. IEEE Computer Society
15.
Gatos B, Ntirogiannis K, Pratikakis I (2009) ICDAR 2009 document image binarization contest (DIBCO 2009). International Conference on Document Analysis and Recognition
17.
Gosselin P, Cord M (2008) Active learning methods for interactive image retrieval. IEEE Trans Image Process 17(7):1200–1211
18.
Hanif SM, Prevost L (2007) Texture based text detection in natural scene images - a help to blind and visually impaired persons. In: Conference on assistive technologies for people with vision and hearing impairments
24.
Joachims T (1999) Making large-scale SVM learning practical. Advances in kernel methods: support vector learning, pp 169–184
26.
Jung K, Kim K, Jain A (2004) Text information extraction in images and video: a survey. Pattern Recogn 37(5):977–997
27.
Kavallieratou E, Balcan D, Popa M, Fakotakis N (2001) Handwritten text localization in skewed documents. In: International conference on image processing, pp. I: 1102–1105
28.
Kuncheva L (2004) Combining pattern classifiers: methods and algorithms. Wiley, Hoboken
29.
Liang J, Doermann D, Li H (2005) Camera-based analysis of text and documents: a survey. Int J Doc Anal Recogn 7(2–3):83–104
30.
31.
Liu Q, Jung C, Kim S, Moon Y, yeun Kim J (2006) Stroke filter for text localization in video images. IEEE international conference on image processing
32.
Liu X, Samarabandu J (2006) Multiscale edge based text extraction from complex images. In: International conference on multimedia expo, pp 1721–1724
33.
Lucas S (2005) ICDAR 2005 text locating competition results. Eighth international conference on document analysis and recognition
34.
Mancas-Thillou C (2006) Natural scene text understanding. Ph.D. thesis, TCTS Lab of the Faculté Polytechnique de Mons, Belgium
35.
Niblack W (1986) An introduction to image processing. Prentice-Hall, Englewood Cliffs
36.
Otsu N (1979) A threshold selection method from gray level histogram. IEEE Trans Syst Man Cybern 9:62–66
37.
Palumbo PW, Srihari SN, Soh J, Sridhar R, Demjanenko V (1992) Postal address block location in real time. Computer 25(7):34–42. doi:10.1109/2.144438
38.
Pan W, Bui TD, Suen CY (2009) Text detection from natural scene images using topographic maps and sparse representations. In: IEEE ICIP. IEEE Computer Society
39.
Pazio M, Niedźwiecki M, Kowalik R, Lebiedź J (2007) Text detection system for the blind. 15th European signal processing conference EUSIPCO, pp 272–276
40.
Retornaz T (2007) Détection de textes enfouis dans des bases d’images généralistes. Un descripteur sémantique pour l’indexation. Ph.D. thesis, Ecole Nationale Supérieure des Mines de Paris, C.M.M., Fontainebleau
41.
Retornaz T, Marcotegui B (2007) Scene text localization based on the ultimate opening. Int Symp Math Morphol 1:177–188
42.
Sauvola J, Pietikäinen M (2000) Adaptive document image binarization. Pattern Recogn 33:225–236
43.
Sauvola JJ, Seppänen T, Haapakoski S, Pietikäinen M (1997) Adaptive document binarization. In: ICDAR ’97: Proceedings of the 4th International Conference on Document Analysis and Recognition, pp 147–152. IEEE Computer Society, Washington, DC
44.
Seeger M, Dance C (2001) Binarising camera images for OCR. Proceedings of the sixth international conference on document analysis and recognition (ICDAR)
45.
Serra J (1989) Toggle mappings. In: Simon JC (ed) From pixels to features. Elsevier, North-Holland, pp 61–72
46.
Sezgin M, Sankur B (2004) Survey over image thresholding techniques and quantitative performance evaluation. J Electron Imaging 13(1):146–165
47.
Shafait F, Keysers D, Breuel TM (2008) Efficient implementation of local adaptive thresholding techniques using integral images. In: Document Recognition and Retrieval XV. San Jose
48.
Szumilas L (2008) Scale and rotation invariant shape matching. Ph.D. thesis, Technische Universität Wien, Fakultät für Informatik
50.
Viola P, Jones M (2001) Robust real-time object detection. Int J Comput Vis
51.
Wahl F, Wong K, Casey R (1982) Block segmentation and text extraction in mixed text/image documents. Comput Graph Image Process 20(4):375–390
52.
Wolf C, Jolion JM, Chassaing F (2002) Text localization, enhancement and binarization in multimedia documents. In: Proceedings of the international conference on pattern recognition (ICPR) 2002, pp 1037–1040
53.
Wolf C, Jolion JM (2006) Object count/area graphs for the evaluation of object detection and segmentation algorithms. IJDAR 8(4):280–296
54.
Xiao Y, Yan H (2003) Text region extraction in a document image based on the Delaunay tessellation. Pattern Recogn 36(3):799–809
55.
Zhao XK, Lin YF, Hu Y Liu YTH (2011) Text from corners: a novel approach to detect text and caption in videos. IEEE Trans Image Process 20(3):790–799
56.
Zhu KF Qi, RJ, Xu L, Kimachi M, Wu Y, Aziwa T (2005) Using AdaBoost to detect and segment characters from natural scenes. In: Proceedings of CBDAR, ICDAR Workshop
Metadata
Title
Text detection in street level images
Authors
Jonathan Fabrizio
Beatriz Marcotegui
Matthieu Cord
Publication date
01.11.2013
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 4/2013
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-013-0329-7
