Skip to main content
Erschienen in: International Journal on Document Analysis and Recognition (IJDAR) 2/2015

01.06.2015 | Special Issue Paper

Scene text extraction based on edges and support vector regression

verfasst von: Shijian Lu, Tao Chen, Shangxuan Tian, Joo-Hwee Lim, Chew-Lim Tan

Erschienen in: International Journal on Document Analysis and Recognition (IJDAR) | Ausgabe 2/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

This paper presents a scene text extraction technique that automatically detects and segments texts from scene images. Three text-specific features are designed over image edges with which a set of candidate text boundaries is first detected. For each detected candidate text boundary, one or more candidate characters are then extracted by using a local threshold that is estimated based on the surrounding image pixels. The real characters and words are finally identified by a support vector regression model that is trained using bags-of-words representation. The proposed technique has been evaluated over the latest ICDAR-2013 Robust Reading Competition dataset. Experiments show that it obtains superior F-measures of 78.19 % and 75.24 % (on atom level), respectively, for the scene text detection and segmentation tasks.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Niblack, W.: An Introduction to Digital Image Processing. Prentice-Hall, Englewood (1986) Niblack, W.: An Introduction to Digital Image Processing. Prentice-Hall, Englewood (1986)
2.
Zurück zum Zitat Liang, J., Doermann, D., Li, H.: Camera-based analysis of text and documents: a survey. Int. J. Doc. Anal. Recognit. 7(2–3), 84–104 (2005)CrossRef Liang, J., Doermann, D., Li, H.: Camera-based analysis of text and documents: a survey. Int. J. Doc. Anal. Recognit. 7(2–3), 84–104 (2005)CrossRef
3.
Zurück zum Zitat Jung, K., Kim, K.I., Jain, A.K.: Text information extraction in images and video: a survey. Pattern Recognit. 37(5), 977–997 (2004)CrossRef Jung, K., Kim, K.I., Jain, A.K.: Text information extraction in images and video: a survey. Pattern Recognit. 37(5), 977–997 (2004)CrossRef
4.
Zurück zum Zitat Clavelli, A., Karatzas, D., Llados, J.: A framework for the assessment of text extraction algorithms on complex colour images. In: IAPR International Workshop on Document Analysis Systems, pp. 19–26 (2010) Clavelli, A., Karatzas, D., Llados, J.: A framework for the assessment of text extraction algorithms on complex colour images. In: IAPR International Workshop on Document Analysis Systems, pp. 19–26 (2010)
5.
Zurück zum Zitat Chen, X., Yuille, A.: Detecting and reading text in natural scenes. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 366–373 (2004) Chen, X., Yuille, A.: Detecting and reading text in natural scenes. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 366–373 (2004)
6.
Zurück zum Zitat Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Gomez i Bigorda, L., Robles Mestre, S., Mas, J., Fernandez Mota, D., Almazan Almazan, J., de las Heras, L.-P.: ICDAR 2013 robust reading competition. In: International Conference on Document Analysis and Recognition, pp. 1484–1493 (2013) Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Gomez i Bigorda, L., Robles Mestre, S., Mas, J., Fernandez Mota, D., Almazan Almazan, J., de las Heras, L.-P.: ICDAR 2013 robust reading competition. In: International Conference on Document Analysis and Recognition, pp. 1484–1493 (2013)
7.
Zurück zum Zitat Lienhart, R., Wernicke, A.: Localizing and segmenting text in images and videos. IEEE Trans. Circuit Syst. Video Technol. 12(4), 256–268 (2002)CrossRef Lienhart, R., Wernicke, A.: Localizing and segmenting text in images and videos. IEEE Trans. Circuit Syst. Video Technol. 12(4), 256–268 (2002)CrossRef
8.
Zurück zum Zitat Jain, A.K., Yu, B.: Automatic text location in images and video frames. Pattern Recognit. 31(12), 2055–2076 (1998)CrossRef Jain, A.K., Yu, B.: Automatic text location in images and video frames. Pattern Recognit. 31(12), 2055–2076 (1998)CrossRef
9.
Zurück zum Zitat Kim, H.K.: Efficient automatic text location method and content-based indexing and structuring of video database. J. Vis. Commun. Image Represent. 7(4), 336–344 (1996)CrossRef Kim, H.K.: Efficient automatic text location method and content-based indexing and structuring of video database. J. Vis. Commun. Image Represent. 7(4), 336–344 (1996)CrossRef
10.
Zurück zum Zitat Lucas, S.M., Panaretos, A., Sosa, L., Tang, A., Wong, S., Young, R.: ICDAR 2003 robust reading competitions. In: International Conference on Document Analysis and Recognition, pp. 682–687 (2003) Lucas, S.M., Panaretos, A., Sosa, L., Tang, A., Wong, S., Young, R.: ICDAR 2003 robust reading competitions. In: International Conference on Document Analysis and Recognition, pp. 682–687 (2003)
11.
Zurück zum Zitat Lucas, S.M.: ICDAR 2005 text locating competition results. In: International Conference on Document Analysis and Recognition, pp. 80–84 (2005) Lucas, S.M.: ICDAR 2005 text locating competition results. In: International Conference on Document Analysis and Recognition, pp. 80–84 (2005)
12.
Zurück zum Zitat Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 2963–2970 (2010) Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 2963–2970 (2010)
13.
Zurück zum Zitat Datta, R., Joshi, D., Li, J., Wang, James Z.: Image retrieval: Ideas, influences, and trends of the new age. ACM Comput. Surv. 40(2), 1–60 (2008)CrossRef Datta, R., Joshi, D., Li, J., Wang, James Z.: Image retrieval: Ideas, influences, and trends of the new age. ACM Comput. Surv. 40(2), 1–60 (2008)CrossRef
14.
Zurück zum Zitat Shivakumara, P., Phan, T.Q., Tan, C.L.: A Laplacian approach to multi-oriented text detection in video. IEEE Trans. Pattern. Anal. Mach. Intell. 33(2), 412–419 (2011)CrossRef Shivakumara, P., Phan, T.Q., Tan, C.L.: A Laplacian approach to multi-oriented text detection in video. IEEE Trans. Pattern. Anal. Mach. Intell. 33(2), 412–419 (2011)CrossRef
15.
Zurück zum Zitat Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern. Anal. Mach. Intell. 8(6), 679–698 (1986)CrossRef Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern. Anal. Mach. Intell. 8(6), 679–698 (1986)CrossRef
16.
Zurück zum Zitat Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 886–893 (2005) Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 886–893 (2005)
17.
Zurück zum Zitat Mishra, A., Alahari, K., Jawahar, C.V.: Top-down and bottom-up cues for scene text recognition. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 2687–2694 (2012) Mishra, A., Alahari, K., Jawahar, C.V.: Top-down and bottom-up cues for scene text recognition. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 2687–2694 (2012)
18.
Zurück zum Zitat Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: IEEE International Conference on Computer Vision, pp. 1457–1464 (2011) Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: IEEE International Conference on Computer Vision, pp. 1457–1464 (2011)
19.
Zurück zum Zitat Wang, K., Belongie, S.: Word spotting in the wild. In: European Conference on Computer Vision, pp. 591–604 (2010) Wang, K., Belongie, S.: Word spotting in the wild. In: European Conference on Computer Vision, pp. 591–604 (2010)
20.
Zurück zum Zitat Coates, A., Carpenter, B., Case, C., Satheesh, S., Suresh, B., Wang, T., Wu, D.J., Ng, A.Y.: Text detection and character recognition in scene images with unsupervised feature learning. In: International Conference on Document Analysis and Recognition, pp. 440–445 (2011) Coates, A., Carpenter, B., Case, C., Satheesh, S., Suresh, B., Wang, T., Wu, D.J., Ng, A.Y.: Text detection and character recognition in scene images with unsupervised feature learning. In: International Conference on Document Analysis and Recognition, pp. 440–445 (2011)
21.
Zurück zum Zitat Wang, T., Wu, David J., Coates, A., Ng, A.Y.: End-to-end text recognition with convolutional neural networks. In: International Conference on Pattern Recognition, pp. 3304–3308 (2012) Wang, T., Wu, David J., Coates, A., Ng, A.Y.: End-to-end text recognition with convolutional neural networks. In: International Conference on Pattern Recognition, pp. 3304–3308 (2012)
22.
Zurück zum Zitat Neumann, L., Matas, J.: Real-time scene text localization and recognition. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 3538–3545 (2012) Neumann, L., Matas, J.: Real-time scene text localization and recognition. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 3538–3545 (2012)
23.
Zurück zum Zitat Yao, C., Bai, X., Liu, W., Ma, Y., Tu, Z.: Detecting texts of arbitrary orientations in natural images. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 1083–1090 (2012) Yao, C., Bai, X., Liu, W., Ma, Y., Tu, Z.: Detecting texts of arbitrary orientations in natural images. In: IEEE International Conference on Computer Vision and Pattern Recognition, pp. 1083–1090 (2012)
24.
Zurück zum Zitat Yi, C., Tian, Y.: Text string detection from natural scenes by structure-based partition and grouping. IEEE Trans. Image Process. 20(9), 2594–2605 (2011) Yi, C., Tian, Y.: Text string detection from natural scenes by structure-based partition and grouping. IEEE Trans. Image Process. 20(9), 2594–2605 (2011)
25.
Zurück zum Zitat Shahab, A., Shafait, F., Dengel, A.: ICDAR 2011 robust reading competition challenge 2: reading text in scene images. In: International Conference on Document Analysis and Recognition, pp. 1491–1496 (2011) Shahab, A., Shafait, F., Dengel, A.: ICDAR 2011 robust reading competition challenge 2: reading text in scene images. In: International Conference on Document Analysis and Recognition, pp. 1491–1496 (2011)
26.
Zurück zum Zitat Chen, H., Tsai, S.S., Schroth, G., Chen, D.M., Grzeszczuk, R., Girod, B.: Robust text detection in natural images with edge-enhanced maximally stable extremal regions. In: International Conference on Image Processing, pp. 2609–2612 (2011) Chen, H., Tsai, S.S., Schroth, G., Chen, D.M., Grzeszczuk, R., Girod, B.: Robust text detection in natural images with edge-enhanced maximally stable extremal regions. In: International Conference on Image Processing, pp. 2609–2612 (2011)
27.
Zurück zum Zitat Wolf, C., Jolion, J.: Object count/area graphs for the evaluation of object detection and segmentation algorithms. Int. J. Doc. Anal. Recognit. 8(4), 280–296 (2006)CrossRef Wolf, C., Jolion, J.: Object count/area graphs for the evaluation of object detection and segmentation algorithms. Int. J. Doc. Anal. Recognit. 8(4), 280–296 (2006)CrossRef
28.
Zurück zum Zitat Shi, C., Wang, C., Xiao, B., Zhang, Y., Gao, S.: Scene text detection using graph model built upon maximally stable extremal regions. Pattern Recognit. Lett. 34(2), 280–296 (2012) Shi, C., Wang, C., Xiao, B., Zhang, Y., Gao, S.: Scene text detection using graph model built upon maximally stable extremal regions. Pattern Recognit. Lett. 34(2), 280–296 (2012)
29.
Zurück zum Zitat Pan, Y.F., Hou, X., Liu, C.L.: A hybrid approach to detect and localize texts in natural scene images. IEEE Trans. Image. Process. 20(3), 800–813 (2011)CrossRefMathSciNet Pan, Y.F., Hou, X., Liu, C.L.: A hybrid approach to detect and localize texts in natural scene images. IEEE Trans. Image. Process. 20(3), 800–813 (2011)CrossRefMathSciNet
30.
Zurück zum Zitat Yi, C., Tian, Y.: Localizing text in scene images by boundary clustering, stroke segmentation, and string fragment classification. IEEE Trans. Image. Process. 21(9), 4256–4268 (2012)CrossRefMathSciNet Yi, C., Tian, Y.: Localizing text in scene images by boundary clustering, stroke segmentation, and string fragment classification. IEEE Trans. Image. Process. 21(9), 4256–4268 (2012)CrossRefMathSciNet
31.
Zurück zum Zitat Kasar, T., Kumar, J., Ramakrishnan, A.G.: Font and background color independent text binarization. In: International workshop on Camera Based Document Analysis and Recognition (workshop of ICDAR), pp. 3–9 (2007) Kasar, T., Kumar, J., Ramakrishnan, A.G.: Font and background color independent text binarization. In: International workshop on Camera Based Document Analysis and Recognition (workshop of ICDAR), pp. 3–9 (2007)
32.
Zurück zum Zitat Basak, D., Pal, S., Patranabis, D.C.: Support vector regression. Neural Inf. Process. 11(10), 203–224 (2007) Basak, D., Pal, S., Patranabis, D.C.: Support vector regression. Neural Inf. Process. 11(10), 203–224 (2007)
33.
Zurück zum Zitat Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–65 (1979)CrossRefMathSciNet Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–65 (1979)CrossRefMathSciNet
34.
Zurück zum Zitat Huang, W., Lin, Z., Yang, J., Wang, J.: Text localization in natural images using stroke feature transform and text covariance descriptors. In: IEEE International Conference on Computer Vision, pp. 1241–1248 (2013) Huang, W., Lin, Z., Yang, J., Wang, J.: Text localization in natural images using stroke feature transform and text covariance descriptors. In: IEEE International Conference on Computer Vision, pp. 1241–1248 (2013)
35.
Zurück zum Zitat Chen, T., Yap, K.-H., Zhang, D.J.: Discriminative soft bag-of-visual phrase for mobile landmark recognition. IEEE Trans. Multimed. 13, 612–622 (2014)CrossRef Chen, T., Yap, K.-H., Zhang, D.J.: Discriminative soft bag-of-visual phrase for mobile landmark recognition. IEEE Trans. Multimed. 13, 612–622 (2014)CrossRef
36.
Zurück zum Zitat Li, T., Mei, T., Kweon, I.-S., Hua, X.S.: Contextual bags-of-words for visual categorization. IEEE Trans. Circuits Syst. Video Technol. 21, 381–392 (2010)CrossRef Li, T., Mei, T., Kweon, I.-S., Hua, X.S.: Contextual bags-of-words for visual categorization. IEEE Trans. Circuits Syst. Video Technol. 21, 381–392 (2010)CrossRef
Metadaten
Titel
Scene text extraction based on edges and support vector regression
verfasst von
Shijian Lu
Tao Chen
Shangxuan Tian
Joo-Hwee Lim
Chew-Lim Tan
Publikationsdatum
01.06.2015
Verlag
Springer Berlin Heidelberg
Erschienen in
International Journal on Document Analysis and Recognition (IJDAR) / Ausgabe 2/2015
Print ISSN: 1433-2833
Elektronische ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-015-0237-z

Weitere Artikel der Ausgabe 2/2015

International Journal on Document Analysis and Recognition (IJDAR) 2/2015 Zur Ausgabe

Editorial

Preface