2014 | OriginalPaper | Book Chapter

Spatially Prioritized and Persistent Text Detection and Decoding

Authors: Hsueh-Cheng Wang, Yafim Landa, Maurice Fallon, Seth Teller

Published in: Camera-Based Document Analysis and Recognition

Publisher: Springer International Publishing

Abstract

We show how to exploit temporal and spatial coherence to achieve efficient and effective text detection and decoding for a sensor suite moving through an environment in which text occurs at a variety of locations, scales and orientations with respect to the observer. Our method uses simultaneous localization and mapping (SLAM) to extract planar “tiles” representing scene surfaces. Multiple observations of each tile, captured from different observer poses, are aligned using homography transformations. Text is detected using Discrete Cosine Transform (DCT) and Maximally Stable Extremal Regions (MSER), and decoded by an Optical Character Recognition (OCR) engine. The decoded characters are then clustered into character blocks to obtain an MLE word configuration. This paper’s contributions include: (1) spatiotemporal fusion of tile observations via SLAM, prior to inspection, thereby improving the quality of the input data; and (2) combination of multiple noisy text observations into a single higher-confidence estimate of environmental text.
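
The second contribution, combining several noisy readings of the same text into one higher-confidence estimate, can be illustrated with a small sketch. This is not the authors' algorithm: it assumes the readings are already aligned and of equal length, and it swaps in a simple confidence-weighted vote per character position in place of the MLE word configuration the abstract describes. The function name `fuse_observations` and the sample confidences are hypothetical.

```python
from collections import defaultdict


def fuse_observations(observations):
    """Fuse aligned OCR readings of one word by confidence-weighted voting.

    observations: list of (text, confidence) pairs. All texts must have
    the same length (an assumption of this sketch, not of the paper).
    Returns the string built from the highest-scoring character at each
    position, or "" when there are no observations.
    """
    if not observations:
        return ""

    length = len(observations[0][0])
    if any(len(text) != length for text, _ in observations):
        raise ValueError("all readings must have the same length")

    fused = []
    for i in range(length):
        # Sum the confidence of every reading that proposes each character.
        votes = defaultdict(float)
        for text, confidence in observations:
            votes[text[i]] += confidence
        fused.append(max(votes, key=votes.get))
    return "".join(fused)


# Hypothetical readings of one sign taken from four observer poses.
readings = [("EX1T", 0.6), ("EXIT", 0.9), ("EXIT", 0.7), ("FXIT", 0.4)]
print(fuse_observations(readings))
```

Each individual reading may contain an error, but errors at different positions tend to be outvoted once enough observations of the same tile accumulate, which is the intuition behind fusing observations over time.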

Metadata
Title
Spatially Prioritized and Persistent Text Detection and Decoding
Authors
Hsueh-Cheng Wang
Yafim Landa
Maurice Fallon
Seth Teller
Copyright Year
2014
DOI
https://doi.org/10.1007/978-3-319-05167-3_1