2020 | Original Paper | Book Chapter

A Deep Convolutional Deblurring and Detection Neural Network for Localizing Text in Videos

Authors: Yang Wang, Ye Qian, Jiahao Shi, Feng Su

Published in: MultiMedia Modeling

Publisher: Springer International Publishing

Abstract

Scene text in video is often degraded by blur, such as that caused by camera or text motion, which makes it harder to extract the text reliably for content-based video applications. In this paper, we propose a novel fully convolutional deep neural network for deblurring and detecting text in video. Specifically, to cope with blurred video text, we propose an effective deblurring subnetwork composed of multi-level convolutional blocks with both cross-block (long) and within-block (short) skip connections, which progressively learn residual details of the deblurred image, together with a spatial attention mechanism that focuses on blurred regions. The subnetwork generates a sharper image for the current frame by fusing multiple adjacent frames. To localize text in the frames, we enhance the EAST text detection model with deformable convolution layers and deconvolution layers, which better capture the widely varying appearance of video text. Experiments on a public scene text video dataset demonstrate state-of-the-art performance of the proposed video text deblurring and detection model.
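The data flow of the deblurring subnetwork can be illustrated with a toy sketch. This is not the authors' network: the abstract gives no layer details, so the per-pixel averaging in `fuse`, the callables in `layers` (stand-ins for learned convolutional blocks), the fixed `attention` weights, and the exact placement of the skip connections are all assumptions made for illustration. Frames are flattened lists of floats rather than image tensors.

```python
from typing import Callable, List, Sequence

Frame = List[float]               # flattened grayscale frame (toy stand-in for an image tensor)
Layer = Callable[[Frame], Frame]  # hypothetical stand-in for the body of a learned conv block


def add(a: Frame, b: Frame) -> Frame:
    """Element-wise sum of two frames of equal length."""
    return [x + y for x, y in zip(a, b)]


def fuse(frames: Sequence[Frame]) -> Frame:
    """Average adjacent frames per pixel (assumed stand-in for learned fusion)."""
    n = len(frames)
    return [sum(pixels) / n for pixels in zip(*frames)]


def deblur_center_frame(
    frames: Sequence[Frame], layers: Sequence[Layer], attention: Frame
) -> Frame:
    """Predict a residual from fused frames and add it to the center frame."""
    center = frames[len(frames) // 2]
    fused = fuse(frames)
    h = fused
    for layer in layers:
        h = add(h, layer(h))  # short skip: within-block residual
    h = add(h, fused)         # long skip: carries the fused input past the block stack
    # Spatial attention: per-pixel weights in [0, 1], larger where blur is assumed stronger.
    residual = [w * v for w, v in zip(attention, h)]
    return add(center, residual)  # output = center frame + attended residual


if __name__ == "__main__":
    frames = [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0]]  # previous, current, next frame
    layers = [lambda h: [0.5 * v for v in h]]      # one toy "block"
    print(deblur_center_frame(frames, layers, attention=[0.0, 1.0]))
```

Setting an attention weight to zero leaves that pixel of the center frame untouched, which is the intended effect of attending only to blurred regions. In a trained model the attention map and the block weights would be learned rather than fixed.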

Metadata
Title
A Deep Convolutional Deblurring and Detection Neural Network for Localizing Text in Videos
Authors
Yang Wang
Ye Qian
Jiahao Shi
Feng Su
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-37734-2_10
