
2021 | Original Paper | Book Chapter

ICDAR 2021 Competition on Scene Video Text Spotting

Authors: Zhanzhan Cheng, Jing Lu, Baorui Zou, Shuigeng Zhou, Fei Wu

Published in: Document Analysis and Recognition – ICDAR 2021

Publisher: Springer International Publishing


Abstract

Scene video text spotting (SVTS) is an important research topic with many real-life applications. However, in contrast to the massive body of work on scene text spotting in static images, little effort has been devoted to spotting text in scene videos. Environmental interferences such as motion blur make scene video text spotting very challenging. To promote research in this area, this competition introduces a new challenge dataset containing 129 fully annotated video clips from 21 natural scenarios. The competition comprises three tasks: video text detection (Task 1), video text tracking (Task 2), and end-to-end video text spotting (Task 3). During the competition period (opened on 1st March 2021 and closed on 11th April 2021), a total of 24 teams participated in the three tasks with 46 valid submissions. This paper presents the dataset description, task definitions, evaluation protocols, and result summaries of the ICDAR 2021 SVTS competition. Given the healthy number of teams and submissions, we consider the SVTS competition to have been successfully held, drawing much attention from the community and promoting research and development in the field.
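Frame-level video text detection (Task 1) is typically scored with an IoU-based precision/recall/F-score protocol, in the spirit of earlier ICDAR robust-reading competitions. The competition's exact protocol is defined in the full paper; the sketch below is only an illustration of that general scheme, and the IoU threshold of 0.5 and the greedy one-to-one matching policy are assumptions, not the official rules.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def frame_prf(preds, gts, thresh=0.5):
    """Greedily match predicted boxes to ground-truth boxes one-to-one
    at the given IoU threshold; return (precision, recall, f_score)."""
    matched, tp = set(), 0
    for p in preds:
        best, best_j = 0.0, -1
        for j, g in enumerate(gts):
            if j in matched:
                continue  # each ground-truth box may be matched once
            v = iou(p, g)
            if v > best:
                best, best_j = v, j
        if best >= thresh:
            matched.add(best_j)
            tp += 1
    prec = tp / len(preds) if preds else 0.0
    rec = tp / len(gts) if gts else 0.0
    f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f
```

For example, one correct detection plus one spurious detection against a single ground-truth box yields recall 1.0 but precision 0.5; a full video score would aggregate such counts over all frames before computing the final F-score.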


Metadata
Title
ICDAR 2021 Competition on Scene Video Text Spotting
Authors
Zhanzhan Cheng
Jing Lu
Baorui Zou
Shuigeng Zhou
Fei Wu
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-86337-1_43