2021 | OriginalPaper | Chapter

ICDAR 2021 Competition on Scene Video Text Spotting

Authors : Zhanzhan Cheng, Jing Lu, Baorui Zou, Shuigeng Zhou, Fei Wu

Published in: Document Analysis and Recognition – ICDAR 2021

Publisher: Springer International Publishing

Abstract

Scene video text spotting (SVTS) is an important research topic with many real-life applications. However, in contrast to the extensive work on scene text spotting in static images, little effort has been devoted to spotting text in scene videos, which is particularly challenging due to environmental interference such as motion blur. To promote research in this area, this competition introduces a new challenge dataset of 129 fully annotated video clips from 21 natural scenarios. The competition comprises three tasks: video text detection (Task 1), video text tracking (Task 2), and end-to-end video text spotting (Task 3). During the competition period (opened on 1st March 2021 and closed on 11th April 2021), 24 teams participated in the three tasks with 46 valid submissions in total. This paper presents the dataset description, task definitions, evaluation protocols, and results summaries of the ICDAR 2021 SVTS competition. Given the healthy number of teams and submissions, we consider the SVTS competition to have been held successfully, drawing considerable attention from the community and promoting research in the field.
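The abstract mentions evaluation protocols for the detection task. Detection in such competitions is typically scored frame by frame with an IoU-based F-measure (matching predicted boxes to ground truth at an IoU threshold, commonly 0.5). The following is a minimal illustrative sketch of that general protocol, not the competition's official scorer; box format and the greedy matching strategy are assumptions:

```python
def iou(a, b):
    # Axis-aligned boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def detection_fscore(preds, gts, thresh=0.5):
    # Greedy one-to-one matching of predictions to ground truth
    # at the given IoU threshold (an assumed, simplified matcher).
    matched = set()
    tp = 0
    for p in preds:
        best, best_j = 0.0, None
        for j, g in enumerate(gts):
            if j in matched:
                continue
            v = iou(p, g)
            if v > best:
                best, best_j = v, j
        if best_j is not None and best >= thresh:
            matched.add(best_j)
            tp += 1
    prec = tp / len(preds) if preds else 0.0
    rec = tp / len(gts) if gts else 0.0
    f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f
```

Per-frame scores like these are then aggregated over all frames of all clips; the tracking and end-to-end tasks additionally require sequence-level identity matching, which this sketch does not cover.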


DOI
https://doi.org/10.1007/978-3-030-86337-1_43
