2018 | OriginalPaper | Book Chapter

What Do I Annotate Next? An Empirical Study of Active Learning for Action Localization

Authors: Fabian Caba Heilbron, Joon-Young Lee, Hailin Jin, Bernard Ghanem

Published in: Computer Vision – ECCV 2018

Publisher: Springer International Publishing

Abstract

Despite tremendous progress achieved in temporal action localization, state-of-the-art methods still struggle to train accurate models when annotated data is scarce. In this paper, we introduce a novel active learning framework for temporal localization that aims to mitigate this data dependency issue. We equip our framework with active selection functions that can reuse knowledge from previously annotated datasets. We study the performance of two state-of-the-art active selection functions as well as two widely used active learning baselines. To validate the effectiveness of each of these selection functions, we conduct simulated experiments on ActivityNet. We find that using previously acquired knowledge as a bootstrapping source is crucial for active learners aiming to localize actions. When equipped with the right selection function, our proposed framework exhibits significantly better performance than standard active learning strategies, such as uncertainty sampling. Finally, we employ our framework to augment the newly compiled Kinetics action dataset with ground-truth temporal annotations. As a result, we collect Kinetics-Localization, a novel large-scale dataset for temporal action localization, which contains more than 15K YouTube videos.
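
The uncertainty sampling baseline mentioned above ranks unlabeled videos by how unsure the current model is about them and requests annotations for the most uncertain ones first. The sketch below illustrates only that selection step; it is not the authors' implementation, and the entropy uncertainty measure, function names, and batch budget are our assumptions for illustration.

```python
import numpy as np

def entropy_uncertainty(probs: np.ndarray) -> np.ndarray:
    """Predictive entropy per video; higher means the model is less certain."""
    eps = 1e-12  # avoid log(0)
    return -np.sum(probs * np.log(probs + eps), axis=1)

def select_batch(probs: np.ndarray, budget: int) -> np.ndarray:
    """Indices of the `budget` most uncertain videos to annotate next."""
    return np.argsort(-entropy_uncertainty(probs))[:budget]

# Toy example: class posteriors for 5 unlabeled videos over 3 action classes.
probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction -> low annotation priority
    [0.34, 0.33, 0.33],  # near-uniform posterior -> queried first
    [0.70, 0.20, 0.10],
    [0.50, 0.45, 0.05],
    [0.90, 0.05, 0.05],
])
print(select_batch(probs, budget=2))  # -> [1 3]
```

In a full active learning loop, the selected videos would be sent to human annotators for temporal labels, added to the training set, and the model retrained before the next round of selection.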

Metadata
Title
What Do I Annotate Next? An Empirical Study of Active Learning for Action Localization
Authors
Fabian Caba Heilbron
Joon-Young Lee
Hailin Jin
Bernard Ghanem
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-01252-6_13