2016 | OriginalPaper | Chapter

Leaving Some Stones Unturned: Dynamic Feature Prioritization for Activity Detection in Streaming Video

Authors : Yu-Chuan Su, Kristen Grauman

Published in: Computer Vision – ECCV 2016

Publisher: Springer International Publishing


Abstract

Current approaches for activity recognition often ignore constraints on computational resources: (1) they rely on extensive feature computation to obtain rich descriptors on all frames, and (2) they assume batch-mode access to the entire test video at once. We propose a new active approach to activity recognition that prioritizes “what to compute when” in order to make timely predictions. The main idea is to learn a policy that dynamically schedules the sequence of features to compute on selected frames of a given test video. In contrast to traditional static feature selection, our approach continually re-prioritizes computation based on the accumulated history of observations and accounts for the transience of those observations in ongoing video. We develop variants to handle both the batch and streaming settings. On two challenging datasets, our method provides significantly better accuracy than alternative techniques for a wide range of computational budgets.
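The scheduling idea in the abstract can be sketched as a budgeted loop: at each step a policy inspects the history of observations so far, picks the next (frame, feature) action, and the classifier predicts from the partial evidence collected. The sketch below is purely illustrative; `random_policy`, `mean_intensity`, and `threshold_classifier` are hypothetical stand-ins, not the authors' learned policy or features.

```python
import random

def detect_activity(video_frames, policy, classifier, budget):
    """Budgeted, policy-driven feature scheduling (toy sketch).

    `policy`, `classifier`, and the feature extractors are stand-ins
    for the learned components described in the paper.
    """
    history = []  # accumulated (action, observation) pairs
    for _ in range(budget):
        # The policy decides "what to compute when" from the history so far.
        frame_idx, feature_fn = policy(history, len(video_frames))
        obs = feature_fn(video_frames[frame_idx])
        history.append(((frame_idx, feature_fn.__name__), obs))
    # Predict from whatever partial evidence the budget allowed.
    return classifier(history)

# Illustrative stand-in components:
def mean_intensity(frame):
    """A deliberately cheap 'feature': mean pixel value."""
    return sum(frame) / len(frame)

def random_policy(history, n_frames):
    """Placeholder for the learned policy: pick a random frame."""
    return random.randrange(n_frames), mean_intensity

def threshold_classifier(history):
    """Placeholder classifier over the accumulated observations."""
    score = sum(obs for _, obs in history) / max(len(history), 1)
    return "active" if score > 0.5 else "idle"

video = [[0.9, 0.8], [0.7, 0.6], [0.1, 0.2]]
label = detect_activity(video, random_policy, threshold_classifier, budget=2)
```

In the paper the random stand-in is replaced by a policy trained to re-prioritize computation as observations accumulate, which is what yields the accuracy-per-budget gains reported.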


Footnotes
1
For clarity of presentation, in the following we present our method assuming a trimmed input video; Sect. 3.2 explains adjustments for untrimmed inputs.
 
2
Note that \(a^{(k)}\) identifies an action selected at step k in the episode, whereas \(a_{m}\) is one of the M discrete action choices in \(\mathcal {A}\).
 
3
Some object detectors share features across object categories, e.g., R-CNN [53], in which case it may be practical to simplify the action to select only the video volume and apply all object classes. We use the DPM detector [54], which has the advantage of near real-time detection [42] using a single thread, whereas R-CNN relies heavily on parallel computation and hardware acceleration [55].
 
4
Following [4], we retain the 75 objects, out of all 15,000, found most responsive to the activities. Because the provided detections are frame-level, we split volumes only in the temporal dimension for \(l_m\) on UCF.
 
5
Object-Pref [4] is not applicable in the streaming case because it lacks an object response prior for the actions that depends on the buffer location.
 
6
ADL is less amenable to full-frame CNN descriptors, due to domain shift of egocentric video and the nature of the composite, object-driven activities.
 
Literature
1.
Aggarwal, J.K., Ryoo, M.S.: Human activity analysis: a review. ACM Comput. Surv. 43(3), April 2011
2.
Wang, H., Schmid, C.: Action recognition with improved trajectories. In: ICCV (2013)
3.
Pirsiavash, H., Ramanan, D.: Detecting activities of daily living in first-person camera views. In: CVPR (2012)
4.
Jain, M., van Gemert, J.C., Snoek, C.G.M.: What do 15,000 object categories tell us about classifying and localizing actions? In: CVPR (2015)
5.
Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: NIPS (2014)
6.
Xu, Z., Yang, Y., Hauptman, A.: A discriminative CNN video representation for event detection. In: CVPR (2015)
7.
Zha, S., Luisier, F., Andrews, W., Srivastava, N., Salakhutdinov, R.: Exploiting image-trained CNN architectures for unconstrained video classification. In: BMVC (2015)
8.
Han, D., Bo, L., Sminchisescu, C.: Selection and context for action recognition. In: ICCV (2009)
9.
Butko, N., Movellan, J.: Optimal scanning for faster object detection. In: CVPR (2009)
10.
Vijayanarasimhan, S., Kapoor, A.: Visual recognition and detection under bounded computational resources. In: CVPR (2010)
11.
Karayev, S., Baumgartner, T., Fritz, M., Darrell, T.: Timely object recognition. In: NIPS (2012)
12.
Dulac-Arnold, G., Denoyer, L., Thome, N., Cord, M.: Sequentially generated instance-dependent image representations for classification. In: ICLR (2014)
13.
Karayev, S., Fritz, M., Darrell, T.: Anytime recognition of objects and scenes. In: CVPR (2014)
14.
Gonzalez-Garcia, A., Vezhnevets, A., Ferrari, V.: An active search strategy for efficient object class detection. In: CVPR (2015)
15.
Gao, T., Koller, D.: Active classification based on value of classifier. In: NIPS (2011)
16.
Yu, X., Fermuller, C., Teo, C.L., Yang, Y., Aloimonos, Y.: Active scene recognition with vision and language. In: CVPR (2011)
17.
Yeung, S., Russakovsky, O., Mori, G., Fei-Fei, L.: End-to-end learning of action detection from frame glimpses in videos. In: CVPR (2016)
18.
Hoai, M., la Torre, F.D.: Max-margin early event detectors. In: CVPR (2012)
19.
Ryoo, M.: Human activity prediction: early recognition of ongoing activities from streaming videos. In: ICCV (2011)
20.
Davis, J., Tyagi, A.: Minimal-latency human action recognition using reliable-inference. Image Vis. Comput. 24, 455–472 (2006)
21.
Yao, B., Jiang, X., Khosla, A., Lin, A., Guibas, L., Fei-Fei, L.: Human action recognition by learning bases of action attributes and parts. In: ICCV (2011)
22.
Rohrbach, M., Regneri, M., Andriluka, M., Amin, S., Pinkal, M., Schiele, B.: Script data for attribute-based recognition of composite activities. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7572, pp. 144–157. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33718-5_11
23.
Li, Y., Ye, Z., Rehg, J.M.: Delving into egocentric actions. In: CVPR (2015)
24.
Ma, M., Fan, H., Kitani, K.M.: Going deeper into first-person activity recognition (2016)
26.
Heilbron, F.C., Escorcia, V., Ghanem, B., Niebles, J.C.: ActivityNet: a large-scale video benchmark for human activity understanding. In: CVPR (2015)
27.
Ke, Y., Sukthankar, R., Hebert, M.: Efficient visual event detection using volumetric features. In: ICCV (2005)
28.
Duchenne, O., Laptev, I., Sivic, J., Bach, F., Ponce, J.: Automatic annotation of human actions in video. In: ICCV (2009)
29.
30.
Medioni, G., Nevatia, R., Cohen, I.: Event detection and analysis from video streams. Trans. Pattern Anal. Mach. Intell. 23(8), 873–889 (2001)
31.
Yao, A., Gall, J., van Gool, L.: A hough transform-based voting framework for action recognition. In: CVPR (2010)
32.
Kläser, A., Marszałek, M., Schmid, C., Zisserman, A.: Human focused action localization in video. In: International Workshop on Sign, Gesture, Activity (2010)
33.
Lan, T., Wang, Y., Mori, G.: Discriminative figure-centric models for joint action localization and recognition. In: ICCV (2011)
34.
Yu, G., Yuan, J.: Fast action proposals for human action detection and search. In: CVPR (2015)
35.
Jain, M., van Gemert, J., Jegou, H., Bouthemy, P., Snoek, C.: Action localization with tubelets from motion. In: CVPR (2015)
36.
Gkioxari, G., Malik, J.: Finding action tubes. In: CVPR (2015)
37.
van Gemert, J.C., Jain, M., Gati, E., Snoek, C.G.M.: APT: action localization proposals from dense trajectories. In: BMVC (2015)
38.
Chen, C.Y., Grauman, K.: Efficient activity detection with max-subgraph search. In: CVPR (2012)
39.
Yu, G., Yuan, J., Liu, Z.: Unsupervised random forest indexing for fast action search. In: CVPR (2011)
40.
Su, Y.C., Grauman, K.: Leaving some stones unturned: dynamic feature prioritization for activity detection in streaming video. Technical report, Comp. Sci. Dept., Univ. Texas, Austin, AI-15-05 (2015)
41.
Sadeghi, M.A., Forsyth, D.: 30Hz object detection with DPM V5. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 65–79. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10590-1_5
42.
Yan, J., Lei, Z., Wen, L., Li, S.: The fastest deformable part model for object detection. In: CVPR (2014)
43.
Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: CVPR (2001)
44.
Pedersoli, M., Vedaldi, A., Gonzalez, J.: A coarse-to-fine approach for fast deformable object detection. In: CVPR (2011)
45.
Weiss, D.J., Taskar, B.: Learning adaptive value of information for structured prediction. In: NIPS (2013)
46.
Wang, J., Bolukbasi, T., Trapeznikov, K., Saligrama, V.: Model selection by linear programming. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8690, pp. 647–662. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10605-2_42
47.
Karasev, V., Ravichandran, A., Soatto, S.: Active frame, location, and detector selection for automated and manual video annotation. In: CVPR (2014)
48.
Chen, D., Bilgic, M., Getoor, L., Jacobs, D.: Dynamic processing allocation in video. IEEE Trans. Pattern Anal. Mach. Intell. 33(11), 2174–2187 (2011)
49.
Amer, M.R., Xie, D., Zhao, M., Todorovic, S., Zhu, S.-C.: Cost-sensitive top-down/bottom-up inference for multiscale activity recognition. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7575, pp. 187–200. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33765-9_14
50.
Amer, M., Todorovic, S., Fern, A., Zhu, S.C.: Monte Carlo tree search for scheduling activity recognition. In: ICCV (2013)
51.
Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach. Pearson, Upper Saddle River (2010)
52.
Gupta, A., Davis, L.S.: Objects in action: an approach for combining action understanding and object perception. In: CVPR (2007)
53.
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR (2014)
54.
Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part based models. PAMI 32(9), 1627–1645 (2010)
55.
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015)
56.
Nvidia: GPU-based deep learning inference: a performance and power analysis. Whitepaper (2015)
57.
Soomro, K., Zamir, A.R., Shah, M.: UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402 (2012)
58.
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
60.
Lacoste-Julien, S., Jaggi, M., Schmidt, M., Pletscher, P.: Block-coordinate Frank-Wolfe optimization for structural SVMs. In: ICML (2013)
61.
McCandless, T., Grauman, K.: Object-centric spatio-temporal pyramids for egocentric activity recognition. In: BMVC (2013)
Metadata
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-46478-7_48