2016 | Original Paper | Book Chapter

Detecting Engagement in Egocentric Video

Authors: Yu-Chuan Su, Kristen Grauman

Published in: Computer Vision – ECCV 2016

Publisher: Springer International Publishing

Abstract

In a wearable camera video, we see what the camera wearer sees. While this makes it easy to know roughly what he chose to look at, it does not immediately reveal when he was engaged with his environment. Specifically, at what moments did his focus linger, as he paused to gather more information about something he saw? Knowing this answer would benefit various applications in video summarization and augmented reality, yet prior work focuses solely on the “what” question (estimating saliency, gaze) without considering the “when” (engagement). We propose a learning-based approach that uses long-term egomotion cues to detect engagement, specifically in browsing scenarios where one frequently takes in new visual information (e.g., shopping, touring). We introduce a large, richly annotated dataset for ego-engagement that is the first of its kind. Our approach outperforms a wide array of existing methods. We show engagement can be detected well independent of both scene appearance and the camera wearer’s identity.
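The abstract describes the approach only at a high level: long-term egomotion cues feed a learned frame-level engagement detector. Purely as an illustrative sketch of that idea, and not the authors’ actual pipeline, the snippet below assumes dense optical flow has already been computed for each frame, summarizes it as magnitude-weighted direction histograms, stacks histograms over a long temporal window, and trains a scikit-learn logistic-regression classifier on binary per-frame engagement labels. Every function name, the window length, and the choice of classifier are assumptions made here for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def motion_histogram(flow, bins=8):
    """Summarize one frame's dense optical flow (H x W x 2 array of
    per-pixel displacements) as a magnitude-weighted direction histogram."""
    dx, dy = flow[..., 0].ravel(), flow[..., 1].ravel()
    mag = np.hypot(dx, dy)
    ang = np.arctan2(dy, dx)  # flow direction in [-pi, pi]
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)


def long_term_features(frame_hists, window=31):
    """Stack each frame's motion histogram with its temporal neighbors so
    the classifier sees egomotion over a long window, not a single frame."""
    half = window // 2
    padded = np.pad(frame_hists, ((half, half), (0, 0)), mode="edge")
    return np.stack([padded[t:t + window].ravel()
                     for t in range(len(frame_hists))])


def train_engagement_detector(flows, labels, window=31):
    """flows: list of per-frame optical-flow fields for the training videos;
    labels: 0/1 engagement annotation per frame. Returns a frame-level
    classifier (a stand-in for whatever model the paper actually uses)."""
    hists = np.stack([motion_histogram(f) for f in flows])
    X = long_term_features(hists, window)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, labels)
    return clf
```

At test time, the same features would be computed for a new video and the classifier’s per-frame probabilities could be thresholded into engagement intervals; the paper’s actual features, model, and post-processing may differ.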


Footnotes
1
Throughout, we will use the term “recorder” to refer to the photographer or the first-person camera-wearer; we use the term “viewer” to refer to a third party who is observing the data captured by some other recorder.
 
2
For a portion of the video, we also ask the original recorders to label all frames for their own video; this requires substantial tedious effort, hence to get the full labeled set in a scalable manner we apply crowdsourcing.
 
3
We randomly generate interval predictions 10 times based on the prior of interval length and temporal distribution and report the average.
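As a hedged sketch of how such a random baseline could be implemented, the snippet below assumes the interval-length prior and the temporal (start-position) prior are given as discrete probability vectors and that the number of intervals per trial is fixed; these inputs, names, and the per-trial masks to be averaged downstream are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def random_interval_baseline(video_len, length_prior, start_prior,
                             n_intervals, trials=10, seed=0):
    """Sample engagement intervals from empirical priors over interval
    length (in frames) and start position, repeated over several trials.
    Returns one boolean frame mask per trial; a metric computed on each
    mask would then be averaged over the trials."""
    rng = np.random.default_rng(seed)
    masks = []
    for _ in range(trials):
        mask = np.zeros(video_len, dtype=bool)
        for _ in range(n_intervals):
            # length_prior[k] = P(interval is k + 1 frames long)
            length = rng.choice(len(length_prior), p=length_prior) + 1
            # start_prior[t] = P(interval starts at frame t), len == video_len
            start = rng.choice(video_len, p=start_prior)
            # Slicing truncates automatically at the end of the video.
            mask[start:start + length] = True
        masks.append(mask)
    return masks
```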
 
Metadata
Title
Detecting Engagement in Egocentric Video
Authors
Yu-Chuan Su
Kristen Grauman
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-46454-1_28