
2016 | OriginalPaper | Chapter

Detecting Engagement in Egocentric Video

Authors : Yu-Chuan Su, Kristen Grauman

Published in: Computer Vision – ECCV 2016

Publisher: Springer International Publishing

Abstract

In a wearable camera video, we see what the camera wearer sees. While this makes it easy to know roughly what he chose to look at, it does not immediately reveal when he was engaged with the environment. Specifically, at what moments did his focus linger, as he paused to gather more information about something he saw? Knowing this answer would benefit various applications in video summarization and augmented reality, yet prior work focuses solely on the “what” question (estimating saliency, gaze) without considering the “when” (engagement). We propose a learning-based approach that uses long-term egomotion cues to detect engagement, specifically in browsing scenarios where one frequently takes in new visual information (e.g., shopping, touring). We introduce a large, richly annotated dataset for ego-engagement that is the first of its kind. Our approach outperforms a wide array of existing methods. We show engagement can be detected well independent of both scene appearance and the camera wearer’s identity.
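To make the windowed formulation concrete, the sketch below is an illustrative assumption rather than the authors' model: it pools a per-frame camera-motion signal over a long temporal window and trains a binary frame-level engagement classifier on the pooled statistics. The motion signal (flow_mag), window size (win), feature set, and choice of logistic regression are all placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def egomotion_window_features(flow_mag, win=90):
    """Pool a per-frame egomotion signal (e.g. mean optical-flow magnitude
    per frame) over a long temporal window centred on each frame."""
    flow_mag = np.asarray(flow_mag, dtype=float)
    T = len(flow_mag)
    feats = []
    for t in range(T):
        lo, hi = max(0, t - win // 2), min(T, t + win // 2 + 1)
        w = flow_mag[lo:hi]
        feats.append([w.mean(), w.std(), w.min(), w.max(),
                      np.percentile(w, 25), np.percentile(w, 75)])
    return np.asarray(feats)

def train_frame_level_detector(flow_mag, engagement_labels, win=90):
    """Fit a binary frame-level classifier (1 = engaged, 0 = not engaged)
    on the pooled long-term motion statistics."""
    X = egomotion_window_features(flow_mag, win)
    return LogisticRegression(max_iter=1000).fit(X, engagement_labels)
```

The sliding-window pooling is what makes the cues "long-term": each frame's prediction depends on motion behavior over several seconds rather than on instantaneous motion alone.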

Footnotes
1
Throughout, we will use the term “recorder” to refer to the photographer or the first-person camera-wearer; we use the term “viewer” to refer to a third party who is observing the data captured by some other recorder.
 
2
For a portion of the video, we also ask the original recorders to label all frames of their own video; this requires substantial, tedious effort, so to obtain the full labeled set in a scalable manner we apply crowdsourcing.
 
3
We randomly generate interval predictions 10 times based on the prior of interval length and temporal distribution and report the average.
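A minimal sketch of this random baseline is given below, assuming the priors are represented as empirical samples of interval lengths and start frames; the frame-level F1 metric used here is a placeholder, not necessarily the reported score.

```python
import numpy as np

def random_interval_baseline(video_len, gt_intervals, length_prior,
                             start_prior, n_trials=10, seed=0):
    """Sample engagement-interval predictions from empirical priors on
    interval length and start position, score each trial against the
    ground truth (frame-level F1 here, as a placeholder metric), and
    return the average over n_trials runs."""
    rng = np.random.default_rng(seed)
    gt = np.zeros(video_len, dtype=bool)
    for s, e in gt_intervals:                 # ground-truth intervals [s, e)
        gt[s:e] = True

    scores = []
    for _ in range(n_trials):
        pred = np.zeros(video_len, dtype=bool)
        for _ in range(len(gt_intervals)):    # same number of intervals
            length = int(rng.choice(length_prior))   # prior on length
            start = int(rng.choice(start_prior))     # prior on position
            pred[start:min(start + length, video_len)] = True
        tp = np.logical_and(pred, gt).sum()
        precision = tp / max(pred.sum(), 1)
        recall = tp / max(gt.sum(), 1)
        scores.append(2 * precision * recall / max(precision + recall, 1e-8))
    return float(np.mean(scores))
```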
 
Metadata
Title
Detecting Engagement in Egocentric Video
Authors
Yu-Chuan Su
Kristen Grauman
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-46454-1_28
