
2018 | OriginalPaper | Book chapter

Multiple-Gaze Geometry: Inferring Novel 3D Locations from Gazes Observed in Monocular Video

Authors: Ernesto Brau, Jinyan Guan, Tanya Jeffries, Kobus Barnard

Published in: Computer Vision – ECCV 2018

Publisher: Springer International Publishing


Abstract

We develop methods that use people's gaze direction for scene understanding. In particular, we use intersecting gazes to learn 3D locations that people tend to look at, which is analogous to having multiple camera views. The 3D locations that we discover need not be visible to the camera. Conversely, knowing the 3D locations of scene elements that draw visual attention, such as other people in the scene, can help infer gaze direction. We provide a Bayesian generative model for the temporal scene that captures the joint probability of camera parameters, locations of people, their gaze, what they are looking at, and locations of visual attention. Both the number of people in the scene and the number of extra objects that draw attention are unknown and must be inferred. To execute this joint inference, we use a probabilistic data association approach that enables principled comparison of model hypotheses. We use MCMC for inference over the discrete correspondence variables, and approximate the marginalization over continuous parameters using the Metropolis-Laplace approximation, with Hamiltonian (Hybrid) Monte Carlo used for maximization. As existing data sets do not provide the 3D locations of what people are looking at, we contribute a small data set that does. On this data set, we infer what people are looking at with 59% precision, compared with 13% for a baseline approach, and localize those objects to within about 0.58 m.
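
The analogy between intersecting gazes and multiple camera views can be made concrete with the standard least-squares ray intersection used in multi-view triangulation. The sketch below is not the authors' Bayesian model. It assumes each gaze is already available as a 3D ray (a head position and a direction in world coordinates), weights all rays equally, and returns the point that minimizes the summed squared perpendicular distance to the rays. The function name `intersect_gazes` and the `max_cond` threshold are hypothetical.

```python
import numpy as np


def intersect_gazes(origins, directions, max_cond=1e8):
    """Least-squares 3D point closest to a set of gaze rays (illustrative sketch).

    origins:    (n, 3) head positions, one per observed gaze
    directions: (n, 3) gaze directions (normalized here)
    Minimizes sum_i ||(I - d_i d_i^T)(x - p_i)||^2, i.e. the summed squared
    perpendicular distance from x to each ray's supporting line.
    """
    origins = np.asarray(origins, dtype=float)
    directions = np.asarray(directions, dtype=float)
    directions = directions / np.linalg.norm(directions, axis=1, keepdims=True)

    A = np.zeros((3, 3))
    b = np.zeros(3)
    for p, d in zip(origins, directions):
        P = np.eye(3) - np.outer(d, d)  # projects onto the plane orthogonal to d
        A += P
        b += P @ p

    # Parallel (or single) gazes leave A singular: no unique intersection.
    if not np.linalg.cond(A) < max_cond:
        raise ValueError("gazes are (nearly) parallel; intersection is ill-posed")
    return np.linalg.solve(A, b)
```

For two people at (0, 0, 0) and (2, 0, 0) whose gazes point along (1, 1, 0) and (-1, 1, 0), `intersect_gazes` returns (1, 1, 0). The sketch ignores gaze uncertainty and does not check that the point lies in front of each person; a probabilistic model would instead treat the target location as a latent variable with a likelihood for each noisy gaze.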

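The Laplace approximation replaces the integral over continuous parameters with a Gaussian integral around a mode θ*: log p(D) ≈ log p(D, θ*) + (d/2) log(2π) - (1/2) log det H, where H is the negative Hessian of log p(D, θ) at θ* and d is the number of continuous parameters. The sketch below shows only this generic approximation, under stated assumptions: the mode θ* is supplied by the caller (the abstract says the authors locate it with Hamiltonian Monte Carlo; here it is simply given), and the Hessian is estimated by central finite differences. It does not reproduce the Metropolis-Laplace scheme itself, which additionally runs MCMC over the discrete correspondences. The names `numerical_hessian` and `laplace_log_marginal` are hypothetical.

```python
import numpy as np


def numerical_hessian(f, x, h=1e-3):
    """Central finite-difference Hessian of a scalar function f at x."""
    x = np.asarray(x, dtype=float)
    d = x.size
    E = np.eye(d) * h  # row i is a step of size h along coordinate i
    H = np.empty((d, d))
    for i in range(d):
        for j in range(i, d):
            H[i, j] = (f(x + E[i] + E[j]) - f(x + E[i] - E[j])
                       - f(x - E[i] + E[j]) + f(x - E[i] - E[j])) / (4.0 * h * h)
            H[j, i] = H[i, j]  # the Hessian is symmetric
    return H


def laplace_log_marginal(log_joint, theta_star, h=1e-3):
    """Laplace approximation to log of the integral of exp(log_joint) over theta.

    Assumes theta_star is a local maximum of log_joint, found elsewhere
    (e.g. by an optimizer or an HMC-based search; not done here).
    """
    theta_star = np.asarray(theta_star, dtype=float)
    d = theta_star.size
    neg_hess = -numerical_hessian(log_joint, theta_star, h)
    try:
        L = np.linalg.cholesky(neg_hess)  # fails unless neg_hess is positive definite
    except np.linalg.LinAlgError as err:
        raise ValueError("theta_star is not a local maximum of log_joint") from err
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return log_joint(theta_star) + 0.5 * d * np.log(2.0 * np.pi) - 0.5 * log_det
```

For a Gaussian log joint the approximation is exact, which makes a convenient sanity check. Comparing two hypotheses (for example, different numbers of attention targets) by the difference of their approximate log marginals gives an approximate log Bayes factor.
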


Print ISBN: 978-3-030-01224-3
Electronic ISBN: 978-3-030-01225-0
Copyright year: 2018
DOI: https://doi.org/10.1007/978-3-030-01225-0_38
