Published in: International Journal of Computer Vision 1-2/2014

01.08.2014

Exploring Transfer Learning Approaches for Head Pose Classification from Multi-view Surveillance Images

Authors: Anoop Kolar Rajagopal, Ramanathan Subramanian, Elisa Ricci, Radu L. Vieriu, Oswald Lanz, Ramakrishnan Kalpathi R., Nicu Sebe


Abstract

Head pose classification from surveillance images acquired with distant, large field-of-view cameras is difficult, as faces are captured at low resolution and have a blurred appearance. Domain adaptation approaches are useful for transferring knowledge from the training (source) to the test (target) data when they have different attributes, minimizing target data labeling efforts in the process. This paper examines the use of transfer learning for efficient multi-view head pose classification with minimal target training data under three challenging situations: (i) where the range of head poses in the source and target images is different, (ii) where source images capture a stationary person while target images capture a moving person whose facial appearance varies under motion due to changing perspective and scale, and (iii) a combination of (i) and (ii). On the whole, the presented methods represent novel transfer learning solutions employed in the context of multi-view head pose classification. Through extensive experimental validation, we demonstrate that the proposed solutions considerably outperform the state of the art. Finally, we present the DPOSE dataset, compiled for benchmarking head pose classification performance with moving persons and for aiding behavioral understanding applications.


Footnotes
1
Head pose estimation involves determination of the pan (out-of-plane horizontal head rotation), tilt (out-of-plane vertical rotation) and roll (in-plane head rotation). In this work, we are mainly concerned about estimating pan and tilt.
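The pan/tilt/roll convention in this footnote can be made concrete. The sketch below is an illustration only (the Y-X-Z Euler order is our assumption, not necessarily the paper's parameterization): it builds a head rotation matrix from pan, tilt and roll, and recovers the pan and tilt angles, the two quantities estimated in this work.

```python
import numpy as np

def rotation(pan, tilt, roll):
    """Head rotation R = Ry(pan) @ Rx(tilt) @ Rz(roll), angles in degrees.
    The Y-X-Z Euler order is an illustrative assumption."""
    p, t, r = np.radians([pan, tilt, roll])
    Ry = np.array([[np.cos(p), 0, np.sin(p)], [0, 1, 0], [-np.sin(p), 0, np.cos(p)]])
    Rx = np.array([[1, 0, 0], [0, np.cos(t), -np.sin(t)], [0, np.sin(t), np.cos(t)]])
    Rz = np.array([[np.cos(r), -np.sin(r), 0], [np.sin(r), np.cos(r), 0], [0, 0, 1]])
    return Ry @ Rx @ Rz

def pan_tilt(R):
    """Recover pan (out-of-plane horizontal rotation) and tilt (out-of-plane
    vertical rotation) in degrees; the in-plane roll is ignored."""
    tilt = np.degrees(np.arcsin(-R[1, 2]))          # third column: R[1,2] = -sin(tilt)
    pan = np.degrees(np.arctan2(R[0, 2], R[2, 2]))  # R[0,2]/R[2,2] = tan(pan)
    return pan, tilt
```

Note that the third column of R depends only on pan and tilt under this factorization, which is why roll drops out of the recovery.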
 
3
27,824 four-view images correspond to static targets rotating in place at the room center, while 25,660 images capture freely moving targets.
 
4
These values account for the tracker’s variance, the horizontal and vertical offsets of the head from the body centroid due to head pan, tilt and roll.
 
5
This warping can also be applied in the case where the number of cameras/views for the source and target are different.
 
6
As seen from Table 1, which presents accuracies achieved with source-only \(Cov (d=12)\) features.
 
7
In our implementation, we consider the room-center as the reference position.
 
8
\({\varvec{\varSigma }}\) is chosen to be positive semi-definite and have a trace equal to 1 as proposed in Kulis et al. (2011)
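The two constraints on \({\varvec{\varSigma }}\) (positive semi-definiteness and unit trace) can be enforced by a standard spectral projection. The sketch below illustrates that generic step under our own assumptions; it is not the authors' optimization code.

```python
import numpy as np

def project_psd_unit_trace(S):
    """Project a square matrix onto the set of positive semi-definite
    matrices with trace equal to 1 (the constraints on Sigma)."""
    # Symmetrize, then clip negative eigenvalues to zero, which yields the
    # nearest PSD matrix in Frobenius norm.
    S = (S + S.T) / 2.0
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)
    P = V @ np.diag(w) @ V.T
    tr = np.trace(P)
    # Rescale to unit trace; fall back to a scaled identity if all
    # eigenvalues were clipped away.
    return P / tr if tr > 0 else np.eye(S.shape[0]) / S.shape[0]
```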
 
10
The NN classifier assigns the class label of the nearest target training example to the test image.
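The nearest-neighbor baseline in this footnote amounts to a one-line rule; a minimal sketch follows, assuming Euclidean distance in feature space as the metric.

```python
import numpy as np

def nn_classify(test_feature, train_features, train_labels):
    # Assign the test image the class label of the closest target-domain
    # training example in feature space.
    dists = np.linalg.norm(train_features - test_feature, axis=1)
    return train_labels[int(np.argmin(dists))]
```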
 
References
Benfold, B., & Reid, I. (2011). Unsupervised learning of a scene-specific coarse gaze estimator. In International Conference on Computer Vision (pp. 2344–2351).
Chen, C., & Odobez, J.-M. (2012). We are not contortionists: Coupled adaptive learning for head and body orientation estimation in surveillance video. In Computer Vision and Pattern Recognition (pp. 1544–1551).
Dai, W., Yang, Q., Xue, G. R., & Yu, Y. (2007). Boosting for transfer learning. In International Conference on Machine Learning (pp. 193–200).
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition (pp. 886–893).
Daume, H. (2007). Frustratingly easy domain adaptation. In Proceedings of Association for Computational Linguistics (pp. 256–263).
Doshi, A., & Trivedi, M. M. (2012). Head and eye gaze dynamics during visual attention shifts in complex environments. Journal of Vision, 12(2), 1–16.
Duan, L., Tsang, I. W., & Xu, D. (2012). Domain transfer multiple kernel learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(3), 465–479.
Duan, L., Tsang, I. W., Xu, D., & Chua, T.-S. (2009). Domain adaptation from multiple sources via auxiliary classifiers. In International Conference on Machine Learning (pp. 289–296).
Farhadi, A., & Tabrizi, M. K. (2008). Learning to recognize activities from the wrong view point. In European Conference on Computer Vision (pp. 154–166).
Ferencz, A., Learned-Miller, E. G., & Malik, J. (2008). Learning to locate informative features for visual identification. International Journal of Computer Vision, 77(1–3), 3–24.
HOSDB. (2006). Imagery library for intelligent detection systems (i-LIDS). In IEEE Crime and Security.
Jiang, J., & Zhai, C. (2007). Instance weighting for domain adaptation in NLP. In Association of Computational Linguistics (pp. 264–271).
Katzenmaier, M., Stiefelhagen, R., & Schultz, T. (2004). Identifying the addressee in human-human-robot interactions based on head pose and speech. In International Conference on Multimodal Interfaces (pp. 144–151).
Kulis, B., Saenko, K., & Darrell, T. (2011). What you saw is not what you get: Domain adaptation using asymmetric kernel transforms. In Computer Vision and Pattern Recognition (pp. 1785–1792).
Lanz, O. (2006). Approximate Bayesian multibody tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(9), 1436–1449.
Lanz, O., & Brunelli, R. (2008). Joint Bayesian tracking of head location and pose from low-resolution video. In R. Stiefelhagen, R. Bowers, & J. G. Fiscus (Eds.), Multimodal technologies for perception of humans, Lecture Notes in Computer Science (Vol. 4625, pp. 287–296). Heidelberg: Springer.
Lepri, B., Subramanian, R., Kalimeri, K., Staiano, J., Pianesi, F., & Sebe, N. (2012). Connecting meeting behavior with extraversion–A systematic study. IEEE Transactions on Affective Computing, 3(4), 443–455.
Lim, J. J., Salakhutdinov, R., & Torralba, A. (2011). Transfer learning by borrowing examples for multiclass object detection. In Advances in Neural Information Processing Systems (pp. 118–126).
Muñoz-Salinas, R., Yeguas-Bolivar, E., Saffiotti, A., & Carnicer, R. M. (2012). Multi-camera head pose estimation. Machine Vision and Applications, 23(3), 479–490.
Murphy-Chutorian, E., & Trivedi, M. M. (2009). Head pose estimation in computer vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(4), 607–626.
Orozco, J., Gong, S., & Xiang, T. (2009). Head pose classification in crowded scenes. In British Machine Vision Conference (pp. 1–11).
Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359.
Pardoe, D., & Stone, P. (2010). Boosting for regression transfer. In International Conference on Machine Learning (pp. 863–870).
Rajagopal, A., Subramanian, R., Vieriu, R. L., Ricci, E., Lanz, O., Sebe, N., & Ramakrishnan, K. (2012). An adaptation framework for head pose estimation in dynamic multi-view scenarios. In Asian Conference on Computer Vision (pp. 652–666).
Ricci, E., & Odobez, J.-M. (2009). Learning large margin likelihoods for realtime head pose tracking. In International Conference on Image Processing (pp. 2593–2596).
Smith, K., Ba, S. O., Odobez, J.-M., & Gatica-Perez, D. (2008). Tracking the visual focus of attention for a varying number of wandering people. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(7), 1212–1229.
Stiefelhagen, R., Bowers, R., & Fiscus, J. G. (2007). Multimodal technologies for perception of humans. International evaluation workshops CLEAR 2007 and RT 2007, Baltimore, MD, May 8–11, 2007, Revised Selected Papers (Vol. 4625). Heidelberg: Springer.
Subramanian, R., Staiano, J., Kalimeri, K., Sebe, N., & Pianesi, F. (2010). Putting the pieces together: Multimodal analysis of social attention in meetings. In ACM Int'l Conference on Multimedia (pp. 659–662).
Subramanian, R., Yan, Y., Staiano, J., Lanz, O., & Sebe, N. (2013). On the relationship between head pose, social attention and personality prediction for unstructured and dynamic group interactions. In ACM Int'l Conference on Multimodal Interfaces.
Tosato, D., Farenzena, M., Spera, M., Murino, V., & Cristani, M. (2010). Multi-class classification on Riemannian manifolds for video surveillance. In European Conference on Computer Vision (pp. 378–391).
Voit, M., & Stiefelhagen, R. (2009). A system for probabilistic joint 3D head tracking and pose estimation in low-resolution, multi-view environments. In Computer Vision Systems (pp. 415–424).
Wang, X., Han, T. X., & Yan, S. (2009). An HOG-LBP human detector with partial occlusion handling. In International Conference on Computer Vision (pp. 32–39).
Williams, C., Bonilla, E. V., & Chai, K. M. (2007). Multi-task Gaussian process prediction. In Advances in Neural Information Processing Systems (pp. 153–160).
Yan, Y., Subramanian, R., Lanz, O., & Sebe, N. (2012). Active transfer learning for multi-view head-pose classification. In Int'l Conference on Pattern Recognition (pp. 1168–1171).
Yan, Y., Ricci, E., Subramanian, R., Lanz, O., & Sebe, N. (2013). No matter where you are: Flexible graph-guided multi-task learning for multi-view head pose classification under target motion. In Int'l Conference on Computer Vision.
Yang, J., Yan, R., & Hauptmann, A. G. (2007). Cross-domain video concept detection using adaptive SVMs. In ACM Int'l Conference on Multimedia (pp. 188–197).
Yang, W., Wang, Y., & Mori, G. (2009). Human action recognition from a single clip per action. In Int'l Workshop on Machine Learning for Vision-Based Motion Analysis.
Yang, W., Wang, Y., & Mori, G. (2010). Efficient human action detection using a transferable distance function. In Asian Conference on Computer Vision (pp. 417–426).
Zabulis, X., Sarmis, T., & Argyros, A. A. (2009). 3D head pose estimation from multiple distant views. In British Machine Vision Conference (pp. 1–12).
Zhang, Y., & Yeung, D.-Y. (2010). A convex formulation for learning task relationships in multi-task learning. In Uncertainty in Artificial Intelligence (pp. 733–742).
Zheng, J., Jiang, Z., Phillips, J., & Chellappa, R. (2012). Cross-view action recognition via a transferable dictionary pair. In British Machine Vision Conference (pp. 1–11).
Metadata
Title
Exploring Transfer Learning Approaches for Head Pose Classification from Multi-view Surveillance Images
Authors
Anoop Kolar Rajagopal
Ramanathan Subramanian
Elisa Ricci
Radu L. Vieriu
Oswald Lanz
Ramakrishnan Kalpathi R.
Nicu Sebe
Publication date
01.08.2014
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 1-2/2014
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-013-0692-2
