Published in: International Journal of Computer Vision 1-2/2014

01.08.2014

Exploring Transfer Learning Approaches for Head Pose Classification from Multi-view Surveillance Images

Authors: Anoop Kolar Rajagopal, Ramanathan Subramanian, Elisa Ricci, Radu L. Vieriu, Oswald Lanz, Ramakrishnan Kalpathi R., Nicu Sebe


Abstract

Head pose classification from surveillance images acquired with distant, large field-of-view cameras is difficult, as faces are captured at low resolution and have a blurred appearance. Domain adaptation approaches are useful for transferring knowledge from the training (source) to the test (target) data when they have different attributes, minimizing target data labeling efforts in the process. This paper examines the use of transfer learning for efficient multi-view head pose classification with minimal target training data under three challenging situations: (i) where the range of head poses in the source and target images is different, (ii) where source images capture a stationary person while target images capture a moving person whose facial appearance varies under motion due to changing perspective and scale, and (iii) a combination of (i) and (ii). On the whole, the presented methods represent novel transfer learning solutions employed in the context of multi-view head pose classification. Through extensive experimental validation, we demonstrate that the proposed solutions considerably outperform the state of the art. Finally, we present the DPOSE dataset, compiled for benchmarking head pose classification performance with moving persons and for aiding behavioral understanding applications.


Footnotes
1
Head pose estimation involves determination of the pan (out-of-plane horizontal head rotation), tilt (out-of-plane vertical rotation) and roll (in-plane head rotation). In this work, we are mainly concerned about estimating pan and tilt.
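The pan/tilt/roll convention in this footnote can be made concrete. The sketch below is an illustration only (the Y-X-Z Euler order is our assumption, not necessarily the paper's parameterization): it builds a head rotation matrix from pan, tilt and roll, and recovers the pan and tilt angles, the two quantities estimated in this work.

```python
import numpy as np

def rotation(pan, tilt, roll):
    """Head rotation R = Ry(pan) @ Rx(tilt) @ Rz(roll), angles in degrees.
    The Y-X-Z Euler order is an illustrative assumption."""
    p, t, r = np.radians([pan, tilt, roll])
    Ry = np.array([[np.cos(p), 0, np.sin(p)], [0, 1, 0], [-np.sin(p), 0, np.cos(p)]])
    Rx = np.array([[1, 0, 0], [0, np.cos(t), -np.sin(t)], [0, np.sin(t), np.cos(t)]])
    Rz = np.array([[np.cos(r), -np.sin(r), 0], [np.sin(r), np.cos(r), 0], [0, 0, 1]])
    return Ry @ Rx @ Rz

def pan_tilt(R):
    """Recover pan (out-of-plane horizontal rotation) and tilt (out-of-plane
    vertical rotation) in degrees; the in-plane roll is ignored."""
    tilt = np.degrees(np.arcsin(-R[1, 2]))          # third column: R[1,2] = -sin(tilt)
    pan = np.degrees(np.arctan2(R[0, 2], R[2, 2]))  # R[0,2]/R[2,2] = tan(pan)
    return pan, tilt
```

Note that the third column of R depends only on pan and tilt under this factorization, which is why roll drops out of the recovery.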
 
3
27,824 four-view images correspond to static targets rotating in place at the room center, while 25,660 images capture freely moving targets.
 
4
These values account for the tracker’s variance, the horizontal and vertical offsets of the head from the body centroid due to head pan, tilt and roll.
 
5
This warping can also be applied in the case where the number of cameras/views for the source and target are different.
 
6
As seen from Table 1, which presents accuracies achieved with source-only \(Cov (d=12)\) features.
 
7
In our implementation, we consider the room-center as the reference position.
 
8
\({\varvec{\varSigma }}\) is chosen to be positive semi-definite and have a trace equal to 1 as proposed in Kulis et al. (2011)
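The two constraints on \({\varvec{\varSigma }}\) (positive semi-definiteness and unit trace) can be enforced by a standard spectral projection. The sketch below illustrates that generic step under our own assumptions; it is not the authors' optimization code.

```python
import numpy as np

def project_psd_unit_trace(S):
    """Project a square matrix onto the set of positive semi-definite
    matrices with trace equal to 1 (the constraints on Sigma)."""
    # Symmetrize, then clip negative eigenvalues to zero, which yields the
    # nearest PSD matrix in Frobenius norm.
    S = (S + S.T) / 2.0
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)
    P = V @ np.diag(w) @ V.T
    tr = np.trace(P)
    # Rescale to unit trace; fall back to a scaled identity if all
    # eigenvalues were clipped away.
    return P / tr if tr > 0 else np.eye(S.shape[0]) / S.shape[0]
```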
 
10
The NN classifier assigns the class label of the nearest target training example to the test image.
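The nearest-neighbor baseline in this footnote amounts to a one-line rule; a minimal sketch follows, assuming Euclidean distance in feature space as the metric.

```python
import numpy as np

def nn_classify(test_feature, train_features, train_labels):
    # Assign the test image the class label of the closest target-domain
    # training example in feature space.
    dists = np.linalg.norm(train_features - test_feature, axis=1)
    return train_labels[int(np.argmin(dists))]
```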
 
References
Benfold, B., & Reid, I. (2011). Unsupervised learning of a scene-specific coarse gaze estimator. In International Conference on Computer Vision (pp. 2344–2351).
Chen, C., & Odobez, J.-M. (2012). We are not contortionists: Coupled adaptive learning for head and body orientation estimation in surveillance video. In Computer Vision and Pattern Recognition (pp. 1544–1551).
Dai, W., Yang, Q., Xue, G. R., & Yu, Y. (2007). Boosting for transfer learning. In International Conference on Machine Learning (pp. 193–200).
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition (pp. 886–893).
Daume, H. (2007). Frustratingly easy domain adaptation. In Proceedings of Association for Computational Linguistics (pp. 256–263).
Doshi, A., & Trivedi, M. M. (2012). Head and eye gaze dynamics during visual attention shifts in complex environments. Journal of Vision, 12(2), 1–16.
Duan, L., Tsang, I. W., & Xu, D. (2012). Domain transfer multiple kernel learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(3), 465–479.
Duan, L., Tsang, I. W., Xu, D., & Chua, T.-S. (2009). Domain adaptation from multiple sources via auxiliary classifiers. In International Conference on Machine Learning (pp. 289–296).
Farhadi, A., & Tabrizi, M. K. (2008). Learning to recognize activities from the wrong view point. In European Conference on Computer Vision (pp. 154–166).
Ferencz, A., Learned-Miller, E. G., & Malik, J. (2008). Learning to locate informative features for visual identification. International Journal of Computer Vision, 77(1–3), 3–24.
HOSDB. (2006). Imagery library for intelligent detection systems (i-LIDS). In IEEE Crime and Security.
Jiang, J., & Zhai, C. (2007). Instance weighting for domain adaptation in NLP. In Association of Computational Linguistics (pp. 264–271).
Katzenmaier, M., Stiefelhagen, R., & Schultz, T. (2004). Identifying the addressee in human-human-robot interactions based on head pose and speech. In International Conference on Multimodal Interfaces (pp. 144–151).
Kulis, B., Saenko, K., & Darrell, T. (2011). What you saw is not what you get: Domain adaptation using asymmetric kernel transforms. In Computer Vision and Pattern Recognition (pp. 1785–1792).
Lanz, O. (2006). Approximate Bayesian multibody tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(9), 1436–1449.
Lanz, O., & Brunelli, R. (2008). Joint Bayesian tracking of head location and pose from low-resolution video. In R. Stiefelhagen, R. Bowers, & J. G. Fiscus (Eds.), Multimodal technologies for perception of humans, Lecture Notes in Computer Science (Vol. 4625, pp. 287–296). Heidelberg: Springer.
Lepri, B., Subramanian, R., Kalimeri, K., Staiano, J., Pianesi, F., & Sebe, N. (2012). Connecting meeting behavior with extraversion–A systematic study. IEEE Transactions on Affective Computing, 3(4), 443–455.
Lim, J. J., Salakhutdinov, R., & Torralba, A. (2011). Transfer learning by borrowing examples for multiclass object detection. In Advances in Neural Information Processing Systems (pp. 118–126).
Muñoz-Salinas, R., Yeguas-Bolivar, E., Saffiotti, A., & Carnicer, R. M. (2012). Multi-camera head pose estimation. Machine Vision and Applications, 23(3), 479–490.
Murphy-Chutorian, E., & Trivedi, M. M. (2009). Head pose estimation in computer vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(4), 607–626.
Orozco, J., Gong, S., & Xiang, T. (2009). Head pose classification in crowded scenes. In British Machine Vision Conference (pp. 1–11).
Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359.
Pardoe, D., & Stone, P. (2010). Boosting for regression transfer. In International Conference on Machine Learning (pp. 863–870).
Rajagopal, A., Subramanian, R., Vieriu, R. L., Ricci, E., Lanz, O., Sebe, N., & Ramakrishnan, K. (2012). An adaptation framework for head pose estimation in dynamic multi-view scenarios. In Asian Conference on Computer Vision (pp. 652–666).
Ricci, E., & Odobez, J.-M. (2009). Learning large margin likelihoods for realtime head pose tracking. In International Conference on Image Processing (pp. 2593–2596).
Smith, K., Ba, S. O., Odobez, J.-M., & Gatica-Perez, D. (2008). Tracking the visual focus of attention for a varying number of wandering people. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(7), 1212–1229.
Stiefelhagen, R., Bowers, R., & Fiscus, J. G. (2007). Multimodal technologies for perception of humans. International evaluation workshops CLEAR 2007 and RT 2007, Baltimore, MD, May 8–11, 2007, Revised Selected Papers (Vol. 4625). Heidelberg: Springer.
Subramanian, R., Staiano, J., Kalimeri, K., Sebe, N., & Pianesi, F. (2010). Putting the pieces together: Multimodal analysis of social attention in meetings. In ACM Int'l Conference on Multimedia (pp. 659–662).
Subramanian, R., Yan, Y., Staiano, J., Lanz, O., & Sebe, N. (2013). On the relationship between head pose, social attention and personality prediction for unstructured and dynamic group interactions. In ACM Int'l Conference on Multimodal Interfaces.
Tosato, D., Farenzena, M., Spera, M., Murino, V., & Cristani, M. (2010). Multi-class classification on Riemannian manifolds for video surveillance. In European Conference on Computer Vision (pp. 378–391).
Voit, M., & Stiefelhagen, R. (2009). A system for probabilistic joint 3D head tracking and pose estimation in low-resolution, multi-view environments. In Computer Vision Systems (pp. 415–424).
Wang, X., Han, T. X., & Yan, S. (2009). An HOG-LBP human detector with partial occlusion handling. In International Conference on Computer Vision (pp. 32–39).
Williams, C., Bonilla, E. V., & Chai, K. M. (2007). Multi-task Gaussian process prediction. In Advances in Neural Information Processing Systems (pp. 153–160).
Yan, Y., Subramanian, R., Lanz, O., & Sebe, N. (2012). Active transfer learning for multi-view head-pose classification. In Int'l Conference on Pattern Recognition (pp. 1168–1171).
Yan, Y., Ricci, E., Subramanian, R., Lanz, O., & Sebe, N. (2013). No matter where you are: Flexible graph-guided multi-task learning for multi-view head pose classification under target motion. In Int'l Conference on Computer Vision.
Yang, J., Yan, R., & Hauptmann, A. G. (2007). Cross-domain video concept detection using adaptive SVMs. In ACM Int'l Conference on Multimedia (pp. 188–197).
Yang, W., Wang, Y., & Mori, G. (2009). Human action recognition from a single clip per action. In Int'l Workshop on Machine Learning for Vision-Based Motion Analysis.
Yang, W., Wang, Y., & Mori, G. (2010). Efficient human action detection using a transferable distance function. In Asian Conference on Computer Vision (pp. 417–426).
Zabulis, X., Sarmis, T., & Argyros, A. A. (2009). 3D head pose estimation from multiple distant views. In British Machine Vision Conference (pp. 1–12).
Zhang, Y., & Yeung, D.-Y. (2010). A convex formulation for learning task relationships in multi-task learning. In Uncertainty in Artificial Intelligence (pp. 733–742).
Zheng, J., Jiang, Z., Phillips, J., & Chellappa, R. (2012). Cross-view action recognition via a transferable dictionary pair. In British Machine Vision Conference (pp. 1–11).
Metadata
Title
Exploring Transfer Learning Approaches for Head Pose Classification from Multi-view Surveillance Images
Authors
Anoop Kolar Rajagopal
Ramanathan Subramanian
Elisa Ricci
Radu L. Vieriu
Oswald Lanz
Ramakrishnan Kalpathi R.
Nicu Sebe
Publication date
01.08.2014
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 1-2/2014
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-013-0692-2
