Published in: International Journal of Computer Vision 5/2018

24.11.2017

From Facial Expression Recognition to Interpersonal Relation Prediction

Authors: Zhanpeng Zhang, Ping Luo, Chen Change Loy, Xiaoou Tang



Abstract

Interpersonal relation defines the association, e.g., warmth, friendliness, and dominance, between two or more people. We investigate whether such fine-grained and high-level relation traits can be characterized and quantified from face images in the wild. We address this challenging problem by first studying a deep network architecture for robust recognition of facial expressions. Unlike existing models that typically learn from facial expression labels alone, we devise an effective multitask network that is capable of learning from rich auxiliary attributes such as gender, age, and head pose, beyond just facial expression data. While conventional supervised training requires datasets with complete labels (e.g., all samples must be labeled with gender, age, and expression), we show that this requirement can be relaxed via a novel attribute propagation method. The approach further allows us to leverage the inherent correspondences between heterogeneous attribute sources despite the disparate distributions of different datasets. With this network we demonstrate state-of-the-art results on existing facial expression recognition benchmarks. To predict interpersonal relation, we use the expression recognition network as the branches of a Siamese model. Extensive experiments show that our model is capable of mining the mutual context of faces for accurate fine-grained interpersonal prediction.
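The two-stage design in the abstract, a shared feature trunk with per-attribute task heads, reused as both branches of a Siamese model whose relation head scores a pair of faces, can be sketched as follows. This is a minimal forward-pass illustration in NumPy, not the paper's actual network: the layer shapes, attribute set, and number of relation traits are hypothetical placeholders, and a dense layer stands in for the convolutional trunk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 128-d face descriptor, 64-d shared feature.
D_IN, D_FEAT = 128, 64
W_shared = rng.standard_normal((D_IN, D_FEAT)) * 0.01

def shared_features(x):
    # Stand-in for the shared convolutional trunk: one dense layer + ReLU.
    return np.maximum(0.0, x @ W_shared)

# One linear head per task: expression plus auxiliary attributes
# (gender, head pose); class counts here are illustrative.
heads = {
    "expression": rng.standard_normal((D_FEAT, 7)) * 0.01,
    "gender":     rng.standard_normal((D_FEAT, 2)) * 0.01,
    "pose":       rng.standard_normal((D_FEAT, 3)) * 0.01,
}

def multitask_forward(x):
    # All heads share one feature vector, so auxiliary-attribute
    # supervision shapes the representation used for expression.
    f = shared_features(x)
    return {task: f @ W for task, W in heads.items()}

# Siamese relation model: the same (weight-tied) expression network
# embeds both faces; a relation head scores the concatenated pair.
W_rel = rng.standard_normal((2 * D_FEAT, 8)) * 0.01  # e.g. 8 relation traits

def siamese_relation(x_a, x_b):
    pair = np.concatenate([shared_features(x_a), shared_features(x_b)])
    return pair @ W_rel  # one score per relation trait

face_a = rng.standard_normal(D_IN)
face_b = rng.standard_normal(D_IN)
attrs = multitask_forward(face_a)       # per-task logits for one face
relation = siamese_relation(face_a, face_b)  # trait scores for the pair
```

Weight tying between the two branches is the defining property of the Siamese setup: both faces pass through the same expression network, so the relation head only has to reason about the combination of the two embeddings.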


Footnotes
1
Although we did not study the integration of face and body cues, body posture and hand gesture information, when available, can naturally be used as additional input channels for our deep models.
 
Zurück zum Zitat Yang, S., Luo, P., Loy, C. C., & Tang, X. (2015). From facial parts responses to face detection: A deep learning approach. In IEEE international conference on computer vision. Yang, S., Luo, P., Loy, C. C., & Tang, X. (2015). From facial parts responses to face detection: A deep learning approach. In IEEE international conference on computer vision.
Zurück zum Zitat Yang, S., Luo, P., Loy, C. C., & Tang, X. (2016). Wider face: A face detection benchmark. In IEEE conference on computer vision and pattern recognition. Yang, S., Luo, P., Loy, C. C., & Tang, X. (2016). Wider face: A face detection benchmark. In IEEE conference on computer vision and pattern recognition.
Zurück zum Zitat Yao, A., Shao, J., Ma, N., & Chen, Y. (2015). Capturing au-aware facial features and their latent relations for emotion recognition in the wild. In ACM international conference on multimodal interaction (pp. 451–458). Yao, A., Shao, J., Ma, N., & Chen, Y. (2015). Capturing au-aware facial features and their latent relations for emotion recognition in the wild. In ACM international conference on multimodal interaction (pp. 451–458).
Zurück zum Zitat Yu, H. F., Jain, P., Kar, P., & Dhillon, I. (2014). Large-scale multi-label learning with missing labels. In International conference on machine learning (pp. 593–601). Yu, H. F., Jain, P., Kar, P., & Dhillon, I. (2014). Large-scale multi-label learning with missing labels. In International conference on machine learning (pp. 593–601).
Zurück zum Zitat Yu, Z., & Zhang, C. (2015). Image based static facial expression recognition with multiple deep network learning. In ACM international conference on multimodal interaction (pp. 435–442). Yu, Z., & Zhang, C. (2015). Image based static facial expression recognition with multiple deep network learning. In ACM international conference on multimodal interaction (pp. 435–442).
Zurück zum Zitat Zafeiriou, S., Papaioannou, A., Kotsia, I., Nicolaou, M. A., & Zhao, G. (2016). Facial affect in-the-wild: A survey and a new database. In IEEE conference on computer vision and pattern recognition workshop. Zafeiriou, S., Papaioannou, A., Kotsia, I., Nicolaou, M. A., & Zhao, G. (2016). Facial affect in-the-wild: A survey and a new database. In IEEE conference on computer vision and pattern recognition workshop.
Zurück zum Zitat Zelnik-Manor, L., & Perona, P. (2004). Self-tuning spectral clustering. In NIPS (pp. 1601–1608). Zelnik-Manor, L., & Perona, P. (2004). Self-tuning spectral clustering. In NIPS (pp. 1601–1608).
Zurück zum Zitat Zhang, N., Paluri, M., Ranzato, M., Darrell, T., & Bourdev, L. (2014). Panda: Pose aligned networks for deep attribute modeling. In IEEE conference on computer vision and pattern recognition. Zhang, N., Paluri, M., Ranzato, M., Darrell, T., & Bourdev, L. (2014). Panda: Pose aligned networks for deep attribute modeling. In IEEE conference on computer vision and pattern recognition.
Zurück zum Zitat Zhang, Z., Luo, P., Loy, C. C., & Tang, X. (2015a). Learning deep representation for face alignment with auxiliary attributes. In IEEE transactions on pattern analysis and machine intelligence. Zhang, Z., Luo, P., Loy, C. C., & Tang, X. (2015a). Learning deep representation for face alignment with auxiliary attributes. In IEEE transactions on pattern analysis and machine intelligence.
Zurück zum Zitat Zhang, Z., Luo, P., Loy, C. C., & Tang, X. (2015b). Learning social relation traits from face images. In IEEE international conference on computer vision. Zhang, Z., Luo, P., Loy, C. C., & Tang, X. (2015b). Learning social relation traits from face images. In IEEE international conference on computer vision.
Zurück zum Zitat Zhang, Z., Luo, P., Loy, C. C., & Tang, X. (2016). Joint face representation adaptation and clustering in videos. In European conference on computer vision. Zhang, Z., Luo, P., Loy, C. C., & Tang, X. (2016). Joint face representation adaptation and clustering in videos. In European conference on computer vision.
Zurück zum Zitat Zhao, G., Huang, X., Taini, M., Li, S. Z., & Pietikäinen, M. (2011). Facial expression recognition from near-infrared videos. Image and Vision Computing, 29(9), 607–619.CrossRef Zhao, G., Huang, X., Taini, M., Li, S. Z., & Pietikäinen, M. (2011). Facial expression recognition from near-infrared videos. Image and Vision Computing, 29(9), 607–619.CrossRef
Zurück zum Zitat Zhao, G., & Pietikainen, M. (2007). Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6), 915–928.CrossRef Zhao, G., & Pietikainen, M. (2007). Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6), 915–928.CrossRef
Zurück zum Zitat Zhao, X., Liang, X., Liu, L., Li, T., Vasconcelos, N., & Yan, S. (2016). Peak-piloted deep network for facial expression recognition. In European conference on computer vision. Zhao, X., Liang, X., Liu, L., Li, T., Vasconcelos, N., & Yan, S. (2016). Peak-piloted deep network for facial expression recognition. In European conference on computer vision.
Zurück zum Zitat Zhong, L., Liu, Q., Yang, P., Liu, B., Huang, J., & Metaxas, D. N. (2012). Learning active facial patches for expression analysis. In IEEE conference on computer vision and pattern recognition (pp. 2562–2569). Zhong, L., Liu, Q., Yang, P., Liu, B., Huang, J., & Metaxas, D. N. (2012). Learning active facial patches for expression analysis. In IEEE conference on computer vision and pattern recognition (pp. 2562–2569).
Zurück zum Zitat Zhu, X., Lei, Z., Liu, X., Shi, H., & Li, S. Z. (2016). Face alignment across large poses: A 3d solution. In IEEE conference on computer vision and pattern recognition. Zhu, X., Lei, Z., Liu, X., Shi, H., & Li, S. Z. (2016). Face alignment across large poses: A 3d solution. In IEEE conference on computer vision and pattern recognition.
Metadata
Title: From Facial Expression Recognition to Interpersonal Relation Prediction
Authors: Zhanpeng Zhang, Ping Luo, Chen Change Loy, Xiaoou Tang
Publication date: 24 November 2017
Publisher: Springer US
Published in: International Journal of Computer Vision, Issue 5/2018
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-017-1055-1