Published in: International Journal of Computer Vision 5/2018

24-11-2017

From Facial Expression Recognition to Interpersonal Relation Prediction

Authors: Zhanpeng Zhang, Ping Luo, Chen Change Loy, Xiaoou Tang


Abstract

Interpersonal relation defines the association, e.g., warmth, friendliness, and dominance, between two or more people. We investigate whether such fine-grained and high-level relation traits can be characterized and quantified from face images in the wild. We address this challenging problem by first studying a deep network architecture for robust recognition of facial expressions. Unlike existing models that typically learn from facial expression labels alone, we devise an effective multitask network that is capable of learning from rich auxiliary attributes such as gender, age, and head pose, beyond just facial expression data. While conventional supervised training requires datasets with complete labels (e.g., all samples must be labeled with gender, age, and expression), we show that this requirement can be relaxed via a novel attribute propagation method. The approach further allows us to leverage the inherent correspondences between heterogeneous attribute sources despite the disparate distributions of different datasets. With this network we demonstrate state-of-the-art results on existing facial expression recognition benchmarks. To predict interpersonal relation, we use the expression recognition network as branches of a Siamese model. Extensive experiments show that our model is capable of mining the mutual context of faces for accurate fine-grained interpersonal relation prediction.
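The Siamese design described above can be sketched as follows: a single expression-recognition branch with multitask heads is applied, with shared weights, to both faces in a pair, and the two resulting features are combined to score relation traits. This is a minimal illustrative sketch, not the paper's implementation; all dimensions, layer shapes, and head choices here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, not taken from the paper.
FEAT_DIM = 128       # input face descriptor size
HIDDEN = 64          # shared branch embedding size
N_EXPRESSIONS = 7    # basic expression classes
N_RELATIONS = 8      # interpersonal relation traits (e.g., warm, dominant)

class SharedFaceBranch:
    """Expression-recognition branch; both faces reuse these weights.

    Multitask heads (expression + an auxiliary attribute such as gender)
    mirror the idea of learning from rich auxiliary attributes.
    """
    def __init__(self):
        self.W = rng.standard_normal((FEAT_DIM, HIDDEN)) * 0.01
        self.W_expr = rng.standard_normal((HIDDEN, N_EXPRESSIONS)) * 0.01
        self.W_gender = rng.standard_normal((HIDDEN, 2)) * 0.01

    def forward(self, x):
        h = np.maximum(x @ self.W, 0.0)  # ReLU feature shared by all heads
        return h, h @ self.W_expr, h @ self.W_gender

class SiameseRelationModel:
    """Two weight-sharing branches feeding a joint relation head."""
    def __init__(self):
        self.branch = SharedFaceBranch()  # shared weights = Siamese
        self.W_rel = rng.standard_normal((2 * HIDDEN, N_RELATIONS)) * 0.01

    def forward(self, face_a, face_b):
        h_a, _, _ = self.branch.forward(face_a)
        h_b, _, _ = self.branch.forward(face_b)
        pair = np.concatenate([h_a, h_b])  # mutual context of the two faces
        return pair @ self.W_rel           # one score per relation trait

model = SiameseRelationModel()
scores = model.forward(rng.standard_normal(FEAT_DIM),
                       rng.standard_normal(FEAT_DIM))
print(scores.shape)  # (8,)
```

Because the branch weights are shared, swapping the two faces changes only the order of the concatenated features; the relation head then learns from the joint representation of the pair rather than from either face alone.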


Footnotes
1
Although we did not study the integration of face and body cues, body posture and hand gesture information, when available, can naturally be used as additional input channels for our deep models.
 
Literature
go back to reference Bi, W., & Kwok, J. T. (2014). Multilabel classification with label correlations and missing labels. In AAAI conference on artificial intelligence (pp. 1680–1686). Bi, W., & Kwok, J. T. (2014). Multilabel classification with label correlations and missing labels. In AAAI conference on artificial intelligence (pp. 1680–1686).
go back to reference Bromley, J., Guyon, I., Lecun, Y., Säckinger, E., & Shah, R. (1994). Signature verification using a Siamese time delay neural network. In Advances in neural information processing systems. Bromley, J., Guyon, I., Lecun, Y., Säckinger, E., & Shah, R. (1994). Signature verification using a Siamese time delay neural network. In Advances in neural information processing systems.
go back to reference Celeux, G., Forbes, F., & Peyrard, N. (2003). EM procedures using mean field-like approximations for markov model-based image segmentation. Pattern Recognition, 36(1), 131–144.CrossRefMATH Celeux, G., Forbes, F., & Peyrard, N. (2003). EM procedures using mean field-like approximations for markov model-based image segmentation. Pattern Recognition, 36(1), 131–144.CrossRefMATH
go back to reference Chakraborty, I., Cheng, H., & Javed, O. (2013). 3D visual proxemics: Recognizing human interactions in 3D from a single image. In IEEE conference on computer vision and pattern recognition (pp. 3406–3413). Chakraborty, I., Cheng, H., & Javed, O. (2013). 3D visual proxemics: Recognizing human interactions in 3D from a single image. In IEEE conference on computer vision and pattern recognition (pp. 3406–3413).
go back to reference Chen, Y. Y., Hsu, W. H., & Liao, H. Y. M. (2012). Discovering informative social subgraphs and predicting pairwise relationships from group photos. In ACM multimedia (pp. 669–678). Chen, Y. Y., Hsu, W. H., & Liao, H. Y. M. (2012). Discovering informative social subgraphs and predicting pairwise relationships from group photos. In ACM multimedia (pp. 669–678).
go back to reference Chu, X., Ouyang, W., Yang, W., & Wang, X. (2015). Multi-task recurrent neural network for immediacy prediction. In Proceedings of the IEEE international conference on computer vision (pp. 3352–3360). Chu, X., Ouyang, W., Yang, W., & Wang, X. (2015). Multi-task recurrent neural network for immediacy prediction. In Proceedings of the IEEE international conference on computer vision (pp. 3352–3360).
go back to reference Cristani, M., Raghavendra, R., Del Bue, A., & Murino, V. (2013). Human behavior analysis in video surveillance: A social signal processing perspective. Neurocomputing, 100, 86–97.CrossRef Cristani, M., Raghavendra, R., Del Bue, A., & Murino, V. (2013). Human behavior analysis in video surveillance: A social signal processing perspective. Neurocomputing, 100, 86–97.CrossRef
go back to reference Dahmane, M., & Meunier, J. (2011). Emotion recognition using dynamic grid-based hog features. In IEEE international conference on automatic face & gesture recognition (pp. 884–888). Dahmane, M., & Meunier, J. (2011). Emotion recognition using dynamic grid-based hog features. In IEEE international conference on automatic face & gesture recognition (pp. 884–888).
go back to reference Deng, Z., Vahdat, A., Hu, H., & Mori, G. (2016). Structure inference machines: Recurrent neural networks for analyzing relations in group activity recognition. In IEEE conference on computer vision and pattern recognition. Deng, Z., Vahdat, A., Hu, H., & Mori, G. (2016). Structure inference machines: Recurrent neural networks for analyzing relations in group activity recognition. In IEEE conference on computer vision and pattern recognition.
go back to reference Dhall, A., Asthana, A., Goecke, R., & Gedeon, T. (2011). Emotion recognition using PHOG and LPQ features. In IEEE international conference on automatic face & gesture recognition and workshops (pp. 878–883). Dhall, A., Asthana, A., Goecke, R., & Gedeon, T. (2011). Emotion recognition using PHOG and LPQ features. In IEEE international conference on automatic face & gesture recognition and workshops (pp. 878–883).
go back to reference Dhall, A., Ramana Murthy, O., Goecke, R., Joshi, J., & Gedeon, T. (2015). Video and image based emotion recognition challenges in the wild: Emotiw 2015. In ACM international conference on multimodal interaction (pp. 423–426). Dhall, A., Ramana Murthy, O., Goecke, R., Joshi, J., & Gedeon, T. (2015). Video and image based emotion recognition challenges in the wild: Emotiw 2015. In ACM international conference on multimodal interaction (pp. 423–426).
go back to reference Ding, L., & Yilmaz, A. (2010). Learning relations among movie characters: A social network perspective. In European conference on computer vision. Ding, L., & Yilmaz, A. (2010). Learning relations among movie characters: A social network perspective. In European conference on computer vision.
go back to reference Ding, L., & Yilmaz, A. (2011). Inferring social relations from visual concepts. In IEEE international conference on computer vision (pp. 699–706). Ding, L., & Yilmaz, A. (2011). Inferring social relations from visual concepts. In IEEE international conference on computer vision (pp. 699–706).
go back to reference Emily, M., & Hand, R. C. (2017). Attributes for improved attributes: A multi-task network utilizing implicit and explicit relationships for facial attribute classification. In AAAI conference on artificial intelligence. Emily, M., & Hand, R. C. (2017). Attributes for improved attributes: A multi-task network utilizing implicit and explicit relationships for facial attribute classification. In AAAI conference on artificial intelligence.
go back to reference Fabian Benitez-Quiroz, C., Srinivasan, R., & Martinez, A. M. (2016). Emotionet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild. In IEEE conference on computer vision and pattern recognition. Fabian Benitez-Quiroz, C., Srinivasan, R., & Martinez, A. M. (2016). Emotionet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild. In IEEE conference on computer vision and pattern recognition.
go back to reference Fathi, A., Hodgins, J. K., & Rehg, J. M. (2012). Social interactions: A first-person perspective. In IEEE conference on computer vision and pattern recognition. Fathi, A., Hodgins, J. K., & Rehg, J. M. (2012). Social interactions: A first-person perspective. In IEEE conference on computer vision and pattern recognition.
go back to reference Gallagher, A. C., & Chen, T. (2009). Understanding images of groups of people. In IEEE conference on computer vision and pattern recognition (pp. 256–263). IEEE. Gallagher, A. C., & Chen, T. (2009). Understanding images of groups of people. In IEEE conference on computer vision and pattern recognition (pp. 256–263). IEEE.
go back to reference Girard, J. M. (2014). Perceptions of interpersonal behavior are influenced by gender, facial expression intensity, and head pose. In ACM international conference on multimodal interaction (pp. 394–398). Girard, J. M. (2014). Perceptions of interpersonal behavior are influenced by gender, facial expression intensity, and head pose. In ACM international conference on multimodal interaction (pp. 394–398).
go back to reference Gottman, J., Levenson, R., & Woodin, E. (2001). Facial expressions during marital conflict. Journal of Family Communication, 1(1), 37–57.CrossRef Gottman, J., Levenson, R., & Woodin, E. (2001). Facial expressions during marital conflict. Journal of Family Communication, 1(1), 37–57.CrossRef
go back to reference Gupta, A. K., & Nagar, D. K. (1999). Matrix variate distributions. Boca Raton: CRC Press.MATH Gupta, A. K., & Nagar, D. K. (1999). Matrix variate distributions. Boca Raton: CRC Press.MATH
go back to reference Hess, U., Blairy, S., & Kleck, R. E. (2000). The influence of facial emotion displays, gender, and ethnicity on judgments of dominance and affiliation. Journal of Nonverbal Behavior, 24(4), 265–283.CrossRef Hess, U., Blairy, S., & Kleck, R. E. (2000). The influence of facial emotion displays, gender, and ethnicity on judgments of dominance and affiliation. Journal of Nonverbal Behavior, 24(4), 265–283.CrossRef
go back to reference Hoai, M., & Zisserman, A. (2014). Talking heads: Detecting humans and recognizing their interactions. In IEEE conference on computer vision and pattern recognition. Hoai, M., & Zisserman, A. (2014). Talking heads: Detecting humans and recognizing their interactions. In IEEE conference on computer vision and pattern recognition.
go back to reference Huang, C., Li, Y., Loy, C. C., & Tang, X. (2016). Learning deep representation for imbalanced classification. In IEEE conference on computer vision and pattern recognition. Huang, C., Li, Y., Loy, C. C., & Tang, X. (2016). Learning deep representation for imbalanced classification. In IEEE conference on computer vision and pattern recognition.
go back to reference Hung, H., Jayagopi, D., Yeo, C., Friedland, G., Ba, S., Odobez, J. M., et al. (2007). Using audio and video features to classify the most dominant person in a group meeting. In ACM multimedia. Hung, H., Jayagopi, D., Yeo, C., Friedland, G., Ba, S., Odobez, J. M., et al. (2007). Using audio and video features to classify the most dominant person in a group meeting. In ACM multimedia.
go back to reference Ibrahim, M., Muralidharan, S., Deng, Z., Vahdat, A., & Mori, G. (2016). A hierarchical deep temporal model for group activity recognition. In IEEE conference on computer vision and pattern recognition. Ibrahim, M., Muralidharan, S., Deng, Z., Vahdat, A., & Mori, G. (2016). A hierarchical deep temporal model for group activity recognition. In IEEE conference on computer vision and pattern recognition.
go back to reference Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd international conference on machine learning (pp. 448–456). Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd international conference on machine learning (pp. 448–456).
go back to reference Joo, J., Li, W., Steen, F., & Zhu, S. C. (2014). Visual persuasion: Inferring communicative intents of images. In IEEE conference on computer vision and pattern recognition (pp. 216–223). Joo, J., Li, W., Steen, F., & Zhu, S. C. (2014). Visual persuasion: Inferring communicative intents of images. In IEEE conference on computer vision and pattern recognition (pp. 216–223).
go back to reference Jung, H., Lee, S., Yim, J., Park, S., & Kim, J. (2015). Joint fine-tuning in deep neural networks for facial expression recognition. In IEEE international conference on computer vision. Jung, H., Lee, S., Yim, J., Park, S., & Kim, J. (2015). Joint fine-tuning in deep neural networks for facial expression recognition. In IEEE international conference on computer vision.
go back to reference Khorrami, P., Paine, T., & Huang, T. (2015). Do deep neural networks learn facial action units when doing expression recognition? In IEEE international conference on computer vision workshop. Khorrami, P., Paine, T., & Huang, T. (2015). Do deep neural networks learn facial action units when doing expression recognition? In IEEE international conference on computer vision workshop.
go back to reference Kiesler, D. J. (1983). The 1982 interpersonal circle: A taxonomy for complementarity in human transactions. Psychological Review, 90(3), 185.CrossRef Kiesler, D. J. (1983). The 1982 interpersonal circle: A taxonomy for complementarity in human transactions. Psychological Review, 90(3), 185.CrossRef
go back to reference Knutson, B. (1996). Facial expressions of emotion influence interpersonal trait inferences. Journal of Nonverbal Behavior, 20(3), 165–182.CrossRef Knutson, B. (1996). Facial expressions of emotion influence interpersonal trait inferences. Journal of Nonverbal Behavior, 20(3), 165–182.CrossRef
go back to reference Kong, Y., Jia, Y., & Fu, Y. (2012). Learning human interaction by interactive phrases. In European conference on computer vision (pp. 300–313). Kong, Y., Jia, Y., & Fu, Y. (2012). Learning human interaction by interactive phrases. In European conference on computer vision (pp. 300–313).
go back to reference Kostinger, M., Wohlhart, P., Roth, P., & Bischof, H. (2011). Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. In IEEE international conference on computer vision workshop (pp. 2144–2151). Kostinger, M., Wohlhart, P., Roth, P., & Bischof, H. (2011). Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. In IEEE international conference on computer vision workshop (pp. 2144–2151).
go back to reference Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou & K. Q. Weinberger (Eds.), Advances in neural information processing systems. Curran Associates, Inc. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou & K. Q. Weinberger (Eds.), Advances in neural information processing systems. Curran Associates, Inc.
go back to reference Kumar, N., Belhumeur, P., & Nayar, S. (2008). Facetracer: A search engine for large collections of images with faces. In European conference on computer vision (pp. 340–353). Berlin: Springer. Kumar, N., Belhumeur, P., & Nayar, S. (2008). Facetracer: A search engine for large collections of images with faces. In European conference on computer vision (pp. 340–353). Berlin: Springer.
go back to reference Lan, T., Sigal, L., & Mori, G. (2012). Social roles in hierarchical models for human activity recognition. In IEEE conference on computer vision and pattern recognition. Lan, T., Sigal, L., & Mori, G. (2012). Social roles in hierarchical models for human activity recognition. In IEEE conference on computer vision and pattern recognition.
go back to reference Lee, D. H. (2013). Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In International conference on machine learning workshop (vol. 3, p. 2). Lee, D. H. (2013). Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In International conference on machine learning workshop (vol. 3, p. 2).
go back to reference Levi, G., & Hassner, T. (2015). Emotion recognition in the wild via convolutional neural networks and mapped binary patterns. In ACM international conference on multimodal interaction (pp. 503–510). Levi, G., & Hassner, T. (2015). Emotion recognition in the wild via convolutional neural networks and mapped binary patterns. In ACM international conference on multimodal interaction (pp. 503–510).
go back to reference Li, H., Lin, Z., Shen, X., Brandt, J., & Hua, G. (2015). A convolutional neural network cascade for face detection. In IEEE conference on computer vision and pattern recognition. Li, H., Lin, Z., Shen, X., Brandt, J., & Hua, G. (2015). A convolutional neural network cascade for face detection. In IEEE conference on computer vision and pattern recognition.
go back to reference Liu, M., Li, S., Shan, S., & Chen, X. (2015). AU-inspired deep networks for facial expression feature learning. Neurocomputing, 159, 126–136.CrossRef Liu, M., Li, S., Shan, S., & Chen, X. (2015). AU-inspired deep networks for facial expression feature learning. Neurocomputing, 159, 126–136.CrossRef
go back to reference Liu, M., Li, S., Shan, S., Wang, R., & Chen, X. (2014a). Deeply learning deformable facial action parts model for dynamic expression analysis. In Asian conference on computer vision. Liu, M., Li, S., Shan, S., Wang, R., & Chen, X. (2014a). Deeply learning deformable facial action parts model for dynamic expression analysis. In Asian conference on computer vision.
go back to reference Liu, M., Shan, S., Wang, R., & Chen, X. (2014b). Learning expressionlets on spatio-temporal manifold for dynamic facial expression recognition. In IEEE conference on computer vision and pattern recognition. Liu, M., Shan, S., Wang, R., & Chen, X. (2014b). Learning expressionlets on spatio-temporal manifold for dynamic facial expression recognition. In IEEE conference on computer vision and pattern recognition.
go back to reference Liu, P., Han, S., Meng, Z., & Tong, Y. (2014c). Facial expression recognition via a boosted deep belief network. In IEEE conference on computer vision and pattern recognition (pp. 1805–1812). Liu, P., Han, S., Meng, Z., & Tong, Y. (2014c). Facial expression recognition via a boosted deep belief network. In IEEE conference on computer vision and pattern recognition (pp. 1805–1812).
go back to reference Liu, S., Yang, J., Huang, C., & Yang, M. H. (2015a). Multi-objective convolutional learning for face labeling. In IEEE conference on computer vision and pattern recognition. Liu, S., Yang, J., Huang, C., & Yang, M. H. (2015a). Multi-objective convolutional learning for face labeling. In IEEE conference on computer vision and pattern recognition.
go back to reference Liu, Z., Luo, P., Wang, X., & Tang, X. (2015b). Deep learning face attributes in the wild. In IEEE international conference on computer vision. Liu, Z., Luo, P., Wang, X., & Tang, X. (2015b). Deep learning face attributes in the wild. In IEEE international conference on computer vision.
go back to reference Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010). The extended Cohn–Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In IEEE conference on computer vision and pattern recognition workshops (pp. 94–101). Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010). The extended Cohn–Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In IEEE conference on computer vision and pattern recognition workshops (pp. 94–101).
go back to reference Luo, P., Wang, X., & Tang, X. (2012). Hierarchical face parsing via deep learning. In IEEE conference on computer vision and pattern recognition. Luo, P., Wang, X., & Tang, X. (2012). Hierarchical face parsing via deep learning. In IEEE conference on computer vision and pattern recognition.
go back to reference Lyons, M. J., Budynek, J., & Akamatsu, S. (1999). Automatic classification of single facial images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(12), 1357–1362.CrossRef Lyons, M. J., Budynek, J., & Akamatsu, S. (1999). Automatic classification of single facial images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(12), 1357–1362.CrossRef
go back to reference Mollahosseini, A., Chan, D., & Mahoor, M. H. (2016). Going deeper in facial expression recognition using deep neural networks. In IEEE winter conference on applications of computer vision. Mollahosseini, A., Chan, D., & Mahoor, M. H. (2016). Going deeper in facial expression recognition using deep neural networks. In IEEE winter conference on applications of computer vision.
go back to reference Moody, J., Hanson, S., Krogh, A., & Hertz, J. A. (1995). A simple weight decay can improve generalization. Advances in Neural Information Processing Systems, 4, 950–957. Moody, J., Hanson, S., Krogh, A., & Hertz, J. A. (1995). A simple weight decay can improve generalization. Advances in Neural Information Processing Systems, 4, 950–957.
go back to reference Ng, H. W., Nguyen, V. D., Vonikakis, V., & Winkler, S. (2015). Deep learning for emotion recognition on small datasets using transfer learning. In ACM international conference on multimodal interaction (pp. 443–449). Ng, H. W., Nguyen, V. D., Vonikakis, V., & Winkler, S. (2015). Deep learning for emotion recognition on small datasets using transfer learning. In ACM international conference on multimodal interaction (pp. 443–449).
go back to reference Opitz, M., Waltner, G., Poier, G., Possegger, H., & Bischof, H. (2016). Grid loss: Detecting occluded faces. In European conference on computer vision. Opitz, M., Waltner, G., Poier, G., Possegger, H., & Bischof, H. (2016). Grid loss: Detecting occluded faces. In European conference on computer vision.
go back to reference Pantic, M., Cowie, R., D’Errico, F., Heylen, D., Mehu, M., Pelachaud, C., et al. (2011). Social signal processing: The research agenda. In T. B. Moeslund, A. Hilton, V. Krüger & L. Sigal (Eds.), Visual analysis of humans (pp. 511–538). Berlin: Springer. Pantic, M., Cowie, R., D’Errico, F., Heylen, D., Mehu, M., Pelachaud, C., et al. (2011). Social signal processing: The research agenda. In T. B. Moeslund, A. Hilton, V. Krüger & L. Sigal (Eds.), Visual analysis of humans (pp. 511–538). Berlin: Springer.
go back to reference Pantic, M., Valstar, M., Rademaker, R., & Maat, L. (2005). Web-based database for facial expression analysis. In IEEE international conference on multimedia and expo. Pantic, M., Valstar, M., Rademaker, R., & Maat, L. (2005). Web-based database for facial expression analysis. In IEEE international conference on multimedia and expo.
go back to reference Parkhi, O. M., Vedaldi, A., & Zisserman, A. (2015). Deep face recognition. In British machine vision conference. Parkhi, O. M., Vedaldi, A., & Zisserman, A. (2015). Deep face recognition. In British machine vision conference.
go back to reference Pearson, K. (1895). Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58, 240–242.CrossRef Pearson, K. (1895). Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58, 240–242.CrossRef
go back to reference Pentland, A. (2007). Social signal processing. IEEE Signal Processing Magazine, 24(4), 108.CrossRef Pentland, A. (2007). Social signal processing. IEEE Signal Processing Magazine, 24(4), 108.CrossRef
go back to reference Raducanu, B., & Gatica-Perez, D. (2012). Inferring competitive role patterns in reality TV show through nonverbal analysis. Multimedia Tools and Applications, 56(1), 207–226.CrossRef Raducanu, B., & Gatica-Perez, D. (2012). Inferring competitive role patterns in reality TV show through nonverbal analysis. Multimedia Tools and Applications, 56(1), 207–226.CrossRef
go back to reference Ramanathan, V., Yao, B., & Fei-Fei, L. (2013). Social role discovery in human events. In IEEE conference on computer vision and pattern recognition (pp. 2475–2482). Ramanathan, V., Yao, B., & Fei-Fei, L. (2013). Social role discovery in human events. In IEEE conference on computer vision and pattern recognition (pp. 2475–2482).
go back to reference Ricci, E., Varadarajan, J., Subramanian, R., Rota Bulo, S., Ahuja, N., & Lanz, O. (2015). Uncovering interactions and interactors: Joint estimation of head, body orientation and f-formations from surveillance videos. In IEEE international conference on computer vision. Ricci, E., Varadarajan, J., Subramanian, R., Rota Bulo, S., Ahuja, N., & Lanz, O. (2015). Uncovering interactions and interactors: Joint estimation of head, body orientation and f-formations from surveillance videos. In IEEE international conference on computer vision.
go back to reference Ruiz, A., Van de Weijer, J., & Binefa, X. (2015). From emotions to action units with hidden and semi-hidden-task learning. In IEEE international conference on computer vision (pp. 3703–3711). Ruiz, A., Van de Weijer, J., & Binefa, X. (2015). From emotions to action units with hidden and semi-hidden-task learning. In IEEE international conference on computer vision (pp. 3703–3711).
go back to reference Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.MathSciNetCrossRef Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.MathSciNetCrossRef
go back to reference Schroff, F., Kalenichenko, D., & Philbin, J. (2015). Facenet: A unified embedding for face recognition and clustering. In IEEE conference on computer vision and pattern recognition. Schroff, F., Kalenichenko, D., & Philbin, J. (2015). Facenet: A unified embedding for face recognition and clustering. In IEEE conference on computer vision and pattern recognition.
go back to reference Shan, C., Gong, S., & McOwan, P. W. (2009). Facial expression recognition based on local binary patterns: A comprehensive study. Image and Vision Computing, 27(6), 803–816.CrossRef Shan, C., Gong, S., & McOwan, P. W. (2009). Facial expression recognition based on local binary patterns: A comprehensive study. Image and Vision Computing, 27(6), 803–816.CrossRef
go back to reference Sun, Y., Wang, X., & Tang, X. (2016). Sparsifying neural network connections for face recognition. In IEEE conference on computer vision and pattern recognition. Sun, Y., Wang, X., & Tang, X. (2016). Sparsifying neural network connections for face recognition. In IEEE conference on computer vision and pattern recognition.
go back to reference Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). Going deeper with convolutions. In IEEE conference on computer vision and pattern recognition. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). Going deeper with convolutions. In IEEE conference on computer vision and pattern recognition.
go back to reference Tian, Y., Kanade, T., & Cohn, J. F. (2011). Facial expression recognition. In S. Z. Li & A. K. Jain (Eds.), Handbook of face recognition. Berlin: Springer. Tian, Y., Kanade, T., & Cohn, J. F. (2011). Facial expression recognition. In S. Z. Li & A. K. Jain (Eds.), Handbook of face recognition. Berlin: Springer.
go back to reference Tian, Y. I., Kanade, T., & Cohn, J. F. (2001). Recognizing action units for facial expression analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2), 97–115.CrossRef Tian, Y. I., Kanade, T., & Cohn, J. F. (2001). Recognizing action units for facial expression analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2), 97–115.CrossRef
go back to reference Tianqi, C., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., et al. (2016). Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. In NIPS workshop on machine learning systems. Tianqi, C., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., et al. (2016). Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. In NIPS workshop on machine learning systems.
Trigeorgis, G., Snape, P., Nicolaou, M. A., Antonakos, E., & Zafeiriou, S. (2016). Mnemonic descent method: A recurrent process applied for end-to-end face alignment. In IEEE conference on computer vision and pattern recognition.
Valstar, M. F., Mehu, M., Jiang, B., Pantic, M., & Scherer, K. (2012). Meta-analysis of the first facial expression recognition challenge. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 42(4), 966–979.
Vinciarelli, A., Pantic, M., & Bourlard, H. (2009). Social signal processing: Survey of an emerging domain. Image and Vision Computing, 27(12), 1743–1759.
Vinciarelli, A., Pantic, M., Heylen, D., Pelachaud, C., Poggi, I., D’Errico, F., et al. (2012). Bridging the gap between social animal and unsocial machine: A survey of social signal processing. IEEE Transactions on Affective Computing, 3(1), 69–87.
Wang, G., Gallagher, A., Luo, J., & Forsyth, D. (2010). Seeing people in social context: Recognizing people and social relationships. In European conference on computer vision (pp. 169–182).
Wang, J., Cheng, Y., & Feris, R. S. (2016). Walk and learn: Facial attribute representation learning from egocentric video and contextual data. In IEEE conference on computer vision and pattern recognition.
Weng, C. Y., Chu, W. T., & Wu, J. L. (2009). RoleNet: Movie analysis from the perspective of social networks. IEEE Transactions on Multimedia, 11(2), 256–271.
Wu, Y., & Ji, Q. (2016). Constrained joint cascade regression framework for simultaneous facial action unit recognition and facial landmark detection. In IEEE conference on computer vision and pattern recognition.
Yang, B., Yan, J., Lei, Z., & Li, S. Z. (2014). Aggregate channel features for multi-view face detection. In International joint conference on biometrics.
Yang, B., Yan, J., Lei, Z., & Li, S. Z. (2015). Convolutional channel features. In IEEE international conference on computer vision.
Yang, H., Zhou, J. T., & Cai, J. (2016). Improving multi-label learning with missing labels by structured semantic correlations. In European conference on computer vision (pp. 835–851).
Yang, S., Luo, P., Loy, C. C., & Tang, X. (2015). From facial parts responses to face detection: A deep learning approach. In IEEE international conference on computer vision.
Yang, S., Luo, P., Loy, C. C., & Tang, X. (2016). WIDER FACE: A face detection benchmark. In IEEE conference on computer vision and pattern recognition.
Yao, A., Shao, J., Ma, N., & Chen, Y. (2015). Capturing AU-aware facial features and their latent relations for emotion recognition in the wild. In ACM international conference on multimodal interaction (pp. 451–458).
Yu, H. F., Jain, P., Kar, P., & Dhillon, I. (2014). Large-scale multi-label learning with missing labels. In International conference on machine learning (pp. 593–601).
Yu, Z., & Zhang, C. (2015). Image based static facial expression recognition with multiple deep network learning. In ACM international conference on multimodal interaction (pp. 435–442).
Zafeiriou, S., Papaioannou, A., Kotsia, I., Nicolaou, M. A., & Zhao, G. (2016). Facial affect in-the-wild: A survey and a new database. In IEEE conference on computer vision and pattern recognition workshop.
Zelnik-Manor, L., & Perona, P. (2004). Self-tuning spectral clustering. In NIPS (pp. 1601–1608).
Zhang, N., Paluri, M., Ranzato, M., Darrell, T., & Bourdev, L. (2014). PANDA: Pose aligned networks for deep attribute modeling. In IEEE conference on computer vision and pattern recognition.
Zhang, Z., Luo, P., Loy, C. C., & Tang, X. (2015a). Learning deep representation for face alignment with auxiliary attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Zhang, Z., Luo, P., Loy, C. C., & Tang, X. (2015b). Learning social relation traits from face images. In IEEE international conference on computer vision.
Zhang, Z., Luo, P., Loy, C. C., & Tang, X. (2016). Joint face representation adaptation and clustering in videos. In European conference on computer vision.
Zhao, G., Huang, X., Taini, M., Li, S. Z., & Pietikäinen, M. (2011). Facial expression recognition from near-infrared videos. Image and Vision Computing, 29(9), 607–619.
Zhao, G., & Pietikäinen, M. (2007). Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6), 915–928.
Zhao, X., Liang, X., Liu, L., Li, T., Vasconcelos, N., & Yan, S. (2016). Peak-piloted deep network for facial expression recognition. In European conference on computer vision.
Zhong, L., Liu, Q., Yang, P., Liu, B., Huang, J., & Metaxas, D. N. (2012). Learning active facial patches for expression analysis. In IEEE conference on computer vision and pattern recognition (pp. 2562–2569).
Zhu, X., Lei, Z., Liu, X., Shi, H., & Li, S. Z. (2016). Face alignment across large poses: A 3D solution. In IEEE conference on computer vision and pattern recognition.
Metadata
Title
From Facial Expression Recognition to Interpersonal Relation Prediction
Authors
Zhanpeng Zhang
Ping Luo
Chen Change Loy
Xiaoou Tang
Publication date
24-11-2017
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 5/2018
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-017-1055-1
