Published in: International Journal of Computer Vision 6-7/2019

29.11.2018

Blended Emotion in-the-Wild: Multi-label Facial Expression Recognition Using Crowdsourced Annotations and Deep Locality Feature Learning

Authors: Shan Li, Weihong Deng


Abstract

Comprehending different categories of facial expressions plays an important role in the design of computational models that analyze human perceptual and affective states. Authoritative studies have revealed that facial expressions in daily life frequently convey multiple or co-occurring mental states. However, due to the lack of valid datasets, most previous studies are still restricted to basic emotions with a single label. In this paper, we present a novel multi-label facial expression database, RAF-ML, along with a new deep learning algorithm, to address this problem. Specifically, a crowdsourcing annotation of 1.2 million labels from 315 participants was conducted to identify the multi-label expressions collected from social networks, and an EM algorithm was then designed to filter out unreliable labels. To the best of our knowledge, RAF-ML is the first in-the-wild database that provides crowdsourced cognition for multi-label expressions. Focusing on the ambiguity and continuity of blended expressions, we propose a new deep manifold learning network, called Deep Bi-Manifold CNN, to learn discriminative features for multi-label expressions by jointly preserving the local affinity of deep features and the manifold structure of emotion labels. Furthermore, a deep domain adaptation method is leveraged to extend the deep manifold features learned from RAF-ML to other expression databases under various imaging conditions and cultures. Extensive experiments on RAF-ML and other diverse databases (JAFFE, CK+, SFEW and MMI) show that the deep manifold feature is not only superior for multi-label expression recognition in the wild, but also captures elemental and generic components that are effective for a wide range of expression recognition tasks.
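The abstract mentions an EM algorithm that filters unreliable crowdsourced labels. A minimal sketch of that idea, assuming a simplified Dawid–Skene-style model with one accuracy parameter per annotator and independent binary votes per emotion category (the function `em_label_filter` and its parameters are hypothetical illustrations, not the paper's implementation):

```python
import numpy as np

def em_label_filter(votes, n_iter=50, tol=1e-6):
    """EM estimate of per-annotator reliability and per-item label
    probabilities from binary crowd votes (a simplified sketch).

    votes: (n_items, n_annotators) array in {0, 1, -1}; -1 = no vote.
    Returns (p, r): item label probabilities and annotator accuracies.
    """
    mask = votes >= 0
    # Initialization: soft labels from the (masked) majority vote.
    p = np.where(mask, votes, 0).sum(1) / np.maximum(mask.sum(1), 1)
    r = np.full(votes.shape[1], 0.8)  # prior annotator accuracy
    for _ in range(n_iter):
        # E-step: posterior probability that each item's true label is 1,
        # weighting every annotator's vote by their estimated accuracy.
        agree1 = np.where(mask, np.where(votes == 1, r, 1 - r), 1.0)
        agree0 = np.where(mask, np.where(votes == 0, r, 1 - r), 1.0)
        pi = p.mean()  # class prior re-estimated from current labels
        like1 = pi * agree1.prod(1)
        like0 = (1 - pi) * agree0.prod(1)
        p_new = like1 / (like1 + like0 + 1e-12)
        # M-step: annotator accuracy = expected agreement with the truth.
        agree = np.where(votes == 1, p_new[:, None], 1 - p_new[:, None])
        r = (agree * mask).sum(0) / np.maximum(mask.sum(0), 1)
        if np.abs(p_new - p).max() < tol:
            p = p_new
            break
        p = p_new
    return p, r
```

In the RAF-ML setting such a routine would run per emotion category over the participants' votes; the estimated accuracies can then down-weight or discard unreliable labelers before the multi-label ground truth is fixed.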


Appendix
Available only to authorized users
Footnotes
2
As Ekman et al. (2013) stated: “if the stimulus does contain an emotion blend, and the investigator allows only a single choice which does not contain blend terms, low levels of agreement may result, since some of the observers may choose a term for one of the blend components, some for another.”
 
3
The compound emotions among these 4908 images, together with the other basic emotions, have been presented in RAF-DB (Li et al. 2017; Li and Deng 2019).
 
4
It is reasonable that there are no images with five or six labels in RAF-ML, since people can hardly perceive both negative and positive valence (for example, joyful and angry) simultaneously from still images; such emotions occupy different regions of the valence–activation space (Cowie et al. 2001; Russell and Barrett 1999).
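The Deep Bi-Manifold CNN described in the abstract jointly preserves the local affinity of deep features and the manifold structure of the emotion labels. As a rough illustration of that coupling, the following toy regularizer pulls together features of samples whose multi-label emotion vectors are neighbors and pushes apart features of label-space strangers (a hypothetical simplification for this sketch, not the paper's loss; `bi_manifold_loss`, `k`, and `lam` are invented names):

```python
import numpy as np

def bi_manifold_loss(feats, labels, k=3, lam=0.5):
    """Toy bi-manifold regularizer (hypothetical simplification):
    mirror the local neighborhood structure of the multi-label
    emotion vectors in the deep feature space.

    feats:  (n, d) deep features
    labels: (n, c) multi-label emotion vectors in [0, 1]
    """
    n = feats.shape[0]
    # Pairwise distances in label space define the local affinity.
    ldist = np.linalg.norm(labels[:, None] - labels[None, :], axis=2)
    fdist = np.linalg.norm(feats[:, None] - feats[None, :], axis=2)
    loss = 0.0
    for i in range(n):
        order = np.argsort(ldist[i])
        nbrs = order[1:k + 1]   # k nearest label-space neighbors (skip self)
        loss += fdist[i, nbrs].mean()  # keep their features close
        far = order[-k:]        # k most dissimilar samples in label space
        # Hinge: push their features at least margin 1 apart.
        loss += lam * np.maximum(0, 1.0 - fdist[i, far]).mean()
    return loss / n
```

On a batch whose features respect the label clusters this value stays low; shuffling features across clusters raises it, which is the property a locality-preserving training objective exploits.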
 
References
Anitha, C., Venkatesha, M., & Adiga, B. S. (2010). A survey on facial expression databases. International Journal of Engineering Science and Technology, 2(10), 5158–5174.
Chang, Y., Hu, C., & Turk, M. (2004). Probabilistic expression analysis on manifolds. In Computer vision and pattern recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE computer society conference on (Vol. 2, pp. II–II). IEEE.
Chen, J., Liu, X., Tu, P., & Aragones, A. (2013). Learning person-specific models for facial expression and action unit recognition. Pattern Recognition Letters, 34(15), 1964–1970.
Chu, W. S., De la Torre, F., & Cohn, J. F. (2013). Selective transfer machine for personalized facial action unit detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3515–3522).
Cour, T., Sapp, B., & Taskar, B. (2011). Learning from partial labels. Journal of Machine Learning Research, 12(May), 1501–1536.
Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., et al. (2001). Emotion recognition in human–computer interaction. IEEE Signal Processing Magazine, 18(1), 32–80.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In CVPR (Vol. 1, pp. 886–893). IEEE.
Dhall, A., Goecke, R., & Gedeon, T. (2015a). Automatic group happiness intensity analysis. IEEE Transactions on Affective Computing, 6(1), 13–26.
Dhall, A., Ramana Murthy, O., Goecke, R., Joshi, J., & Gedeon, T. (2015b). Video and image based emotion recognition challenges in the wild: Emotiw 2015. In Proceedings of the 2015 ACM on international conference on multimodal interaction (pp. 423–426). ACM.
Ding, X., Chu, W. S., De la Torre, F., Cohn, J. F., & Wang, Q. (2013). Facial action unit event detection by cascade of tasks. In Proceedings of the IEEE international conference on computer vision (pp. 2400–2407).
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., & Darrell, T. (2014). Decaf: A deep convolutional activation feature for generic visual recognition. In ICML (pp. 647–655).
Du, S., Tao, Y., & Martinez, A. M. (2014). Compound facial expressions of emotion. Proceedings of the National Academy of Sciences, 111(15), E1454–E1462.
Ekman, P., & Friesen, W. V. (2003). Unmasking the face: A guide to recognizing emotions from facial clues. Journal of Personality (p. 212). Cambridge, MA: Malor Books.
Ekman, P., Friesen, W. V., & Ellsworth, P. (2013). Emotion in the human face: Guidelines for research and an integration of findings. Amsterdam: Elsevier.
Ekman, P., & Rosenberg, E. L. (1997). What the face reveals: Basic and applied studies of spontaneous expression using the Facial Action Coding System (FACS). Oxford: Oxford University Press.
Ekman, P., & Scherer, K. (1984). Expression and the nature of emotion. Approaches to Emotion, 3, 19–344.
Eleftheriadis, S., Rudovic, O., & Pantic, M. (2015a). Discriminative shared gaussian processes for multiview and view-invariant facial expression recognition. IEEE Transactions on Image Processing, 24(1), 189–204.
Eleftheriadis, S., Rudovic, O., & Pantic, M. (2015b). Multi-conditional latent variable model for joint facial action unit detection. In Proceedings of the IEEE international conference on computer vision (pp. 3792–3800).
Fabian Benitez-Quiroz, C., Srinivasan, R., & Martinez, A. M. (2016). Emotionet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5562–5570).
Fürnkranz, J., Hüllermeier, E., Mencía, E. L., & Brinker, K. (2008). Multilabel classification via calibrated label ranking. Machine Learning, 73(2), 133–153.
Gao, B. B., Xing, C., Xie, C. W., Wu, J., & Geng, X. (2017). Deep label distribution learning with label ambiguity. IEEE Transactions on Image Processing, 26(6), 2825–2838.
Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012a). A kernel two-sample test. Journal of Machine Learning Research, 13(Mar), 723–773.
Gretton, A., Sejdinovic, D., Strathmann, H., Balakrishnan, S., Pontil, M., Fukumizu, K., & Sriperumbudur, B. K. (2012b). Optimal kernel choice for large-scale two-sample tests. In Advances in neural information processing systems (pp. 1205–1213).
Hadsell, R., Chopra, S., & LeCun, Y. (2006). Dimensionality reduction by learning an invariant mapping. In Computer vision and pattern recognition, 2006 IEEE computer society conference on (Vol. 2, pp. 1735–1742). IEEE.
Hassin, R. R., Aviezer, H., & Bentin, S. (2013). Inherently ambiguous: Facial expressions of emotions, in context. Emotion Review, 5(1), 60–65.
He, X., & Niyogi, P. (2004). Locality preserving projections. In Advances in neural information processing systems (pp. 153–160).
Hou, P., Geng, X., & Zhang, M. L. (2016). Multi-label manifold learning. In AAAI (pp. 1680–1686).
Huang, S. J., Zhou, Z. H., & Zhou, Z. (2012). Multi-label learning by exploiting label correlations locally. In AAAI (pp. 949–955).
Izard, C. E. (1972). Anxiety: A variable combination of interacting fundamental emotions. In Anxiety: Current trends in theory and research (Vol. 1, pp. 55–106).
Izard, C. E. (2013). Human emotions. Berlin: Springer.
Jack, R. E., Garrod, O. G., Yu, H., Caldara, R., & Schyns, P. G. (2012). Facial expressions of emotion are not culturally universal. Proceedings of the National Academy of Sciences, 109(19), 7241–7244.
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., et al. (2014). Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM international conference on multimedia (pp. 675–678). ACM.
Jung, H., Lee, S., Yim, J., Park, S., & Kim, J. (2015). Joint fine-tuning in deep neural networks for facial expression recognition. In Proceedings of the IEEE international conference on computer vision (pp. 2983–2991).
Kim, B. K., Roh, J., Dong, S. Y., & Lee, S. Y. (2016). Hierarchical committee of deep convolutional neural networks for robust facial expression recognition. Journal on Multimodal User Interfaces, 1–17.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105).
Li, S., & Deng, W. (2019). Reliable crowdsourcing and deep locality-preserving learning for unconstrained facial expression recognition. IEEE Transactions on Image Processing, 28(1), 356–370.
Li, S., Deng, W., & Du, J. (2017). Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. In 2017 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2584–2593). IEEE.
Liu, M., Li, S., Shan, S., & Chen, X. (2013). Au-aware deep networks for facial expression recognition. In Automatic face and gesture recognition (FG), 2013 10th IEEE international conference and workshops on (pp. 1–6). IEEE.
Liu, M., Li, S., Shan, S., Wang, R., & Chen, X. (2014a). Deeply learning deformable facial action parts model for dynamic expression analysis. In Asian conference on computer vision (pp. 143–157). Berlin: Springer.
Liu, M., Shan, S., Wang, R., & Chen, X. (2014b). Learning expressionlets on spatio-temporal manifold for dynamic facial expression recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1749–1756).
Liu, M., Shan, S., Wang, R., & Chen, X. (2016). Learning expressionlets via universal manifold model for dynamic facial expression recognition. IEEE Transactions on Image Processing, 25(12), 5920–5932.
Long, M., Cao, Y., Wang, J., & Jordan, M. (2015). Learning transferable features with deep adaptation networks. In International conference on machine learning (pp. 97–105).
Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010). The extended Cohn–Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In CVPRW (pp. 94–101). IEEE.
Lv, Y., Feng, Z., & Xu, C. (2014). Facial expression recognition via deep learning. In Smart computing (SMARTCOMP), 2014 international conference on (pp. 303–308). IEEE.
Lyons, M., Akamatsu, S., Kamachi, M., & Gyoba, J. (1998). Coding facial expressions with Gabor wavelets. In Automatic face and gesture recognition, 1998. Proceedings. Third IEEE international conference on (pp. 200–205). IEEE.
Miao, Y. Q., Araujo, R., & Kamel, M. S. (2012). Cross-domain facial expression recognition using supervised kernel mean matching. In Machine learning and applications (ICMLA), 2012 11th international conference on (Vol. 2, pp. 326–332). IEEE.
Mollahosseini, A., Chan, D., & Mahoor, M. H. (2016). Going deeper in facial expression recognition using deep neural networks. In 2016 IEEE winter conference on applications of computer vision (WACV) (pp. 1–10). IEEE.
Ng, H. W., Nguyen, V. D., Vonikakis, V., & Winkler, S. (2015). Deep learning for emotion recognition on small datasets using transfer learning. In Proceedings of the 2015 ACM on international conference on multimodal interaction (pp. 443–449). ACM.
Nummenmaa, T. (1988). The recognition of pure and blended facial expressions of emotion from still photographs. Scandinavian Journal of Psychology, 29(1), 33–47.
Ojala, T., Pietikäinen, M., & Mäenpää, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), 971–987.
Pantic, M., & Rothkrantz, L. J. M. (2000). Automatic analysis of facial expressions: The state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), 1424–1445.
Patel, V. M., Gopalan, R., Li, R., & Chellappa, R. (2015). Visual domain adaptation: A survey of recent advances. IEEE Signal Processing Magazine, 32(3), 53–69.
Plutchik, R. (1991). The emotions. Lanham: University Press of America.
Russell, J. A., & Barrett, L. F. (1999). Core affect, prototypical emotional episodes, and other things called emotion: Dissecting the elephant. Journal of Personality and Social Psychology, 76(5), 805.
Sariyanidi, E., Gunes, H., & Cavallaro, A. (2015). Automatic analysis of facial affect: A survey of registration, representation, and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(6), 1113–1133.
Schroff, F., Kalenichenko, D., & Philbin, J. (2015). Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 815–823).
Sharif Razavian, A., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: An astounding baseline for recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops (pp. 806–813).
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. CoRR, arXiv:1409.1556.
Tomkins, S. S. (1963). Affect imagery consciousness: Volume II: The negative affects (Vol. 2). Berlin: Springer.
Tsoumakas, G., & Katakis, I. (2006). Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 3(3), 1–13.
Tsoumakas, G., & Vlahavas, I. (2007). Random k-labelsets: An ensemble method for multilabel classification. In Machine learning: ECML 2007 (pp. 406–417). Berlin: Springer.
Valstar, M., & Pantic, M. (2010). Induced disgust, happiness and surprise: An addition to the MMI facial expression database. In Proceedings of the 3rd international workshop on EMOTION (satellite of LREC): Corpora for research on emotion and affect (p. 65).
Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In CVPR (Vol. 1, pp. 1–511). IEEE.
Wang, S., Liu, Z., Wang, J., Wang, Z., Li, Y., Chen, X., et al. (2014). Exploiting multi-expression dependences for implicit multi-emotion video tagging. Image and Vision Computing, 32(10), 682–691.
Wen, Y., Zhang, K., Li, Z., & Qiao, Y. (2016). A discriminative feature learning approach for deep face recognition. In European conference on computer vision (pp. 499–515). Berlin: Springer.
Whitehill, J., Wu, T. F., Bergsma, J., Movellan, J. R., & Ruvolo, P. L. (2009). Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In Advances in neural information processing systems (pp. 2035–2043).
Xing, C., Geng, X., & Xue, H. (2016). Logistic boosting regression for label distribution learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4489–4497).
Xiong, X., & De la Torre, F. (2013). Supervised descent method and its applications to face alignment. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 532–539).
Yan, H., Ang, M. H., & Poo, A. N. (2011). Cross-dataset facial expression recognition. In Robotics and automation (ICRA), 2011 IEEE international conference on (pp. 5985–5990). IEEE.
Yin, L., Wei, X., Sun, Y., Wang, J., & Rosato, M. J. (2006). A 3d facial expression database for facial behavior research. In Automatic face and gesture recognition, 2006. FGR 2006. 7th international conference on (pp. 211–216). IEEE.
Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? In Advances in neural information processing systems (pp. 3320–3328).
Yu, Z., & Zhang, C. (2015). Image based static facial expression recognition with multiple deep network learning. In Proceedings of the 2015 ACM on international conference on multimodal interaction (pp. 435–442). ACM.
Zen, G., Porzi, L., Sangineto, E., Ricci, E., & Sebe, N. (2016). Learning personalized models for facial expression analysis and gesture recognition. IEEE Transactions on Multimedia, 18(4), 775–788.
Zeng, J., Chu, W. S., De la Torre, F., Cohn, J. F., & Xiong, Z. (2015). Confidence preserving machine for facial action unit detection. In Proceedings of the IEEE international conference on computer vision (pp. 3622–3630).
Zurück zum Zitat Zhang, M. L., & Wu, L. (2015). Lift: Multi-label learning with label-specific features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(1), 107–120.CrossRef Zhang, M. L., & Wu, L. (2015). Lift: Multi-label learning with label-specific features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(1), 107–120.CrossRef
Zurück zum Zitat Zhang, M. L., & Yu, F. (2015). Solvingthe partial label learning problem: An instance-based approach. In IJCAI (pp. 4048–4054). Zhang, M. L., & Yu, F. (2015). Solvingthe partial label learning problem: An instance-based approach. In IJCAI (pp. 4048–4054).
Zurück zum Zitat Zhang, M. L., & Zhou, Z. H. (2007). ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7), 2038–2048.CrossRefMATH Zhang, M. L., & Zhou, Z. H. (2007). ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7), 2038–2048.CrossRefMATH
Zurück zum Zitat Zhang, M. L., & Zhou, Z. H. (2014). A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8), 1819–1837.CrossRef Zhang, M. L., & Zhou, Z. H. (2014). A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8), 1819–1837.CrossRef
Zurück zum Zitat Zhang, Z., Luo, P., Loy, C. C., & Tang, X. (2018). From facial expression recognition to interpersonal relation prediction. International Journal of Computer Vision, 126(5), 550–569.MathSciNetCrossRef Zhang, Z., Luo, P., Loy, C. C., & Tang, X. (2018). From facial expression recognition to interpersonal relation prediction. International Journal of Computer Vision, 126(5), 550–569.MathSciNetCrossRef
Zurück zum Zitat Zhao, K., Zhang, H., Ma, Z., Song, Y. Z., & Guo, J. (2015). Multi-label learning with prior knowledge for facial expression analysis. Neurocomputing, 157, 280–289.CrossRef Zhao, K., Zhang, H., Ma, Z., Song, Y. Z., & Guo, J. (2015). Multi-label learning with prior knowledge for facial expression analysis. Neurocomputing, 157, 280–289.CrossRef
Zurück zum Zitat Zhong, L., Liu, Q., Yang, P., Liu, B., Huang, J., & Metaxas, D. N. (2012). Learning active facial patches for expression analysis. In Computer vision and pattern recognition (CVPR), 2012 IEEE conference on (pp. 2562–2569). IEEE. Zhong, L., Liu, Q., Yang, P., Liu, B., Huang, J., & Metaxas, D. N. (2012). Learning active facial patches for expression analysis. In Computer vision and pattern recognition (CVPR), 2012 IEEE conference on (pp. 2562–2569). IEEE.
Zurück zum Zitat Zhou, Y., Xue, H., & Geng, X. (2015). Emotion distribution recognition from facial expressions. In Proceedings of the 23rd ACM international conference on multimedia (pp. 1247–1250). ACM. Zhou, Y., Xue, H., & Geng, X. (2015). Emotion distribution recognition from facial expressions. In Proceedings of the 23rd ACM international conference on multimedia (pp. 1247–1250). ACM.
Zurück zum Zitat Zhu, R., Sang, G., & Zhao, Q. (2016). Discriminative feature adaptation for cross-domain facial expression recognition. In Biometrics (ICB), 2016 international conference on (pp. 1–7). IEEE. Zhu, R., Sang, G., & Zhao, Q. (2016). Discriminative feature adaptation for cross-domain facial expression recognition. In Biometrics (ICB), 2016 international conference on (pp. 1–7). IEEE.
Zurück zum Zitat Zong, Y., Huang, X., Zheng, W., Cui, Z., & Zhao, G. (2017). Learning a target sample re-generator for cross-database micro-expression recognition. In Proceedings of the 2017 ACM on multimedia conference (pp. 872–880). ACM. Zong, Y., Huang, X., Zheng, W., Cui, Z., & Zhao, G. (2017). Learning a target sample re-generator for cross-database micro-expression recognition. In Proceedings of the 2017 ACM on multimedia conference (pp. 872–880). ACM.
Metadata
Title
Blended Emotion in-the-Wild: Multi-label Facial Expression Recognition Using Crowdsourced Annotations and Deep Locality Feature Learning
Authors
Shan Li
Weihong Deng
Publication date
29.11.2018
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 6-7/2019
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-018-1131-1
