Published in: Journal on Multimodal User Interfaces

1 June 2016 | Original Paper

Hierarchical committee of deep convolutional neural networks for robust facial expression recognition

Authors: Bo-Kyeong Kim, Jihyeon Roh, Suh-Yeon Dong, Soo-Young Lee

Published in: Journal on Multimodal User Interfaces | Issue 2/2016

Abstract

This paper describes our approach to robust facial expression recognition (FER) for the third Emotion Recognition in the Wild (EmotiW2015) challenge. We train multiple deep convolutional neural networks (deep CNNs) as committee members and combine their decisions. To improve this committee of deep CNNs, we present two strategies: (1) to obtain diverse decisions from the deep CNNs, we vary the network architecture, input normalization, and random weight initialization when training these deep models, and (2) to form a better committee in both structural and decisional aspects, we construct a hierarchical committee architecture with exponentially-weighted decision fusion. On the seven-class static FER in-the-wild task of EmotiW2015, we achieve a test accuracy of 61.6%. Moreover, on other public FER databases, our hierarchical committee of deep CNNs outperforms or is competitive with state-of-the-art results.
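To make the idea of a hierarchical committee with exponentially-weighted decision fusion more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the fusion formula, so the sketch assumes that each member's weight is proportional to its validation accuracy raised to an exponent `q`, and that the hierarchy has two levels: members are fused within groups, then the group decisions are fused. All function names, the value of `q`, and the example numbers are hypothetical.

```python
from typing import List, Sequence


def exp_weights(scores: Sequence[float], q: float) -> List[float]:
    """Assumed rule: weight_i is proportional to score_i ** q, normalised to sum to 1."""
    raised = [s ** q for s in scores]
    total = sum(raised)
    return [r / total for r in raised]


def fuse(probs: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of the members' class-probability vectors."""
    n_classes = len(probs[0])
    return [sum(w * p[c] for w, p in zip(weights, probs)) for c in range(n_classes)]


def hierarchical_fuse(
    groups: Sequence[Sequence[Sequence[float]]],
    member_scores: Sequence[Sequence[float]],
    group_scores: Sequence[float],
    q: float = 2.0,
) -> List[float]:
    """Two-level committee: fuse members within each group, then fuse the groups."""
    lower = [
        fuse(members, exp_weights(scores, q))
        for members, scores in zip(groups, member_scores)
    ]
    return fuse(lower, exp_weights(group_scores, q))


# Hypothetical example with three classes (the challenge task has seven).
groups = [
    [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]],  # group 0: two member networks
    [[0.2, 0.6, 0.2]],                   # group 1: one member network
]
member_scores = [[0.60, 0.55], [0.50]]   # assumed validation accuracies per member
group_scores = [0.62, 0.50]              # assumed validation accuracies per group

fused = hierarchical_fuse(groups, member_scores, group_scores, q=2.0)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

With `q = 0` every member receives the same weight (a plain average); larger values of `q` shift the decision toward the members that scored best on validation data.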

Title: Hierarchical committee of deep convolutional neural networks for robust facial expression recognition
Authors: Bo-Kyeong Kim, Jihyeon Roh, Suh-Yeon Dong, Soo-Young Lee
Publication date: 1 June 2016
Publisher: Springer International Publishing
Published in: Journal on Multimodal User Interfaces / Issue 2/2016
Print ISSN: 1783-7677
Electronic ISSN: 1783-8738
DOI: https://doi.org/10.1007/s12193-015-0209-0
