Neurocomputing

Volume 367, 20 November 2019, Pages 188-197

Semi-supervised facial expression recognition using reduced spatial features and Deep Belief Networks

https://doi.org/10.1016/j.neucom.2019.08.029

Abstract

A semi-supervised emotion recognition algorithm using reduced features and a novel feature selection approach is proposed. The algorithm has a cascaded structure: feature extraction is first applied to the facial images, followed by feature reduction. A Deep Belief Network (DBN) is then trained in a semi-supervised manner with all the available labeled and unlabeled data. Feature selection, based on a reconstruction error ranking, eliminates those features that do not provide information. Results show that HOG features of the mouth provide the best performance. The semi-supervised DBN approach was compared against supervised strategies such as Support Vector Machine (SVM) and Convolutional Neural Network (CNN); the results show that the semi-supervised approach improves efficiency by using the information contained in both labeled and unlabeled data. Several databases were used to validate the experiments, and applying Linear Discriminant Analysis (LDA) to the HOG features of the mouth gave the highest recognition rate.

Introduction

Amongst the various modes of emotion recognition (ER), facial expression is one of the main forms used to convey emotions. ER can be applied in fields such as medicine, marketing, and entertainment. For example, a medical robot can be designed to continuously monitor a patient's emotional state [1], [2], or a diagnostic suggestion system can be built for therapists [3]. In Human-Computer Interaction, a system endowed with emotional intelligence can be used to create effective communication with users [4]. In emergency situations, as part of the corresponding situational awareness, real-time decisions can be made from the behavioral patterns of the subjects.

The development of a facial ER system is challenging since images of the same person with the same facial expression can vary with lighting conditions, background, and occlusions [5], which precludes homogeneity. Certain emotions have only subtle distinctions, which makes them harder to analyze and describe. State-of-the-art approaches to facial ER use feature-based methods [6], [7] and template-based methods [8]. The former focus on appearance- and geometry-based feature extraction. Template-based methods are less reliable because they are limited to frontal faces, and their accuracy changes with variations in pose, scale, and shape. Feature extraction is mostly based on the Histogram of Oriented Gradients (HOG) [9] and Local Binary Patterns (LBP) [10]. HOG descriptors have been used to encode facial components since they capture the distribution of gradient orientations in an image. Other works use the Discrete Wavelet Transform (DWT) for feature extraction and neural networks for classification [11]. Dimensionality reduction (DR) techniques in ER include Principal Component Analysis (PCA) [12] and Linear Discriminant Analysis (LDA) [13]. Recently, PCA-based facial feature projection has also been used for age progression [14]. These linear methods cannot capture the nonlinear structure of the data. To overcome this limitation, various nonlinear DR algorithms such as kernel PCA [15], Locally Linear Embedding (LLE) [16], Isometric Feature Mapping (Isomap) [17] and t-distributed Stochastic Neighbor Embedding (t-SNE) [18] have been proposed. Sparse representation-based classification (SRC) methods have also been widely used since 2009 [19]. SRC is most effective when there is high separability between the subspaces [20], [21], [22], but its main disadvantage over classical subspace learning algorithms is that its classification criterion fails, leading to misclassification, when the samples are highly correlated.
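The contrast between the linear and nonlinear DR families above can be illustrated with a minimal scikit-learn sketch; the data, dimensions, and parameters here are illustrative placeholders, not the paper's actual settings:

```python
# Linear (PCA, LDA) vs. nonlinear (kernel PCA, Isomap) dimensionality
# reduction on synthetic feature vectors. LDA is supervised and is
# limited to at most (number of classes - 1) output dimensions.
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))   # 200 feature vectors of dimension 64
y = np.arange(200) % 7           # 7 expression classes (synthetic labels)

# Linear methods.
X_pca = PCA(n_components=10).fit_transform(X)
X_lda = LinearDiscriminantAnalysis(n_components=6).fit_transform(X, y)

# Nonlinear methods, able to follow curved structure in the data.
X_kpca = KernelPCA(n_components=10, kernel="rbf").fit_transform(X)
X_iso = Isomap(n_components=10, n_neighbors=10).fit_transform(X)

print(X_pca.shape, X_lda.shape, X_kpca.shape, X_iso.shape)
```

Note how LDA's output dimension is capped at 6 for 7 classes, while the unsupervised methods can be set to any dimension up to the input size.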
Deep neural networks [23] have gained popularity in recent years as a choice for supervised learning. The major drawback of supervised learning comes from the fact that most of the available data is unlabeled, something particularly evident in the case of human face images. Hinton et al. [24] proposed the Restricted Boltzmann Machine (RBM) and its generalization to Deep Belief Networks (DBN), where unsupervised techniques can model the probabilistic distribution of the data and cluster it [25].

In this paper, a semi-supervised DBN is used to include unlabeled and labeled data to improve the accuracy of the classifier. Semi-supervised learning is comparable to human learning, which involves a small amount of labeled data along with greater amounts of unlabeled observations [26]. To make use of the unlabeled data, DBNs are applied to learn the model [27], and the obtained discriminative model is then fitted to a labeled dataset by performing backpropagation (BP).

This paper makes two major contributions. First, we propose to use semi-supervised learning during feature selection to determine the features that best explain the human emotions present in the available data. The proposed DBN has an input layer taking in the dimensionality-reduced feature vectors corresponding to the HOG of the mouth, HOG of the eye, wavelet transform of the mouth, and wavelet transform of the eye. Reconstruction error and validation accuracy were used to find the most significant feature vector. Second, a semi-supervised learning process is proposed that uses unlabeled data to train a DBN; after convergence, the structure is fine-tuned with BP and the available labeled data. The data is previously processed by a dimensionality reduction method: the most efficient linear method was LDA, and amongst the nonlinear approaches the best one was Isomap. The proposed semi-supervised framework was evaluated on the CK+, MMI and RaFD databases. The results show that the presented approach performs similarly to or better than state-of-the-art methods (SVM and CNN), with the additional benefits of using significantly less labeled data and dramatically reduced training and test computational burdens.
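The two-stage scheme of unsupervised pretraining on all data followed by supervised fine-tuning on the labeled subset can be sketched as follows. This is a hedged stand-in, not the paper's implementation: a single scikit-learn `BernoulliRBM` replaces the stacked DBN, logistic regression replaces the BP fine-tuning, and all data is synthetic.

```python
# Semi-supervised sketch: contrastive-divergence pretraining uses every
# sample (labeled + unlabeled); only the small labeled subset is used
# for the supervised stage.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_all = rng.random((300, 32))                         # all data, mostly unlabeled
labeled = rng.choice(300, size=60, replace=False)     # indices with labels
y_lab = rng.integers(0, 7, size=60)                   # 7 synthetic classes

# Stage 1: unsupervised pretraining on the full dataset.
rbm = BernoulliRBM(n_components=16, n_iter=20, random_state=0)
rbm.fit(X_all)

# Stage 2: supervised fitting on the labeled subset, in the learned
# hidden representation.
clf = LogisticRegression(max_iter=500)
clf.fit(rbm.transform(X_all[labeled]), y_lab)

print(clf.predict(rbm.transform(X_all[:5])))
```

The key point the sketch preserves is that the unlabeled samples shape the representation in stage 1, so the supervised stage needs far fewer labels.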

Section snippets

Proposed approach

The introduced semi-supervised Deep Belief Network for facial ER is shown in Fig. 1. The proposed method incorporates different feature extraction methods and dimensionality reduction techniques prior to passing the data into the DBN. Based on the characteristics of the different facial expressions, the mouth and eye patches are extracted from the facial data. Then two feature extraction methods, namely HOG and the 2D Discrete Wavelet Transform (2D-DWT), were used to compute the significant spatial components from the mouth and eye
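The HOG step applied to a facial patch can be sketched with a simplified, numpy-only version; the patch size, cell size, and bin count below are illustrative assumptions, since the paper's exact HOG parameters are not given in this snippet (a full implementation would also add block normalization):

```python
# Simplified HOG: gradient orientations within each cell are binned into
# a magnitude-weighted histogram, and the cell histograms are concatenated.
import numpy as np

def simple_hog(patch, cell=8, bins=9):
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180    # unsigned orientation
    h, w = patch.shape
    feats = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            m = mag[i:i + cell, j:j + cell].ravel()
            a = ang[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist)
    return np.concatenate(feats)

mouth_patch = np.random.default_rng(0).random((32, 64))  # synthetic patch
print(simple_hog(mouth_patch).shape)  # (32//8) * (64//8) * 9 = 288 features
```

The resulting fixed-length vector is what the dimensionality reduction stage would then compress before it reaches the DBN input layer.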

Databases

The Extended Cohn-Kanade database (CK+) [49], the Radboud Faces Database (RaFD) [50] and the MMI database [51] were used to test the proposed method for facial ER. The CK+ and MMI databases were captured in a lab-based environment, whereas the RaFD database contains facial images with varying poses and gaze directions. Firstly, in the case of the CK+ database, there are 327 image sequences with 7 expression labels, namely anger, neutral, disgust, fear, happy, sad, and surprise. The last frame of each

Conclusion

A semi-supervised approach for facial ER utilizing reduced facial features, with most of the data being unlabeled, is introduced with a four-layered neural network. The network is convenient to use due to its easy training: since Contrastive Divergence (CD) and BP are used, training can be done sequentially. Semi-supervised learning was achieved by combining CD and BP, as CD is unsupervised and BP is supervised. The facial features used were mouth and eye HOG, 2D-DWT of the eyes and 2D-DWT of the mouth. Further, the analysis was done

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work has been supported by National Science Foundation S&CC EAGER grant 1637092. The authors would like to thank the UNM Center for Advanced Research Computing, supported in part by the NSF, for providing high performance computing, large-scale storage and visualization resources.

Aswathy Rajendra Kurup received the bachelor’s degree in Electronics and Communication Engineering from Amrita school of Engineering in 2015 and the master’s degree in Electrical Engineering from The University of New Mexico in 2017. She is currently working towards her Ph.D. degree in Electrical Engineering from The University of New Mexico. Her research interests are Image Processing, Signal Processing and Machine Learning.

References (58)

  • C. Soladi et al.

    A new invariant representation of facial expressions: definition and application to blended expression recognition

    Proceedings of the 2012 19th IEEE International Conference on Image Processing

    (2012)
  • M. Pantic et al.

    Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences

    IEEE Trans. Syst. Man Cybern. Part B (Cybern.)

    (2006)
  • Y.-l. Tian et al.

    Recognizing action units for facial expression analysis

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2001)
  • M. Pantic et al.

    Automatic analysis of facial expressions: the state of the art

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2000)
  • Y. Hu et al.

    Multi-view facial expression recognition

    Proceedings of the 2008 8th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2008

    (2008)
  • S.B. Kazmi et al.

    Wavelets based facial expression recognition using a bank of neural networks

    Proceedings of the 2010 5th International Conference on Future Information Technology

    (2010)
  • M. Turk et al.

    Eigenfaces for recognition

    J. Cognit. Neurosci.

    (1991)
  • P.N. Belhumeur et al.

    Eigenfaces vs. fisherfaces: recognition using class specific linear projection

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1997)
  • X. Shu et al.

    Personalized age progression with bi-level aging dictionary learning

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2018)
  • T. Balachander et al.

    Kernel based subspace pattern classification

    Proceedings of the International Joint Conference on Neural Networks (IJCNN'99)

    (1999)
  • L.K. Saul et al.

    Think globally, fit locally: unsupervised learning of low dimensional manifolds

    J. Mach. Learn. Res.

    (2003)
  • J.B. Tenenbaum et al.

    A global geometric framework for nonlinear dimensionality reduction

    Science

    (2000)
  • L. van der Maaten

    Accelerating t-SNE using tree-based algorithms

    J. Mach. Learn. Res.

    (2014)
  • J. Wright et al.

    Robust face recognition via sparse representation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2009)
  • X. Shu et al.

    Image classification with tailored fine-grained dictionaries

    IEEE Trans. Circuits Syst. Video Technol.

    (2018)
  • X. Lan et al.

    Learning common and feature-specific patterns: a novel multiple-sparse-representation-based tracker

    IEEE Trans. Image Process.

    (2018)
  • J. Gu et al.

    Random subspace based ensemble sparse representation

    Pattern Recognit.

    (2017)
  • S. Hochreiter et al.

    Gradient flow in recurrent nets: the difficulty of learning long-term dependencies

  • G.E. Hinton et al.

    A fast learning algorithm for deep belief nets

    Neural Comput.

    (2006)

Meenu Ajith received the bachelor’s degree in Electronics and Communication Engineering from Amrita school of Engineering in 2015 and the master’s degree in Electrical Engineering from The University of New Mexico in 2017. She is currently working towards her Ph.D. degree in Electrical Engineering from The University of New Mexico. Her research interests are Machine Learning, Computer Vision, Pattern Recognition and Image Processing.

Manel Martínez Ramón is a professor with the ECE department of The University of New Mexico. He holds the King Felipe VI Endowed Chair of the University of New Mexico, a chair sponsored by the Household of the King of Spain. He is a Telecommunications Engineer (Universitat Politècnica de Catalunya, Spain, 1996) and Ph.D. in Communications Technologies (Universidad Carlos III de Madrid, Spain, 1999). His research interests are in Machine Learning applications to smart antennas, neuroimage, first responders and other cyber-human systems, smart grid and others. His most recent work is the monographic book “Signal Processing with Kernel Methods”, Wiley, 2018.
