Facial Expression Recognition (FER) is a critical component of artificial intelligence (AI) with applications in human-computer interaction, affective computing, security, and healthcare. Despite significant advancements, real-time FER remains challenging due to computational inefficiency, dataset biases, and the difficulty of detecting subtle micro-expressions. This study proposes an optimized deep learning framework that integrates Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Transformer-based architectures to enhance video-based emotion analysis. The model leverages self-supervised learning (SSL), contrastive learning, and motion tracking to improve micro-expression recognition, achieving a 9.2% increase in classification accuracy.
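To illustrate the contrastive-learning component mentioned above, the following is a minimal NumPy sketch of an NT-Xent-style loss over two augmented views of a batch of face embeddings. The temperature value, pairing scheme, and function name are illustrative assumptions, not details taken from this study:

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss over two views of the same batch.
    z1, z2: (N, D) L2-normalized embeddings of augmented pairs.
    (Illustrative sketch; hyperparameters are assumed, not from the paper.)"""
    z = np.concatenate([z1, z2], axis=0)            # (2N, D) stacked views
    sim = z @ z.T / temperature                     # cosine similarities (inputs normalized)
    np.fill_diagonal(sim, -np.inf)                  # exclude self-similarity
    n = z1.shape[0]
    # the positive for sample i is its other view at index (i + n) mod 2n
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

Minimizing this loss pulls the two views of each face crop together in embedding space while pushing apart views of different samples, which is the mechanism by which self-supervised pretraining can sharpen sensitivity to subtle micro-expression cues.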
In parallel, the framework applies low-bit quantization (INT4/INT2), model pruning, and parallel execution to enable real-time inference on edge AI devices, improving inference speed by up to 159% when deployed on the Jetson Nano, Google Coral, and Edge TPU. To address fairness and bias concerns, the framework combines an RCNN, adversarial debiasing, and reinforcement learning-based dataset balancing, yielding a 14.3% reduction in misclassification for minority demographic groups. The study also examines ethical concerns in FER applications, particularly privacy risks in surveillance and mental-health diagnostics. It further advocates multi-modal emotion recognition, integrating facial expressions with voice and physiological signals to enable context-aware emotion analysis. Experimental evaluations show that the proposed model achieves state-of-the-art accuracy of 94.1%, outperforming the strongest baselines in both accuracy and computational efficiency.
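The low-bit quantization step can be illustrated with a symmetric per-tensor INT4 quantize/dequantize round trip. This is a minimal NumPy sketch under an assumed symmetric scaling scheme; the abstract does not specify the exact quantization method or calibration used in deployment:

```python
import numpy as np

def quantize_int4(w):
    """Symmetric per-tensor INT4 quantization: map floats to integers in [-8, 7].
    (Illustrative sketch; the deployed scheme is not specified in the abstract.)"""
    scale = np.abs(w).max() / 7.0                  # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from INT4 codes."""
    return q.astype(np.float32) * scale

w = np.array([0.9, -0.31, 0.05, -0.7], dtype=np.float32)
q, s = quantize_int4(w)
w_hat = dequantize(q, s)                           # approximates w within half a step
```

Storing weights as 4-bit codes plus one float scale cuts memory traffic roughly 8x versus FP32, which is the main source of the latency gains reported for edge devices such as the Jetson Nano and Edge TPU.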