Published in: Pattern Recognition and Image Analysis 3/2022

01-09-2022 | SELECTED PAPERS OF THE 8th INTERNATIONAL WORKSHOP “IMAGE MINING. THEORY AND APPLICATIONS”

Audio-Visual Continuous Recognition of Emotional State in a Multi-User System Based on Personalized Representation of Facial Expressions and Voice

Authors: A. V. Savchenko, L. V. Savchenko

Abstract

This paper is devoted to tracking the dynamics of a user's psycho-emotional state based on the analysis of their facial video and voice. We propose a novel technology with personalized acoustic and visual lightweight neural network models that can run in real time on any laptop or even a mobile device. First, two separate user-independent classifiers (feed-forward neural networks) are trained for speech emotion recognition and facial expression recognition in video. The former extracts acoustic features with the OpenL3 or OpenSMILE frameworks. The latter is based on the preliminary extraction of emotional features from each frame with a pre-trained convolutional neural network. Next, both classifiers are fine-tuned using a small number of short emotional videos that should be available for each user. During real-time tracking of the emotional state, the user's face is identified in order to select the corresponding personalized neural networks. The final decision about the current emotion in a short time frame is predicted by blending the outputs of the personalized audio and video classifiers. It is experimentally demonstrated on the Russian Acted Multimodal Affective Set that the proposed approach increases emotion recognition accuracy by 2–15%.
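The final fusion step described above (blending the outputs of the personalized audio and video classifiers) can be sketched as a weighted average of their class-probability vectors. This is a minimal illustrative sketch, not the authors' exact implementation: the emotion labels, probability values, and the blending weight `audio_weight` are all assumptions made for the example.

```python
import numpy as np

# Hypothetical emotion classes; the actual label set depends on the corpus.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "neutral", "sadness"]

def blend_predictions(audio_probs, video_probs, audio_weight=0.4):
    """Blend the class-probability outputs of the personalized audio and
    video classifiers for one short time frame via a weighted average,
    then return the most likely emotion label and the fused vector."""
    audio_probs = np.asarray(audio_probs, dtype=float)
    video_probs = np.asarray(video_probs, dtype=float)
    fused = audio_weight * audio_probs + (1.0 - audio_weight) * video_probs
    return EMOTIONS[int(np.argmax(fused))], fused

# Example: the video model strongly suggests happiness, the audio model
# is less certain; the blended decision still selects happiness.
audio = [0.10, 0.05, 0.05, 0.35, 0.30, 0.15]
video = [0.05, 0.02, 0.03, 0.60, 0.20, 0.10]
label, fused = blend_predictions(audio, video, audio_weight=0.4)
print(label)  # happiness
```

In practice the blending weight would be tuned on validation data, since the relative reliability of the acoustic and visual modalities varies between users and recording conditions.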


Metadata
Title
Audio-Visual Continuous Recognition of Emotional State in a Multi-User System Based on Personalized Representation of Facial Expressions and Voice
Authors
A. V. Savchenko
L. V. Savchenko
Publication date
01-09-2022
Publisher
Pleiades Publishing
Published in
Pattern Recognition and Image Analysis / Issue 3/2022
Print ISSN: 1054-6618
Electronic ISSN: 1555-6212
DOI
https://doi.org/10.1134/S1054661822030397
