Published in: Pattern Recognition and Image Analysis 3/2022

01-09-2022 | SELECTED PAPERS OF THE 8th INTERNATIONAL WORKSHOP “IMAGE MINING. THEORY AND APPLICATIONS”

Audio-Visual Continuous Recognition of Emotional State in a Multi-User System Based on Personalized Representation of Facial Expressions and Voice

Authors: A. V. Savchenko, L. V. Savchenko

Abstract

This paper is devoted to tracking the dynamics of a user's psycho-emotional state based on the analysis of their facial video and voice. We propose a novel technology with personalized acoustic and visual lightweight neural network models that can run in real time on any laptop or even a mobile device. First, two separate user-independent classifiers (feed-forward neural networks) are trained for speech emotion recognition and facial expression recognition in video. The former extracts acoustic features with the OpenL3 or OpenSMILE frameworks. The latter is based on the preliminary extraction of emotional features from each frame with a pre-trained convolutional neural network. Next, both classifiers are fine-tuned using a small number of short emotional videos that should be available for each user. During real-time tracking of the emotional state, the user's face is identified in order to select the corresponding personalized neural networks. The final decision about the current emotion in a short time frame is predicted by blending the outputs of the personalized audio and video classifiers. It is experimentally demonstrated on the Russian Acted Multimodal Affective Set that the proposed approach increases emotion recognition accuracy by 2–15%.
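The final fusion step described above (blending the outputs of the personalized audio and video classifiers) can be sketched as a weighted average of their class-probability vectors. This is a minimal illustrative sketch, not the authors' exact implementation: the emotion labels, probability values, and the blending weight `audio_weight` are all assumptions made for the example.

```python
import numpy as np

# Hypothetical emotion classes; the actual label set depends on the corpus.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "neutral", "sadness"]

def blend_predictions(audio_probs, video_probs, audio_weight=0.4):
    """Blend the class-probability outputs of the personalized audio and
    video classifiers for one short time frame via a weighted average,
    then return the most likely emotion label and the fused vector."""
    audio_probs = np.asarray(audio_probs, dtype=float)
    video_probs = np.asarray(video_probs, dtype=float)
    fused = audio_weight * audio_probs + (1.0 - audio_weight) * video_probs
    return EMOTIONS[int(np.argmax(fused))], fused

# Example: the video model strongly suggests happiness, the audio model
# is less certain; the blended decision still selects happiness.
audio = [0.10, 0.05, 0.05, 0.35, 0.30, 0.15]
video = [0.05, 0.02, 0.03, 0.60, 0.20, 0.10]
label, fused = blend_predictions(audio, video, audio_weight=0.4)
print(label)  # happiness
```

In practice the blending weight would be tuned on validation data, since the relative reliability of the acoustic and visual modalities varies between users and recording conditions.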


Metadata
Title
Audio-Visual Continuous Recognition of Emotional State in a Multi-User System Based on Personalized Representation of Facial Expressions and Voice
Authors
A. V. Savchenko
L. V. Savchenko
Publication date
01-09-2022
Publisher
Pleiades Publishing
Published in
Pattern Recognition and Image Analysis / Issue 3/2022
Print ISSN: 1054-6618
Electronic ISSN: 1555-6212
DOI
https://doi.org/10.1134/S1054661822030397
