27 May 2024 | Research

Vision-Enabled Large Language and Deep Learning Models for Image-Based Emotion Recognition

Authors: Mohammad Nadeem, Shahab Saquib Sohail, Laeeba Javed, Faisal Anwer, Abdul Khader Jilani Saudagar, Khan Muhammad

Published in: Cognitive Computation | Issue 5/2024

Abstract

Artificial intelligence (AI)-based tools and systems have advanced significantly in capability, reasoning, and efficiency. Noteworthy examples include generative AI-based large language models (LLMs) such as generative pretrained transformer 3.5 (GPT-3.5), generative pretrained transformer 4 (GPT-4), and Bard. LLMs are versatile and effective for tasks such as composing poetry, writing code, generating essays, and solving puzzles. Until recently, LLMs could effectively process only text-based input. However, recent advancements have enabled them to handle multimodal inputs, such as text, images, and audio, making them highly general-purpose tools. Because LLMs have achieved decent performance in pattern recognition tasks (such as classification), it is natural to ask whether general-purpose LLMs can perform comparably to, or even better than, specialized deep learning models (DLMs) trained specifically for a given task. In this study, we compared the performance of fine-tuned DLMs with that of general-purpose LLMs for image-based emotion recognition. We trained four DLMs, namely two convolutional neural networks (\(CNN_1\) and \(CNN_2\)), ResNet50, and VGG16, on an image dataset for emotion recognition and then tested them on a different dataset. Subsequently, we presented the same test dataset to two vision-enabled LLMs (LLaVA and GPT-4). \(CNN_2\) was the best model, with an accuracy of 62%, while VGG16 had the lowest accuracy, 31%. Among the LLMs, GPT-4 performed best, with an accuracy of 55.81%. LLaVA had a higher accuracy than the \(CNN_1\) and VGG16 models. The other performance metrics, such as precision, recall, and F1-score, followed similar trends. However, GPT-4 performed the best with small datasets.
The weaker results of the LLMs can be attributed to their general-purpose nature: despite extensive pretraining, they may not capture the features required for a specific task such as emotion recognition in images as effectively as models fine-tuned for that task. The LLMs did not surpass the specialized models but achieved comparable performance, making them a viable option for specific tasks without additional training. In addition, LLMs can be considered a good alternative when the available dataset is small.
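
For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how accuracy, precision, recall, and F1-score can be computed for a multi-class emotion classifier. It is not the authors' evaluation code. The abstract does not say which averaging scheme was used, so macro-averaging is an assumption here, and the function name `classification_metrics` and the emotion labels are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging (an assumption, not taken from the paper) computes
    each metric per class and then takes the unweighted mean.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels, for illustration only.
y_true = ["happy", "happy", "sad", "sad", "angry", "angry"]
y_pred = ["happy", "sad", "sad", "sad", "angry", "happy"]
metrics = classification_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in metrics.items()})
```

The same function can score a DLM and an LLM on equal terms, provided the LLM's free-text answers are first mapped onto the dataset's label set.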

Metadata
Title
Vision-Enabled Large Language and Deep Learning Models for Image-Based Emotion Recognition
Authors
Mohammad Nadeem
Shahab Saquib Sohail
Laeeba Javed
Faisal Anwer
Abdul Khader Jilani Saudagar
Khan Muhammad
Publication date
27 May 2024
Publisher
Springer US
Published in
Cognitive Computation / Issue 5/2024
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-024-10281-5
