Published in: Journal on Multimodal User Interfaces 3/2013

01.11.2013 | Original Paper

Multimodal assistive technologies for depression diagnosis and monitoring

Authors: Jyoti Joshi, Roland Goecke, Sharifa Alghowinem, Abhinav Dhall, Michael Wagner, Julien Epps, Gordon Parker, Michael Breakspear


Abstract

Depression is a severe mental health disorder with high societal costs. Current clinical practice depends almost exclusively on self-report and clinical opinion, risking a range of subjective biases. The long-term goal of our research is to develop assistive technologies that support clinicians and sufferers in the diagnosis and monitoring of treatment progress in a timely and easily accessible format. In the first phase, we aim to develop a diagnostic aid using affective sensing approaches. This paper describes the progress to date and proposes a novel multimodal framework comprising audio-video fusion for depression diagnosis. We exploit the proposition, well established in auditory-visual speech processing, that the auditory and visual channels of human communication complement each other, and investigate this hypothesis for depression analysis. For the video data analysis, intra-facial muscle movements and the movements of the head and shoulders are analysed by computing spatio-temporal interest points. In addition, various audio features (fundamental frequency f0, loudness, intensity and mel-frequency cepstral coefficients) are computed. Next, a bag of visual features and a bag of audio features are generated separately. In this study, we compare fusion methods at feature level, score level and decision level. Experiments are performed on an age- and gender-matched clinical dataset of 30 patients and 30 healthy controls. The results from the multimodal experiments show the proposed framework’s effectiveness in depression analysis.
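To make the three fusion strategies concrete, the sketch below illustrates feature-level, score-level and decision-level fusion on hypothetical bag-of-audio-words and bag-of-visual-words histograms with linear SVM classifiers. The random data, the classifier choice and the specific combination rules are illustrative assumptions for this sketch, not the authors' implementation, which is detailed in the body of the paper.

```python
# Illustrative sketch (assumed data and classifiers, not the authors' pipeline)
# of the three fusion strategies compared in the paper.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical stand-ins: one 100-bin "bag of audio words" and one 100-bin
# "bag of visual words" histogram per subject, for 60 subjects
# (30 patients, 30 controls), mirroring the dataset size in the abstract.
n_subjects, n_bins = 60, 100
audio_bags = rng.random((n_subjects, n_bins))
video_bags = rng.random((n_subjects, n_bins))
labels = np.array([1] * 30 + [0] * 30)       # 1 = depressed, 0 = control

train = np.arange(0, n_subjects, 2)          # toy split: even indices train
test = np.arange(1, n_subjects, 2)           # odd indices test

# 1) Feature-level fusion: concatenate both bags before classification.
early = SVC(kernel="linear", probability=True)
early.fit(np.hstack([audio_bags, video_bags])[train], labels[train])
early_pred = early.predict(np.hstack([audio_bags, video_bags])[test])

# 2) Score-level fusion: one classifier per modality, average their
#    posterior probabilities, then threshold.
clf_a = SVC(kernel="linear", probability=True).fit(audio_bags[train], labels[train])
clf_v = SVC(kernel="linear", probability=True).fit(video_bags[train], labels[train])
scores = (clf_a.predict_proba(audio_bags[test])[:, 1]
          + clf_v.predict_proba(video_bags[test])[:, 1]) / 2.0
score_pred = (scores >= 0.5).astype(int)

# 3) Decision-level fusion: combine the hard unimodal decisions
#    (logical AND shown here as one possible rule).
dec_pred = clf_a.predict(audio_bags[test]) & clf_v.predict(video_bags[test])

for name, pred in [("feature", early_pred), ("score", score_pred), ("decision", dec_pred)]:
    print(f"{name}-level fusion accuracy: {np.mean(pred == labels[test]):.2f}")
```

Other combination rules (weighted sums of scores, majority voting over decisions) fit the same skeleton; the point of the sketch is only where in the pipeline the two modalities are merged.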


Metadata
Title
Multimodal assistive technologies for depression diagnosis and monitoring
Authors
Jyoti Joshi
Roland Goecke
Sharifa Alghowinem
Abhinav Dhall
Michael Wagner
Julien Epps
Gordon Parker
Michael Breakspear
Publication date
01.11.2013
Publisher
Springer Berlin Heidelberg
Published in
Journal on Multimodal User Interfaces / Issue 3/2013
Print ISSN: 1783-7677
Electronic ISSN: 1783-8738
DOI
https://doi.org/10.1007/s12193-013-0123-2
