Published in: Cluster Computing 4/2019

09-12-2017

Research on depression detection algorithm combine acoustic rhythm with sparse face recognition

Authors: Jian Zhao, Weiwen Su, Jian Jia, Chao Zhang, Tingting Lu

Abstract

Because traditional methods of diagnosing depression produce false positives, this paper proposes a multi-modal fusion algorithm for depression diagnosis based on speech signals and facial image sequences. Spectral subtraction is introduced to enhance the depressed speech signal, and the cepstrum method is used to extract pitch-frequency features with a large variation rate and formant features that differ significantly. The short-time energy and Mel-frequency cepstral coefficient (MFCC) parameters of speech with different emotions are analyzed in both the time domain and the frequency domain, and a model is established for training and identification. Meanwhile, the orthogonal matching pursuit (OMP) algorithm is implemented to obtain a sparse linear combination of face test samples, and the voice and facial emotion results are cascaded according to a proportion. The experimental results show that the depression detection algorithm that fuses speech and facial emotions reaches a recognition rate of 81.14%. Compared with the existing accuracy rate of doctors, 47.3%, combining their diagnosis with the proposed method brings a relative improvement of 71.54%, that is, (81.14 − 47.3) / 47.3. In addition, the method can be applied at low cost to the hardware and software of existing hospital instruments. It is therefore an accurate and effective method for diagnosing depression.
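The abstract names orthogonal matching pursuit as the way a face test sample is expressed as a sparse linear combination of other samples. The following is a minimal sketch of generic OMP, not the authors' implementation. The function name `omp`, its parameters, and the toy dictionary are assumptions made for illustration. In a face-recognition setting the columns of `D` would be vectorised training faces; here they are orthonormal random vectors so that recovery is exact.

```python
import numpy as np


def omp(D, y, n_nonzero, tol=1e-10):
    """Greedy OMP: approximate y as D @ x using at most n_nonzero atoms.

    D is a (features, atoms) dictionary, y a (features,) test sample.
    Illustrative sketch only; names and defaults are assumptions.
    """
    x = np.zeros(D.shape[1])
    residual = np.asarray(y, dtype=float).copy()
    support = []           # indices of the atoms selected so far
    coef = np.zeros(0)     # least-squares coefficients on the support
    for _ in range(n_nonzero):
        # Select the atom most correlated with the current residual.
        correlations = np.abs(D.T @ residual)
        if support:
            correlations[support] = 0.0  # do not select an atom twice
        support.append(int(np.argmax(correlations)))
        # Orthogonal step: refit all selected coefficients jointly.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    if support:
        x[support] = coef
    return x


# Toy usage: a hypothetical dictionary of 10 orthonormal atoms in R^20.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((20, 20)))
D = Q[:, :10]
y = 2.0 * D[:, 3] - 1.5 * D[:, 7]   # test sample built from two atoms
x = omp(D, y, n_nonzero=2)
print(np.nonzero(x)[0], x[np.nonzero(x)])
```

The least-squares refit over the whole support is what distinguishes OMP from plain matching pursuit: after each selection the residual is orthogonal to every atom chosen so far. With real face data the atoms are correlated, so recovery is approximate and the sparsity level has to be tuned.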

Metadata
Title
Research on depression detection algorithm combine acoustic rhythm with sparse face recognition
Authors
Jian Zhao
Weiwen Su
Jian Jia
Chao Zhang
Tingting Lu
Publication date
09-12-2017
Publisher
Springer US
Published in
Cluster Computing, Special Issue 4/2019
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-1469-0
