Pathophysiological Voice Analysis for Diagnosis and Monitoring of Depression

Understanding Depression

Abstract

Self-assessment questionnaires are commonly used to screen for stress and depression. However, they suffer from reporting bias: respondents may underestimate or overestimate their symptoms, consciously or unconsciously. Various biomarkers of depression and stress are also being studied. These often require expensive equipment and chemicals; they are useful for definitive diagnosis and for elucidating mechanisms, but they are not suitable for screening large populations.

It is well known that various diseases change the voice. The relationship between disease and voice has long been studied in the field of acoustic phonetics. Most of this work has focused on the formant frequencies (F1, F2, etc.), which are obtained by cepstrum analysis of the voice. Formants are determined by the shape of the vocal tract (the cavity from the vocal cords to the mouth). Studies using the fundamental frequency (F0), which is obtained as the lowest frequency component by FFT, have also been reported. F0 reflects vocal cord vibration, and various methods of F0 analysis are now available. Compared with the formants, F0 contains many involuntary components. Analysis of F0 may therefore be useful for diagnosing various diseases. The scope of voice analysis has now expanded from otolaryngology to psychiatric disorders such as depression and neurological diseases such as Parkinson’s disease. In addition, research on topics such as differential diagnosis by voice and measurement of therapeutic effect has begun.
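As an illustration of what an F0 estimate involves, the following is a minimal sketch in Python. It uses autocorrelation, one of the various F0 analysis methods, and is not the method used in this chapter. The function names `synth_voiced` and `estimate_f0`, the 75–400 Hz search range, and the synthetic test signal are all assumptions made for the example.

```python
import math


def synth_voiced(f0_hz, sample_rate, duration_s):
    """Synthetic 'voiced' frame: a fundamental plus two weaker harmonics.

    This stands in for a real recording; it is an assumption for the demo.
    """
    n_samples = int(sample_rate * duration_s)
    return [
        math.sin(2 * math.pi * f0_hz * n / sample_rate)
        + 0.5 * math.sin(2 * math.pi * 2 * f0_hz * n / sample_rate)
        + 0.25 * math.sin(2 * math.pi * 3 * f0_hz * n / sample_rate)
        for n in range(n_samples)
    ]


def estimate_f0(samples, sample_rate, f_min=75.0, f_max=400.0):
    """Return the F0 estimate (Hz) whose lag maximizes the autocorrelation.

    The search range f_min..f_max is an assumed range for speech.
    """
    min_lag = int(sample_rate / f_max)  # shortest period considered
    max_lag = int(sample_rate / f_min)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Usage: estimate F0 for a 0.1 s synthetic frame with a 120 Hz fundamental.
frame = synth_voiced(f0_hz=120.0, sample_rate=8000, duration_s=0.1)
f0 = estimate_f0(frame, sample_rate=8000)
print(f"estimated F0: {f0:.1f} Hz")
```

Because the lag is an integer number of samples, the estimate is only accurate to within a few hertz at this sampling rate. Real speech would also need framing, voicing detection, and noise handling, which this sketch omits.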

These developments are largely due to advances in computing, especially the spread of smartphones, which have made it possible to collect and analyze voice in everyday life. For example, several smartphone applications that measure stress and depression by analyzing everyday speech have been released. In Japan, efforts to use such applications in healthcare and industrial medicine are gaining momentum. Our group has already developed the Mind Monitoring System, which runs on smartphones, and operates it in Japan. This system is based on emotion recognition technology rather than on direct voice analysis. Pathophysiological analysis by voice is noninvasive and can be performed remotely and continuously without special equipment. The technique is therefore effective for screening large numbers of subjects and for long-term continuous monitoring at home. This means that it can serve as a bridge between healthcare and medical treatment. In clinical practice, it can also provide objective indicators in medical areas that have had only subjective ones.

Acknowledgments

I thank Shunji Mitsuyoshi, Shuji Shinohara, Mitsuteru Nakamura, Masakazu Higuchi, Yasuhiro Omiya, and Naoki Hagiwara. They are my team, and each of them works on research with original ideas and outstanding skills.

Author information

Corresponding author

Correspondence to Shinichi Tokuno.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Tokuno, S. (2018). Pathophysiological Voice Analysis for Diagnosis and Monitoring of Depression. In: Kim, YK. (eds) Understanding Depression. Springer, Singapore. https://doi.org/10.1007/978-981-10-6577-4_6

  • DOI: https://doi.org/10.1007/978-981-10-6577-4_6

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6576-7

  • Online ISBN: 978-981-10-6577-4

  • eBook Packages: Medicine, Medicine (R0)
