Abstract
Self-assessment questionnaires are commonly used to screen for stress and depression. However, they are subject to reporting bias: respondents may, consciously or unconsciously, underestimate or overestimate their symptoms. Various biomarkers of depression and stress are also being studied. These often require expensive equipment and reagents; they are useful for definitive diagnosis and for elucidating mechanisms, but they are not suitable for screening large populations.
It is well known that various diseases change the voice, and the relationship between disease and voice has long been studied in the field of acoustic phonetics. Most of this work has focused on the formants (F1, F2, etc.), frequency bands obtained by cepstrum analysis of the voice. Formants are influenced by the shape of the vocal tract (the cavity from the vocal cords to the mouth). Studies using the fundamental frequency (F0), obtained as the lowest frequency component by FFT, have also been reported. F0 reflects vocal cord vibration, and various methods of F0 analysis are currently available. Compared with the formants, F0 contains many involuntary components. Analysis of F0 is therefore potentially useful for diagnosing various diseases. The scope of voice analysis has now expanded from otolaryngology to psychiatric conditions such as depression and neurological diseases such as Parkinson’s disease. In addition, research on differential diagnosis by voice and on measuring therapeutic effect has begun.
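As a rough illustration of the FFT-based idea described above, the following sketch estimates F0 from a single synthetic voiced frame by taking the lowest prominent peak of the magnitude spectrum. It is not the method used in the chapter or in any of the cited systems. The function names (`fft`, `estimate_f0`), the search range of 60 to 400 Hz, the relative threshold of 0.3, and the synthetic test signal are all assumptions chosen for the example. Practical F0 trackers often rely on other techniques, such as autocorrelation, and work frame by frame on real recordings.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def estimate_f0(samples, sample_rate, fmin=60.0, fmax=400.0, rel_threshold=0.3):
    """Return the lowest prominent spectral peak in [fmin, fmax] in Hz, or None.

    fmin, fmax and rel_threshold are illustrative assumptions, not values
    taken from the chapter.
    """
    n = len(samples)
    # A Hann window reduces spectral leakage from the frame edges.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(samples)]
    mags = [abs(c) for c in fft(windowed)[: n // 2]]
    bin_hz = sample_rate / n
    lo = max(1, int(fmin / bin_hz))
    hi = min(n // 2 - 2, int(fmax / bin_hz))
    floor = rel_threshold * max(mags[lo:hi + 1])
    # Scan upward in frequency and stop at the first local maximum that is
    # strong enough relative to the largest peak in the search range.
    for k in range(lo, hi + 1):
        if mags[k] >= floor and mags[k] >= mags[k - 1] and mags[k] > mags[k + 1]:
            return k * bin_hz
    return None


# Synthetic "voiced" frame: a 125 Hz fundamental plus two weaker harmonics.
SAMPLE_RATE = 8000
FRAME_LEN = 2048  # power of two, required by the FFT above
frame = [
    math.sin(2 * math.pi * 125 * i / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 250 * i / SAMPLE_RATE)
    + 0.25 * math.sin(2 * math.pi * 375 * i / SAMPLE_RATE)
    for i in range(FRAME_LEN)
]

f0_hz = estimate_f0(frame, SAMPLE_RATE)
print(f"Estimated F0: {f0_hz:.1f} Hz")
```

The frequency resolution of this approach is limited to the bin width (sample rate divided by frame length, about 3.9 Hz here), so longer frames or interpolation around the peak would be needed for finer estimates.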
These developments are largely due to advances in computing, especially the spread of smartphones, which have made voice collection and analysis possible in everyday life. For example, several smartphone applications that measure stress and depression by analyzing everyday speech have been released. In Japan, efforts to use such applications in healthcare and industrial medicine are gaining momentum. Our group has already developed the Mind Monitoring System, which runs on smartphones, and operates it in Japan. This system is based on emotion recognition technology rather than on direct voice analysis. Pathophysiological analysis by voice is noninvasive, can be performed remotely and continuously, and requires no special equipment. The technique is therefore effective for screening large numbers of subjects and for long-term continuous monitoring at home, which means it can serve as a bridge between healthcare and medical treatment. In clinical practice, it can also provide objective indicators in medical areas that previously had only subjective ones.
Acknowledgments
I thank Shunji Mitsuyoshi, Shuji Shinohara, Mitsuteru Nakamura, Masakazu Higuchi, Yasuhiro Omiya, and Naoki Hagiwara. They are my team, and each of them pursues research with original ideas and outstanding skills.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Tokuno, S. (2018). Pathophysiological Voice Analysis for Diagnosis and Monitoring of Depression. In: Kim, YK. (eds) Understanding Depression. Springer, Singapore. https://doi.org/10.1007/978-981-10-6577-4_6
DOI: https://doi.org/10.1007/978-981-10-6577-4_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6576-7
Online ISBN: 978-981-10-6577-4
eBook Packages: Medicine (R0)