Abstract
Voice acoustic analysis is typically a labor-intensive, time-consuming process that requires the application of idiosyncratic parameters tailored to individual aspects of the speech signal. Such processes limit the efficiency and utility of voice analysis in clinical practice as well as in applied research and development. In the present study, we analyzed 1,120 voice files using standard techniques (case-by-case hand analysis), which took roughly 10 work weeks of personnel time. The results were compared with the output of several automated analysis scripts that used preset pitch-range parameters. Once pitch windows were selected to account for sex differences, the automated scripts reduced the processing time for the 1,120 speech samples to less than 2.5 h and produced results comparable to those obtained with hand analysis. However, caution should be exercised when applying the suggested preset values to pathological voice populations.
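To make the idea of a preset pitch window concrete, the following is a minimal sketch of window-constrained pitch estimation by normalized autocorrelation. It is not the analysis script used in the study: the function names, the synthetic test tones, and the floor and ceiling values in `PITCH_WINDOWS` are all hypothetical and chosen only for illustration. They are not the preset values the study evaluated.

```python
import math

# Hypothetical pitch windows in Hz as (floor, ceiling). These are
# illustrative assumptions, NOT the preset values from the study.
PITCH_WINDOWS = {"male": (70.0, 250.0), "female": (120.0, 400.0)}


def synth_tone(freq_hz, sample_rate, duration_s):
    """Return a pure sine tone as a list of floats (stand-in for a voice frame)."""
    n = int(sample_rate * duration_s)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def estimate_f0(samples, sample_rate, floor_hz, ceiling_hz):
    """Estimate F0 as the in-window lag with the highest normalized autocorrelation.

    Only lags corresponding to frequencies between floor_hz and ceiling_hz
    are searched, which is what a preset pitch window amounts to here.
    """
    min_lag = int(sample_rate / ceiling_hz)  # shortest period considered
    max_lag = int(sample_rate / floor_hz)    # longest period considered
    n = len(samples)
    if min_lag < 1 or max_lag >= n:
        raise ValueError("frame too short for the requested pitch window")

    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        head = samples[: n - lag]
        tail = samples[lag:]
        num = sum(a * b for a, b in zip(head, tail))
        energy = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
        score = num / energy if energy > 0.0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


if __name__ == "__main__":
    rate = 8000
    tone = synth_tone(220.0, rate, 0.1)
    for label, (floor_hz, ceiling_hz) in PITCH_WINDOWS.items():
        f0 = estimate_f0(tone, rate, floor_hz, ceiling_hz)
        print(f"{label} window ({floor_hz}-{ceiling_hz} Hz): {f0:.1f} Hz")
```

In this toy setup, a 220 Hz tone analyzed with the lower window tends to be reported near a subharmonic, because a multiple of the true period also falls inside the search range. The higher window excludes that candidate. Real voice signals and production pitch trackers are considerably more involved, so this only illustrates why the window choice matters.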
Additional information
Data collection for the present study was supported by Small Business Innovation Research Grant R43MH68950 from the National Institute of Mental Health.
Cite this article
Vogel, A.P., Maruff, P., Snyder, P.J. et al. Standardization of pitch-range settings in voice acoustic analysis. Behavior Research Methods 41, 318–324 (2009). https://doi.org/10.3758/BRM.41.2.318
DOI: https://doi.org/10.3758/BRM.41.2.318