Abstract
Hypernasality is one of the most typical characteristics of cleft palate (CP) speech, and the outcome of hypernasality grading determines whether follow-up surgery is necessary. Currently, the evaluation of CP speech is carried out by experienced speech therapists, but the result depends strongly on their clinical experience and subjective judgment. This work proposes an automatic evaluation system for hypernasality grading in CP speech. The database used in this work was collected by the Hospital of Stomatology, Sichuan University, which treats the largest number of CP patients in China. Based on the production process of hypernasality, source sound pulse and vocal tract filter features are presented, including pitch, the first and second energy-amplified frequency bands, cepstrum-based features, MFCCs, and short-time energy in sub-bands. These features, combined with a KNN classifier, are applied to automatically classify four grades of hypernasality: normal, mild, moderate, and severe. The experimental results show that the proposed system achieves good performance, with a classification rate of up to 80.4 % across the four hypernasality grades. The sensitivity of the proposed features to speaker gender is also discussed.
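The pipeline described in the abstract (frame-level acoustic features followed by a KNN majority vote over the four grades) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the short-time energy function stands in for the full feature set, and the training data here are synthetic clusters invented purely to demonstrate the classifier.

```python
import numpy as np

def short_time_energy(signal, frame_len=256, hop=128):
    """Frame-wise short-time energy, one of the features named in the abstract."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.sum(f.astype(float) ** 2) for f in frames])

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    d = np.linalg.norm(train_X - x, axis=1)
    nearest = train_y[np.argsort(d)[:k]]
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[np.argmax(counts)]

GRADES = ["normal", "mild", "moderate", "severe"]  # the four target classes

# Toy training set: one synthetic 2-D feature cluster per grade (illustrative only).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [3.0, 3.0]])
X = np.vstack([c + 0.1 * rng.standard_normal((10, 2)) for c in centers])
y = np.repeat(np.arange(4), 10)

print(GRADES[knn_predict(X, y, np.array([2.9, 0.1]))])  # near the "mild" cluster
```

In practice the feature vector would concatenate pitch, the energy-amplified frequency bands, cepstral/MFCC coefficients, and sub-band energies per utterance; the KNN step is unchanged.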
This article is part of the Topical Collection on Education & Training
Cite this article
He, L., Zhang, J., Liu, Q. et al. Automatic Evaluation of Hypernasality Based on a Cleft Palate Speech Database. J Med Syst 39, 61 (2015). https://doi.org/10.1007/s10916-015-0242-2