Top

International Journal of Speech Technology

Published in:

01-03-2013

Non-intrusive speech quality assessment using several combinations of auditory features

Authors: Rajesh Kumar Dubey, Arun Kumar

Published in: International Journal of Speech Technology | Issue 1/2013

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Quality estimation of speech is essential for monitoring and maintenance of the quality of service at different nodes of modern telecommunication networks. It is also required in the selection of codecs in speech communication systems. There is no requirement of the original clean speech signal as a reference in non-intrusive speech quality evaluation, and thus it is of importance in evaluating the quality of speech at any node of the communication network. In this paper, non-intrusive speech quality assessment of narrowband speech is done by Gaussian Mixture Model (GMM) training using several combinations of auditory perception and speech production features, which include principal components of Lyon’s auditory model features, MFCC, LSF and their first and second differences. Results are obtained and compared for several combinations of auditory features for three sets of databases. The results are also compared with ITU-T Recommendation P.563 for non-intrusive speech quality assessment. It is found that many combinations of these feature sets outperform the ITU-T P.563 Recommendation under the test conditions.

previous article Development and evaluation of online text-independent speaker verification system for remote person authentication

next article Speaker recognition utilizing distributed DCT-II based Mel frequency cepstral coefficients and fuzzy vector quantization

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

http://www.ee.columbia.edu/~dpwe/resources/matlab/rastamat.

http://www.utdallas.edu/~loizou/speech/noizeus.

ITU-T Recommendation P.800 (1996). Methods for subjective determination of transmission quality. International Telecommunication Union-Telecommunication Standardization Sector, Geneva.

Fegyo, T., Szarvas, M., Tatai, P., & Gordos, G. (2000). Objective speech quality estimation for analog mobile channels: problems and solutions. International Journal of Speech Technology, 3, 277–287. MATHCrossRef

ITU-T Recommendation P.862 (1996). Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrow-band telephone networks. International Telecommunication Union-Telecommunication Standardization Sector, Geneva.

ITU-T Recommendation P.861 (1996). Objective quality measurement of telephone band (300–3400 Hz) speech codecs. International Telecommunication Union-Telecommunication Standardization Sector, Geneva.

ITU-T Recommendation P.563 (2004). Single ended method for objective speech quality assessment in narrow-band telephony applications. International Telecommunication Union- Telecommunication Standardization Sector, Geneva.

Liang, J., & Kubichek, R. (1994). Output based objective speech quality. In IEEE 44th vehicular technology conf. (Vol. 3(8–10), pp. 1719–1723).

Au, O. C., & Lam, K. (1998). A novel output based objective speech quality measure for wireless communication. In Proc. 4th international conf. on signal processing (Vol. 1, pp. 666–669).

Gray, P., Hollier, M., & Massara, R. (2000). Non-intrusive speech quality assessment using vocal-tract models. IEE Proceedings. Vision, Image and Signal Processing, 147(6), 493–501. CrossRef

Malfait, L., Berger, J., & Kastner, M. (2006). P.563-The ITU-T standard for single-ended speech quality assessment. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 1924–1934. CrossRef

Kim, D. S. (2005). ANIQUE: an auditory model for single ended speech quality estimation. IEEE Transactions on Audio, Speech, and Language Processing, 13(5), 821–831. CrossRef

Grancharov, V., Zhao, D. Y., Lindblom, J., & Kleijn, W. B. (2006). Low-complexity, non-intrusive speech quality assessment. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 1948–1956. CrossRef

Falk, T., Xu, Q., & Chan, W. Y. (2005). Non-intrusive GMM based speech quality measurement. In Proc. IEEE international conf. on acoustics, speech and signal processing (Vol. 1, pp. 125–128).

Chen, G., & Parsa, V. (2006). Bayesian model based non-intrusive speech quality evaluation. In Proc. IEEE international conf. on acoustics, speech and signal processing (Vol. 1, pp. 385–388).

Falk, T., & Chan, W. Y. (2006a). Enhanced non-intrusive speech quality measurement using degradation models. In Proc. IEEE international conf. on acoustics, speech and signal processing (Vol. 1, pp. 837–840).

Falk, T., & Chan, W. Y. (2006b). Single-ended speech quality measurement using machine learning methods. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 1935–1947. CrossRef

Chen, G., & Parsa, V. (2005). Non-intrusive speech quality evaluation using an adaptive neuro-fuzzy inference system. IEEE Signal Processing Letters, 12(5), 403–406. CrossRef

Slaney, M. (1988). Lyon’s cochlear model. Advanced technology group, apple technical report no. 13, Apple Computer Inc.

Lyon, R. F. (1982). A computational model of filtering, detection, and compression in the cochlea. In Proc. IEEE international conf. on acoustics, speech and signal processing (pp. 1282–1285).

Jing, Z., & Johnson, M. H. (2002). Auditory modeling inspired methods of feature extraction for robust automatic speech recognition. In Proc. IEEE international conf. on acoustics, speech and signal processing (Vol. 4, pp. 4176–4179).

ITU-T Recommendation P. Supplement-23 (1998). ITU-T Coded-Speech database (1998). International telecommunication union-telecommunication standardization sector, Geneva.

Hu, Y., & Loizou, P. C. (2008). Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 229–238. CrossRef

Hu, Y., & Loizou, P. C. (2007). Subjective comparison and evaluation of speech enhancement algorithms. Speech Communications, 49, 588–601. CrossRef

Narwaria, M., Lin, W., McLoughlin, I. V., Emmanuel, S., & Tien, C. L. (2010). Non-intrusive speech quality assessment with support vector regression. In Advances in multimedia modeling, 16th International multimedia modeling conf. (Vol. 5916, pp. 325–335). Berlin: Springer. CrossRef

Hasan, M. R., Jamil, M., Rabbani, M. G., & Rahman, M. S. (2004). Speaker identification using mel frequency cepstral coefficients. In 3rd international conf. on electrical & computer engineering (pp. 565–568). Dhaka: ICECE.

Bozkurt, E., Erzin, E., Erdem, C. E., & Erdem, A. T. (2010). Use of line spectral frequencies for emotion recognition from speech. In IEEE international conf. on pattern recognition, Turkey (pp. 3708–3711). CrossRef

Hermansky, H. (1990). Perceptual linear predictive (PLP) analysis of speech. The Journal of the Acoustical Society of America, 87, 1738–1752. CrossRef

Audhkhasi, K., & Kumar, A. (2010). Two scale auditory features based non-intrusive speech quality evaluation. IETE Journal of Research, 56(2), 111–118. CrossRef

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B. Methodological, 39(1), 1–38. MathSciNetMATH

Karmakar, A., Kumar, A., & Patney, R. K. (2006). A multiresolution model of auditory excitation pattern and its application to objective evaluation of perceived speech quality. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 1912–1923. CrossRef

Rabiner, L. R., & Schafer, R. W. (1978). Digital processing of speech signals. Englewood: Prentice-Hall.

Title: Non-intrusive speech quality assessment using several combinations of auditory features
Authors: Rajesh Kumar Dubey
Arun Kumar
Publication date: 01-03-2013
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 1/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-012-9162-4

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Other articles of this Issue 1/2013

Speaker recognition utilizing distributed DCT-II based Mel frequency cepstral coefficients and fuzzy vector quantization

Phoneme recognition using zerocrossing interval distribution of speech patterns and ANN

Dynamic prosody modification using zero frequency filtered signal

Emotion modeling from speech signal based on wavelet packet transform

Effect of aging on speech features and phoneme recognition: a study on Bengali voicing vowels

Blind separation of audio signals using trigonometric transforms and Kalman filtering