New single-ended objective measure for non-intrusive speech quality evaluation

Mahdi, Abdulhussain E.; Picovici, Dorel

doi:10.1007/s11760-008-0092-1

Original Paper
Published: 06 November 2008

Volume 4, pages 23–38, (2010)

Signal, Image and Video Processing

Abstract

This article proposes a new output-based method for non-intrusive assessment of the speech quality of voice communication systems and evaluates its performance. The method requires access to the processed (degraded) speech only, and is based on measuring perception-motivated objective auditory distances between the voiced parts of the output speech and appropriately matching references extracted from a pre-formulated codebook. The codebook is formed by optimally clustering a large number of parametric speech vectors extracted from a database of clean speech records. The auditory distances are then mapped into objective Mean Opinion listening quality scores. An efficient data-mining tool known as the self-organizing map (SOM) performs the required clustering and mapping/reference matching processes. In order to obtain a perception-based, speaker-independent parametric representation of the speech, three domain transformation techniques have been investigated. The first technique is based on a perceptual linear prediction (PLP) model, the second utilises a Bark spectrum (BS) analysis and the third utilises mel-frequency cepstrum coefficients (MFCC). Reported evaluation results show that the proposed method provides high correlation with subjective listening quality scores, yielding accuracy similar to that of ITU-T P.563 while maintaining a relatively low computational complexity. Results also demonstrate that the method outperforms PESQ in a number of distortion conditions, such as those of speech degraded by channel impairments.
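
The codebook-and-distance idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the perceptual feature vectors (PLP, BS or MFCC) have already been computed, stands in synthetic vectors for them, trains a small one-dimensional SOM as the codebook, and summarises a degraded signal by the median of its per-frame minimum Euclidean distances to the codebook (one plausible reading of the D_MM entry in the abbreviations list). The function names `train_som` and `median_min_distance`, the map size and the learning schedule are all hypothetical choices, and the final mapping from distance to MOS is not shown because the abstract does not specify it.

```python
import numpy as np


def train_som(data, n_units=16, n_iter=2000, lr0=0.5, seed=0):
    """Train a small 1-D SOM and return its codebook, shape (n_units, dim)."""
    rng = np.random.default_rng(seed)
    # Initialise the units from randomly chosen training vectors.
    codebook = data[rng.choice(len(data), n_units, replace=False)].astype(float)
    grid = np.arange(n_units)
    sigma0 = n_units / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
        x = data[rng.integers(len(data))]
        # Best matching unit (BMU): the unit closest to the sample.
        bmu = np.argmin(np.linalg.norm(codebook - x, axis=1))
        # Gaussian neighbourhood on the 1-D map, centred on the BMU.
        h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma**2))
        codebook += lr * h[:, None] * (x - codebook)
    return codebook


def median_min_distance(frames, codebook):
    """Median over frames of the Euclidean distance to the nearest unit."""
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return float(np.median(d.min(axis=1)))


# Synthetic stand-ins for clean and degraded feature vectors (assumption).
rng = np.random.default_rng(1)
centres = 5.0 * rng.standard_normal((4, 12))
clean = centres[rng.integers(4, size=400)] + 0.3 * rng.standard_normal((400, 12))
degraded = clean[:100] + 1.5 * rng.standard_normal((100, 12))

codebook = train_som(clean)
print("clean   :", median_min_distance(clean[:100], codebook))
print("degraded:", median_min_distance(degraded, codebook))
```

In this toy setting the degraded vectors sit farther from the clean-speech codebook than the clean ones, which is the property a distance-to-quality mapping would rely on.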

Abbreviations

AGC: Automatic gain control
AD: Auditory distance
ASD: Auditory spectrum distance
ASR: Automatic speech recognition systems
BSD: Bark spectral distance
BMU: Best matching unit
CQ: Conversational quality
D_MM: Euclidean-based median minimum distance
DFT: Discrete Fourier transform
FFT: Fast Fourier transform
IDFT: Inverse discrete Fourier transform
ITU-T: International Telecommunication Union-Telecommunication Standardization Sector
LQ: Listening quality
LP: Linear prediction
MOS: Mean opinion score
MOS_LQO: Objective mean opinion listening quality score [2]
MOS_LQS: Subjective mean opinion listening quality score [2]
MFCC: Mel-frequency cepstrum coefficients
MNRU: Modulated noise reference unit [28]
NN: Neural network
PLP: Perceptual linear prediction
PAQM: Perceptual audio quality measure
PSQM: Perceptual speech quality measure
PAMS: Perceptual analysis measurement systems
PESQ: Perceptual evaluation of speech quality
POSQE: Perceptual output-based speech quality evaluation
PSTN: Public switched telephony networks
QoS: Quality of service
QoE: Quality of experience
SLA: Service level agreement
SOM: Self-organizing map
VQ: Vector quantization

References

1. ITU-T Recommendation P.800: Methods for Subjective Determination of Transmission Quality. International Telecommunication Union, Geneva, Switzerland (1996)
2. ITU-T Recommendation P.800.1: Mean Opinion Score (MOS) Terminology. International Telecommunication Union, Geneva, Switzerland (2006)
3. Rix, A.W.: Perceptual speech quality assessment - a review. In: Proc. of IEEE Intl. Conf. Acoustics, Speech, and Signal Process., May 2004, vol. II, pp. 1056–1059. ICASSP, Montreal (2004)
4. Rix A.W., Beerends J.G., Kim D.-S., Kroon P., Ghitza O.: Objective assessment of speech and audio quality - technology and applications. IEEE Trans. Audio Speech Lang. Process. 14(6), 1890–1901 (2006)
5. Schroeder M.R., Atal B.S., Hall J.L.: Optimizing digital speech coders by exploiting masking properties of the human ear. J. Acoust. Soc. Am. 66(6), 1647–1652 (1979)
6. Karjalainen, M.: A new auditory model for the evaluation of sound quality of audio systems. In: Proc. of IEEE Intl. Conf. Acoustics, Speech, and Signal Process., March 1985, pp. 608–611. ICASSP, Tampa (1985)
7. Wang S., Sekey A., Gersho A.: An objective measure for predicting subjective quality of speech coders. IEEE J. Selected Areas Commun. 10(5), 819–829 (1992)
8. Beerends J.G., Stemerdink J.A.: A perceptual audio quality measure based on a psychoacoustic sound representation. J. Audio Eng. Soc. 40(12), 963–974 (1992)
9. Beerends J.G., Stemerdink J.A.: A perceptual speech quality measure based on a psychoacoustic sound representation. J. Audio Eng. Soc. 42(3), 115–123 (1994)
10. ITU-T Recommendation P.861: Objective Quality Measurement of Telephone-Band (300–3400 Hz) Speech Codecs. International Telecommunication Union, Geneva, Switzerland (1996)
11. Rix, A.W., Hollier, M.P.: The perceptual analysis measurement system for robust end-to-end speech quality assessment. In: Proc. of IEEE Intl. Conf. Acoustics, Speech, and Signal Process., June 2000, vol. III, pp. 1515–1518. ICASSP, Istanbul, Turkey (2000)
12. ITU-T Recommendation P.862: Perceptual Evaluation of Speech Quality (PESQ), An Objective Method for End-to-End Speech Quality Assessment of Narrowband Telephone Networks and Speech Codecs. International Telecommunication Union, Geneva, Switzerland (2001)
13. Gray P., Hollier M.P., Massara R.E.: Non-intrusive speech-quality assessment using vocal tract models. IEE Proc. Vis. Image Sig. Process. 147(6), 493–501 (2000)
14. Kim, D.-S., Tarraf, A.: Perceptual model for non-intrusive speech quality assessment. In: Proc. of IEEE Intl. Conf. Acoustics, Speech, and Signal Process., May 2004, vol. III, pp. 1060–1063. ICASSP, Montreal, Canada (2004)
15. Chen, G., Parsa, V.: Bayesian model based non-intrusive speech quality evaluation. In: Proc. of IEEE Intl. Conf. Acoustics, Speech, and Signal Process., March 2005, vol. I, pp. 385–388. ICASSP, PA, USA (2005)
16. Kim, D.-S., Tarraf, A.: Enhanced perceptual model for non-intrusive speech quality assessment. In: Proc. of IEEE Intl. Conf. Acoustics, Speech, and Signal Process., May 2006, vol. I, pp. 829–832. ICASSP, Toulouse, France (2006)
17. ITU-T Recommendation P.563: Single Ended Method for Objective Speech Quality Assessment in Narrow-band Telephony Applications. International Telecommunication Union, Geneva, Switzerland (2004)
18. Malfait L., Berger J., Kastner M.: P.563 - the ITU-T standard for single-ended speech quality assessment. IEEE Trans. Audio Speech Lang. Process. 14(6), 1924–1934 (2006)
19. Quatieri T.F.: Discrete-Time Speech Signal Processing: Principles and Practice. Prentice Hall PTR, New Jersey (2002)
20. Vesanto J., Alhoniemi E.: Clustering of the Self-Organizing Map. IEEE Trans. Neural Netw. 11(3), 586–600 (2000)
21. Gersho, A., Gray, R.M.: Vector Quantization and Signal Compression. Kluwer, Boston, MA, USA
22. Rafila, K.S., Dawoud, D.S.: Voiced/unvoiced/mixed excitation classification of speech using the autocorrelation of the output of an ADPCM system. In: Proc. of IEEE Int. Conf. on Systems Eng., OH, USA, August 1989, pp. 537–540
23. Kubin, G., Atal, B.S., Kleijn, W.B.: Performance of noise excitation for unvoiced speech. In: Proc. of the IEEE Workshop on Speech Coding for Telecom., Ste. Adele, P.Q., Canada, Oct. 1993, pp. 35–36
24. Hermansky H.: Perceptual linear prediction (PLP) analysis of speech. J. Acoust. Soc. Am. 87(4), 1738–1753 (1990)
25. Gopalan K., Anderson T.R., Cupples E.J.: A comparison of speaker identification results using features based on cepstrum and Fourier-Bessel expansion. IEEE Trans. Speech Audio Process. 7(3), 289–294 (1999)
26. Thorpe, L., Yang, W.: Performance of current perceptual objective speech quality measures. In: Proc. of the IEEE Workshop on Speech Coding, Porvoo, Finland, June 1999, pp. 144–146
27. Hall, J.L.: Auditory psychophysics for coding applications. In: Madisetti, V.K., Williams, D.B. (eds.) The Digital Signal Processing Handbook, Chapter 39, Section IX. pp. 39(1)–39(22). CRC-IEEE Press, Florida (1997)
28. ITU-T Recommendation P.810: Modulated Noise Reference Unit (MNRU). International Telecommunication Union, Geneva, Switzerland (1996)
29. Voran S.: Objective estimation of perceived speech quality - part I: development of the measuring normalizing block technique. IEEE Trans. Speech Audio Process. 7(4), 371–382 (1999)
30. Conway, A.E.: Output-based method of applying PESQ to measure the perceptual quality of framed speech signals. In: Proc. of IEEE Wireless Comm. & Network. Conf., WCNC, Atlanta, USA, March 2004, pp. 2521–2526
31. ITU-T Recommendation P.862.3: Application Guide for Objective Quality Measurement Based on Recommendations P.862, P.862.1 and P.862.2. International Telecommunication Union, Geneva, Switzerland (2005)
32. ITU-T Recommendation P.862.1: Mapping Function for Transforming P.862 Raw Result Scores to MOS-LQO. International Telecommunication Union, Geneva, Switzerland (2003)
33. ITU-T Recommendation P.862.2: Wideband Extension to Recommendation P.862 for the Assessment of Wideband Telephone Networks and Speech Codecs. International Telecommunication Union, Geneva, Switzerland (2005)
34. ITU-T Recommendation Supplement 23 P-Series: ITU-T Coded-Speech Database. International Telecommunication Union, Geneva, Switzerland (1998)

Author information

Authors and Affiliations

Department of Electronic and Computer Engineering, University of Limerick, Limerick, Ireland
Abdulhussain E. Mahdi & Dorel Picovici

Corresponding author

Correspondence to Abdulhussain E. Mahdi.

About this article

Cite this article

Mahdi, A.E., Picovici, D. New single-ended objective measure for non-intrusive speech quality evaluation. SIViP 4, 23–38 (2010). https://doi.org/10.1007/s11760-008-0092-1

Received: 20 December 2007
Revised: 13 October 2008
Accepted: 14 October 2008
Published: 06 November 2008
Issue Date: March 2010
DOI: https://doi.org/10.1007/s11760-008-0092-1
