New single-ended objective measure for non-intrusive speech quality evaluation

  • Original Paper
  • Signal, Image and Video Processing

Abstract

This article proposes a new output-based method for the non-intrusive assessment of speech quality in voice communication systems and evaluates its performance. The method requires access only to the processed (degraded) speech. It measures perception-motivated objective auditory distances between the voiced parts of the output speech and appropriately matching references drawn from a pre-formulated codebook. The codebook is formed by optimally clustering a large number of parametric speech vectors extracted from a database of clean speech recordings. The auditory distances are then mapped into objective mean opinion listening quality scores (MOS_LQO). A self-organizing map (SOM), an efficient data-mining tool, performs both the clustering and the mapping/reference-matching processes. To obtain a perception-based, speaker-independent parametric representation of the speech, three domain transformation techniques are investigated: the first is based on a perceptual linear prediction (PLP) model, the second uses Bark spectrum (BS) analysis, and the third uses mel-frequency cepstrum coefficients (MFCC). The reported evaluation results show that the proposed method correlates highly with subjective listening quality scores, with accuracy similar to that of ITU-T P.563 at relatively low computational complexity. The results also show that the method outperforms PESQ under a number of distortion conditions, such as speech degraded by channel impairments.
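The pipeline outlined in the abstract (cluster clean-speech feature vectors into a codebook, match each degraded frame to its best matching unit, aggregate the distances, map the result to a score) can be illustrated with a rough sketch. This is not the authors' implementation: the random vectors stand in for PLP, Bark spectrum or MFCC features, the one-dimensional SOM and its parameters are simplifications, and `distance_to_mos` with its constants `a` and `b` is a hypothetical linear mapping chosen only for illustration.

```python
import numpy as np

def train_som(data, n_units=16, n_iter=2000, lr0=0.5, sigma0=4.0, seed=0):
    """Train a tiny 1-D self-organizing map; returns the codebook (n_units x dim)."""
    rng = np.random.default_rng(seed)
    # Initialise the codebook from randomly chosen training vectors.
    codebook = data[rng.choice(len(data), n_units, replace=False)].astype(float)
    grid = np.arange(n_units)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                   # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood width
        x = data[rng.integers(len(data))]
        # Best matching unit (BMU): the codebook vector closest to the sample.
        bmu = np.argmin(np.linalg.norm(codebook - x, axis=1))
        h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
        codebook += lr * h[:, None] * (x - codebook)
    return codebook

def median_min_distance(frames, codebook):
    """Median over frames of the Euclidean distance from each frame to its BMU."""
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return float(np.median(d.min(axis=1)))

def distance_to_mos(dist, a=4.5, b=1.0):
    """Hypothetical monotone mapping from a distance to a score in [1, 4.5]."""
    return float(np.clip(a - b * dist, 1.0, 4.5))

# Stand-in data (assumption): Gaussian vectors replace real speech features.
rng = np.random.default_rng(1)
clean = rng.normal(size=(500, 8))
codebook = train_som(clean)

test_frames = rng.normal(size=(50, 8))
light = test_frames + rng.normal(scale=0.1, size=test_frames.shape)
heavy = test_frames + rng.normal(scale=3.0, size=test_frames.shape)

d_light = median_min_distance(light, codebook)
d_heavy = median_min_distance(heavy, codebook)
mos_light = distance_to_mos(d_light)
mos_heavy = distance_to_mos(d_heavy)
print(f"light: distance={d_light:.2f}, score={mos_light:.2f}")
print(f"heavy: distance={d_heavy:.2f}, score={mos_heavy:.2f}")
```

In this toy setting the heavily distorted frames lie farther from the clean-speech codebook, so their aggregate distance is larger and the mapped score is no higher. A real system would restrict the comparison to voiced frames and fit the distance-to-score mapping against subjective MOS data.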

Abbreviations

  • AGC: Automatic gain control
  • AD: Auditory distance
  • ASD: Auditory spectrum distance
  • ASR: Automatic speech recognition systems
  • BSD: Bark spectral distance
  • BMU: Best matching unit
  • CQ: Conversational quality
  • DMM: Euclidean-based median minimum distance
  • DFT: Discrete Fourier transform
  • FFT: Fast Fourier transform
  • IDFT: Inverse discrete Fourier transform
  • ITU-T: International Telecommunication Union-Telecommunication Standardization Sector
  • LQ: Listening quality
  • LP: Linear prediction
  • MOS: Mean opinion score
  • MOS_LQO: Objective mean opinion listening quality score [2]
  • MOS_LQS: Subjective mean opinion listening quality score [2]
  • MFCC: Mel-frequency cepstrum coefficients
  • MNRU: Modulated noise reference unit [28]
  • NN: Neural network
  • PLP: Perceptual linear prediction
  • PAQM: Perceptual audio quality measure
  • PSQM: Perceptual speech quality measure
  • PAMS: Perceptual analysis measurement system
  • PESQ: Perceptual evaluation of speech quality
  • POSQE: Perceptual output-based speech quality evaluation
  • PSTN: Public switched telephony networks
  • QoS: Quality of service
  • QoE: Quality of experience
  • SLA: Service level agreement
  • SOM: Self-organizing map
  • VQ: Vector quantization

References

  1. ITU-T Recommendation P.800: Methods for Subjective Determination of Transmission Quality. International Telecommunication Union, Geneva, Switzerland (1996)

  2. ITU-T Recommendation P.800.1: Mean Opinion Score (MOS) Terminology. International Telecommunication Union, Geneva, Switzerland (2006)

  3. Rix, A.W.: Perceptual speech quality assessment—a review. In: Proc. of IEEE Intl. Conf. Acoustics, Speech, and Signal Process., May 2004, vol. II, pp. 1056–1059. ICASSP, Montreal (2004)

  4. Rix A.W., Beerends J.G., Kim D.-S., Kroon P., Ghitza O.: Objective assessment of speech and audio quality-technology and applications. IEEE Trans. Audio Speech Lang. Process. 14(6), 1890–1901 (2006)

  5. Schroeder M.R., Atal B.S., Hall J.L.: Optimizing digital speech coders by exploiting masking properties of human ear. J. Acoust. Soc. Am. 66(6), 1647–1652 (1979)

  6. Karjalainen, M.: A new auditory model for the evaluation of sound quality of audio systems. In: Proc. of IEEE Intl. Conf. Acoustics, Speech, and Signal Process., March 1985, pp. 608–611. ICASSP, Tampa (1985)

  7. Wang S., Sekey A., Gersho A.: An objective measure for predicting subjective quality of speech coders. IEEE J. Selected Areas Commun. 10(5), 819–829 (1992)

  8. Beerends J.G., Stemerdink J.A.: A perceptual audio quality measure based on a psychoacoustic sound representation. J. Audio Eng. Soc. 40(12), 963–974 (1992)

  9. Beerends J.G., Stemerdink J.A.: A perceptual speech quality measure based on a psychoacoustic sound representation. J. Audio Eng. Soc. 42(3), 115–123 (1994)

  10. ITU-T Recommendation P.861: Objective Quality Measurement of Telephone-Band (300–3400 Hz) Speech Codecs. International Telecommunication Union, Geneva, Switzerland (1996)

  11. Rix, A.W., Hollier, M.P.: The perceptual analysis measurement system for robust end-to-end speech quality assessment. In: Proc. of IEEE Intl. Conf. Acoustics, Speech, and Signal Process., June 2000, vol. III, pp. 1515–1518. ICASSP, Istanbul, Turkey (2000)

  12. ITU-T Recommendation P.862: Perceptual Evaluation of Speech Quality (PESQ), An Objective Method for End-to-End Speech Quality Assessment of Narrowband Telephone Networks and Speech Codecs. International Telecommunication Union, Geneva, Switzerland (2001)

  13. Gray P., Hollier M.P., Massara R.E.: Non-intrusive speech-quality assessment using vocal tract models. IEE Proc. Vis. Image Sig. Process. 147(6), 493–501 (2000)

  14. Kim, D.-S., Tarraf, A.: Perceptual model for non-intrusive speech quality assessment. In: Proc. of IEEE Intl. Conf. Acoustics, Speech, and Signal Process., May 2004, vol. III, pp. 1060–1063. ICASSP, Montreal, Canada (2004)

  15. Chen, G., Parsa, V.: Bayesian model based non-intrusive speech quality evaluation. In: Proc. of IEEE Intl. Conf. Acoustics, Speech, and Signal Process., March 2005, vol. I, pp. 385–388. ICASSP, PA, USA (2005)

  16. Kim, D.-S., Tarraf, A.: Enhanced perceptual model for non-intrusive speech quality assessment. In: Proc. of IEEE Intl. Conf. Acoustics, Speech, and Signal Process., May 2006, vol. I, pp. 829–832. ICASSP, Toulouse, France (2006)

  17. ITU-T Recommendation P.563: Single Ended Method for Objective Speech Quality Assessment in Narrow-band Telephony Applications. International Telecommunication Union, Geneva, Switzerland (2004)

  18. Malfait L., Berger J., Kastner M.: P.563 - The ITU-T standard for single-ended speech quality assessment. IEEE Trans. Audio Speech Lang. Process. 14(6), 1924–1934 (2006)

  19. Quatieri T.F.: Discrete-Time Speech Signal Processing: Principles and Practice. Prentice Hall PTR, New Jersey (2002)

  20. Vesanto J., Alhoniemi E.: Clustering of the self-organizing map. IEEE Trans. Neural Netw. 11(3), 586–600 (2000)

  21. Gersho, A., Gray, R.M.: Vector Quantization and Signal Compression. Kluwer, Boston, MA, USA

  22. Rafila, K.S., Dawoud, D.S.: Voiced/unvoiced/mixed excitation classification of speech using the autocorrelation of the output of an ADPCM system. In: Proc. of IEEE Int. Conf. on Systems Eng., OH, USA, August 1989, pp. 537–540

  23. Kubin, G., Atal, B.S., Kleijn, W.B.: Performance of noise excitation for unvoiced speech. In: Proc. of the IEEE Workshop on Speech Coding for Telecom., Ste. Adele, P.Q., Canada, Oct. 1993, pp. 35–36

  24. Hermansky H.: Perceptual linear prediction (PLP) analysis of speech. J. Acoust. Soc. Am. 87(4), 1738–1753 (1990)

  25. Gopalan K., Anderson T.R., Cupples E.J.: A comparison of speaker identification results using features based on cepstrum and Fourier-Bessel expansion. IEEE Trans. Speech Audio Process. 7(3), 289–294 (1999)

  26. Thorpe, L., Yang, W.: Performance of current perceptual objective speech quality measures. In: Proc. of the IEEE Workshop on Speech Coding, Porvoo, Finland, June 1999, pp. 144–146

  27. Hall, J.L.: Auditory psychophysics for coding applications. In: Madisetti, V.K., Williams, D.B. (eds.) The Digital Signal Processing Handbook, Chapter 39, Section IX. pp. 39(1)–39(22). CRC-IEEE Press, Florida (1997)

  28. ITU-T Recommendation P.810: Modulated Noise Reference Unit – MNRU. International Telecommunication Union, Geneva, Switzerland (1996)

  29. Voran S.: Objective estimation of perceived speech quality-part I: development of the measuring normalizing block technique. IEEE Trans. Speech Audio Process. 7(4), 371–382 (1999)

  30. Conway, A.E.: Output-based method of applying PESQ to measure the perceptual quality of framed speech signals. In: Proc. of IEEE Wireless Comm. & Network. Conf., WCNC, Atlanta, USA, March 2004, pp. 2521–2526

  31. ITU-T Recommendation P.862.3: Application Guide for Objective Quality Measurement Based on Recommendations P.862, P.862.1 and P.862.2. International Telecommunication Union, Geneva, Switzerland (2005)

  32. ITU-T Recommendation P.862.1: Mapping Function for Transforming P.862 Raw Result Scores to MOS-LQO. International Telecommunication Union, Geneva, Switzerland (2003)

  33. ITU-T Recommendation P.862.2: Wideband Extension to Recommendation P.862 for the Assessment of Wideband Telephone Networks and Speech Codecs. International Telecommunication Union, Geneva, Switzerland (2005)

  34. ITU-T Recommendation Supplement 23 P-Series: ITU-T Coded-Speech Database. International Telecommunication Union, Geneva, Switzerland (1998)

Author information

Corresponding author

Correspondence to Abdulhussain E. Mahdi.

About this article

Cite this article

Mahdi, A.E., Picovici, D. New single-ended objective measure for non-intrusive speech quality evaluation. SIViP 4, 23–38 (2010). https://doi.org/10.1007/s11760-008-0092-1

  • DOI: https://doi.org/10.1007/s11760-008-0092-1
