Abstract
This article proposes a new output-based method for non-intrusive assessment of speech quality in voice communication systems and evaluates its performance. The method requires access to the processed (degraded) speech only. It measures perception-motivated objective auditory distances between the voiced parts of the output speech and appropriately matching references extracted from a pre-formulated codebook. The codebook is formed by optimally clustering a large number of parametric speech vectors extracted from a database of clean speech records. The auditory distances are then mapped into objective mean opinion listening quality scores. A self-organizing map (SOM), an efficient data-mining tool, performs the required clustering and the mapping/reference-matching processes. To obtain a perception-based, speaker-independent parametric representation of the speech, three domain transformation techniques have been investigated: the first is based on a perceptual linear prediction (PLP) model, the second uses a Bark spectrum (BS) analysis, and the third uses mel-frequency cepstrum coefficients (MFCC). Reported evaluation results show that the proposed method correlates highly with subjective listening quality scores, yielding accuracy similar to that of ITU-T P.563 while maintaining a relatively low computational complexity. Results also demonstrate that the method outperforms PESQ in a number of distortion conditions, such as speech degraded by channel impairments.
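The processing chain described in the abstract (cluster clean-speech feature vectors into a codebook, match each degraded frame to its best matching unit, summarise the distances, map the result to a quality score) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that frames have already been restricted to voiced speech and converted to feature vectors (for example PLP, Bark spectrum or MFCC), and it uses synthetic data in place of real speech. The function names, the SOM hyperparameters and, in particular, the linear `distance_to_mos` mapping are hypothetical choices made only for illustration.

```python
import numpy as np


def train_som(data, grid=(4, 4), epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM; return a codebook of shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of each unit, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the codebook from randomly chosen training vectors.
    codebook = data[rng.choice(len(data), rows * cols, replace=False)].copy()
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.3  # shrinking neighbourhood radius
            # Best matching unit (BMU): the codeword closest to the sample.
            bmu = np.argmin(np.sum((codebook - x) ** 2, axis=1))
            grid_d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            # Pull the BMU and its grid neighbours towards the sample.
            codebook += lr * h[:, None] * (x - codebook)
            step += 1
    return codebook


def bmu_distances(frames, codebook):
    """Euclidean distance from each frame to its best matching unit."""
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1)


def median_min_distance(frames, codebook):
    """Median over frames of the minimum Euclidean distance to the codebook."""
    return float(np.median(bmu_distances(frames, codebook)))


def distance_to_mos(dist, offset=4.5, slope=1.0):
    """HYPOTHETICAL linear mapping to a 1.0-4.5 scale, not the paper's fitted mapping."""
    return float(np.clip(offset - slope * dist, 1.0, 4.5))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-in for clean-speech feature vectors (assumption).
    centres = rng.normal(scale=2.0, size=(4, 8))
    clean = centres[rng.integers(0, 4, 400)] + rng.normal(scale=0.2, size=(400, 8))
    codebook = train_som(clean)

    held_out = centres[rng.integers(0, 4, 200)] + rng.normal(scale=0.2, size=(200, 8))
    degraded = held_out + rng.normal(scale=1.0, size=held_out.shape)
    for name, frames in (("clean", held_out), ("degraded", degraded)):
        d = median_min_distance(frames, codebook)
        print(f"{name}: distance={d:.3f}, MOS-like score={distance_to_mos(d):.2f}")
```

In this sketch the degraded frames lie further from the clean-speech codebook than held-out clean frames, so their median minimum distance is larger and the mapped score is lower. In practice the mapping from distance to score would have to be fitted against subjective listening-test data rather than chosen by hand.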
Abbreviations
- AGC: Automatic gain control
- AD: Auditory distance
- ASD: Auditory spectrum distance
- ASR: Automatic speech recognition systems
- BSD: Bark spectral distance
- BMU: Best matching unit
- CQ: Conversational quality
- DMM: Euclidean-based median minimum distance
- DFT: Discrete Fourier transform
- FFT: Fast Fourier transform
- IDFT: Inverse discrete Fourier transform
- ITU-T: International Telecommunication Union-Telecommunication Standardization Sector
- LQ: Listening quality
- LP: Linear prediction
- MOS: Mean opinion score
- MOS_LQO: Objective mean opinion listening quality score [2]
- MOS_LQS: Subjective mean opinion listening quality score [2]
- MFCC: Mel-frequency cepstrum coefficients
- MNRU: Modulated noise reference unit [28]
- NN: Neural network
- PLP: Perceptual linear prediction
- PAQM: Perceptual audio quality measure
- PSQM: Perceptual speech quality measure
- PAMS: Perceptual analysis measurement systems
- PESQ: Perceptual evaluation of speech quality
- POSQE: Perceptual output-based speech quality evaluation
- PSTN: Public switched telephony networks
- QoS: Quality of service
- QoE: Quality of experience
- SLA: Service level agreement
- SOM: Self-organizing map
- VQ: Vector quantization
References
ITU-T Recommendation P.800: Methods for Subjective Determination of Transmission Quality. International Telecommunication Union, Geneva, Switzerland (1996)
ITU-T Recommendation P.800.1: Mean Opinion Score (MOS) Terminology. International Telecommunication Union, Geneva, Switzerland (2006)
Rix, A.W.: Perceptual speech quality assessment—a review. In: Proc. of IEEE Intl. Conf. Acoustics, Speech, and Signal Process., May 2004, vol. II, pp. 1056–1059. ICASSP, Montreal (2004)
Rix A.W., Beerends J.G., Kim D.-S., Kroon P., Ghitza O.: Objective assessment of speech and audio quality-technology and applications. IEEE Trans. Audio Speech Lang. Process. 14(6), 1890–1901 (2006)
Schroeder M.R., Atal B.S., Hall J.L.: Optimizing digital speech coders by exploiting masking properties of human ear. J. Acoust. Soc. Am. 66(6), 1647–1652 (1979)
Karjalainen, M.: A new auditory model for the evaluation of sound quality of audio systems. In: Proc. of IEEE Intl. Conf. Acoustics, Speech, and Signal Process., March 1985, pp. 608–611. ICASSP, Tampa (1985)
Wang S., Sekey A., Gersho A.: An objective measure for predicting subjective quality of speech coders. IEEE J. Selected Areas Commun. 10(5), 819–829 (1992)
Beerends J.G., Stemerdink J.A.: A perceptual audio quality measure based on a psychoacoustic sound representation. J. Audio Eng. Soc. 40(12), 963–974 (1992)
Beerends J.G., Stemerdink J.A.: A perceptual speech quality measure based on a psychoacoustic sound representation. J. Audio Eng. Soc. 42(3), 115–123 (1994)
ITU-T Recommendation P.861: Objective Quality Measurement of Telephone-Band (300–3400 Hz) Speech Codecs. International Telecommunication Union, Geneva, Switzerland (1996)
Rix, A.W., Hollier, M.P.: The perceptual analysis measurement system for robust end-to-end speech quality assessment. In: Proc. of IEEE Intl. Conf. Acoustics, Speech, and Signal Process., June 2000, vol. III, pp. 1515–1518. ICASSP, Istanbul, Turkey (2000)
ITU-T Recommendation P.862: Perceptual Evaluation of Speech Quality (PESQ), An Objective Method for End-to-End Speech Quality Assessment of Narrowband Telephone Networks and Speech Codecs. International Telecommunication Union, Geneva, Switzerland (2001)
Gray P., Hollier M.P., Massara R.E.: Non-intrusive speech-quality assessment using vocal tract models. IEE Proc. Vis. Image Sig. Process. 147(6), 493–501 (2000)
Kim, D.-S., Tarraf, A.: Perceptual model for non-intrusive speech quality assessment. In: Proc. of IEEE Intl. Conf. Acoustics, Speech, and Signal Process., May 2004, vol. III, pp. 1060–1063. ICASSP, Montreal, Canada (2004)
Chen, G., Parsa, V.: Bayesian model based non-intrusive speech quality evaluation. In: Proc. of IEEE Intl. Conf. Acoustics, Speech, and Signal Process., March 2005, vol. I, pp. 385–388. ICASSP, PA, USA (2005)
Kim, D.-S., Tarraf, A.: Enhanced perceptual model for non-intrusive speech quality assessment. In: Proc. of IEEE Intl. Conf. Acoustics, Speech, and Signal Process., May 2006, vol. I, pp. 829–832. ICASSP, Toulouse, France (2006)
ITU-T Recommendation P.563: Single Ended Method for Objective Speech Quality Assessment in Narrow-band Telephony Applications. International Telecommunication Union, Geneva, Switzerland (2004)
Malfait L., Berger J., Kastner M.: P.563 - The ITU-T standard for single-ended speech quality assessment. IEEE Trans. Audio Speech Lang. Process. 14(6), 1924–1934 (2006)
Quatieri T.F.: Discrete-Time Speech Signal Processing: Principles and Practice. Prentice Hall PTR, New Jersey (2002)
Vesanto J., Alhoniemi E.: Clustering of the self-organizing map. IEEE Trans. Neural Netw. 11(3), 586–600 (2000)
Gersho, A., Gray, R.M.: Vector Quantization and Signal Compression. Kluwer, Boston, MA, USA (1992)
Rafila, K.S., Dawoud, D.S.: Voiced/unvoiced/mixed excitation classification of speech using the autocorrelation of the output of an ADPCM system. In: Proc. of IEEE Int. Conf. on Systems Eng., OH, USA, August 1989, pp. 537–540
Kubin, G., Atal, B.S., Kleijn, W.B.: Performance of noise excitation for unvoiced speech. In: Proc. of the IEEE Workshop on Speech Coding for Telecom., Ste. Adele, P.Q., Canada, Oct. 1993, pp. 35–36
Hermansky H.: Perceptual linear prediction (PLP) analysis of speech. J. Acoust. Soc. Am. 87(4), 1738–1753 (1990)
Gopalan K., Anderson T.R., Cupples E.J.: A comparison of speaker identification results using features based on cepstrum and Fourier-Bessel expansion. IEEE Trans. Speech Audio Process. 7(3), 289–294 (1999)
Thorpe, L., Yang, W.: Performance of current perceptual objective speech quality measures. In: Proc. of the IEEE Workshop on Speech Coding, Porvoo, Finland, June 1999, pp. 144–146
Hall, J.L.: Auditory psychophysics for coding applications. In: Madisetti, V.K., Williams, D.B. (eds.) The Digital Signal Processing Handbook, Chapter 39, Section IX. pp. 39(1)–39(22). CRC-IEEE Press, Florida (1997)
ITU-T Recommendation P.810: Modulated Noise Reference Unit – MNRU. International Telecommunication Union, Geneva, Switzerland (1996)
Voran S.: Objective estimation of perceived speech quality-part I: development of the measuring normalizing block technique. IEEE Trans. Speech Audio Process. 7(4), 371–382 (1999)
Conway, A.E.: Output-based method of applying PESQ to measure the perceptual quality of framed speech signals. In: Proc. of IEEE Wireless Comm. & Network. Conf., WCNC, Atlanta, USA, March 2004, pp. 2521–2526
ITU-T Recommendation P.862.3: Application Guide for Objective Quality Measurement Based on Recommendations P.862, P.862.1 and P.862.2. International Telecommunication Union, Geneva, Switzerland (2005)
ITU-T Recommendation P.862.1: Mapping Function for Transforming P.862 Raw Result Scores to MOS-LQO. International Telecommunication Union, Geneva, Switzerland (2003)
ITU-T Recommendation P.862.2: Wideband Extension to Recommendation P.862 for the Assessment of Wideband Telephone Networks and Speech Codecs. International Telecommunication Union, Geneva, Switzerland (2005)
ITU-T Recommendation Supplement 23 P-Series: ITU-T Coded-Speech Database. International Telecommunication Union, Geneva, Switzerland (1998)
Cite this article
Mahdi, A.E., Picovici, D. New single-ended objective measure for non-intrusive speech quality evaluation. SIViP 4, 23–38 (2010). https://doi.org/10.1007/s11760-008-0092-1
DOI: https://doi.org/10.1007/s11760-008-0092-1