Published in: International Journal of Speech Technology 3/2019

20.08.2018

Reference free speech quality estimation for diverse data condition

Authors: Nirupam Shome, R. H. Laskar, Debaprasad Das

Abstract

The performance of any speech-based system depends on the quality of its input speech signal, and the signal-to-noise ratio (SNR) is commonly taken as a measure of that quality. This paper reports experimental analyses aimed at estimating the quality of the speech signal so that the overall accuracy of the system can be improved. Whereas existing methods rely on voice activity detection (VAD), the proposed method uses glottal activity detection (GAD) to separate the speech and non-speech regions of the input signal. The literature indicates that GAD gives better results than VAD for this task under noisy data conditions. The proposed method applies a filter with a specified cut-off frequency to separate the noise components present in the speech activity region. The SNR is then calculated as the ratio of the total energy in the speech activity region to that in the non-speech regions. Comparative analyses against two state-of-the-art techniques, namely the National Institute of Standards and Technology (NIST) SNR tool and the Waveform Amplitude Distribution Analysis (WADA) SNR algorithm, suggest that the proposed method outperforms both under different data conditions. Because the proposed method relies only on signal processing techniques and does not require any classifier or model to be developed, it is found to be computationally more efficient than the other two methods.
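
The energy-ratio step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `energy_ratio_snr_db` and the `speech_mask` argument are hypothetical, the mask is assumed to come from some earlier detector (GAD in the paper, but any speech/non-speech labelling works here), and the filtering step with its cut-off frequency is omitted because the abstract does not give its value.

```python
import numpy as np


def energy_ratio_snr_db(signal, speech_mask):
    """Return 10*log10(E_speech / E_nonspeech) for a 1-D signal.

    speech_mask is a boolean array of the same length as signal; True marks
    samples inside the speech activity region. How the mask is produced
    (GAD, VAD, manual labels) is outside this sketch and is an assumption.
    """
    signal = np.asarray(signal, dtype=float)
    speech_mask = np.asarray(speech_mask, dtype=bool)
    if signal.shape != speech_mask.shape:
        raise ValueError("signal and speech_mask must have the same shape")

    # Total energy (sum of squared samples) in each region, as the abstract
    # describes; no per-sample normalisation is applied here.
    speech_energy = np.sum(signal[speech_mask] ** 2)
    noise_energy = np.sum(signal[~speech_mask] ** 2)
    if speech_energy == 0 or noise_energy == 0:
        raise ValueError("both regions must contain non-zero energy")

    return 10.0 * np.log10(speech_energy / noise_energy)
```

Because the ratio uses total rather than average energy, the result depends on how long each region is. Whether the paper normalises by region duration cannot be determined from the abstract, so treat that choice as open.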

Metadata
Title
Reference free speech quality estimation for diverse data condition
Authors
Nirupam Shome
R. H. Laskar
Debaprasad Das
Publication date
20.08.2018
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 3/2019
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-9537-2