Weitere Artikel dieser Ausgabe durch Wischen aufrufen
The performance of any speech based systems depends on the quality of input speech signals. The signal to noise ratio (SNR) is considered to be a measure of the quality of speech signal. This paper reports some analyses (based on experimental evaluations) that are focused on calculating the quality of the speech signal so as to improve the overall accuracy of the system. As compared to existing methods that are based on voice activity detection (VAD), the proposed method is based on glottal activity detection (GAD) to detect the speech and non-speech regions from the input speech signals. Literature reveals that the GAD provides better results than VAD under noisy data condition for the above mentioned task. The proposed method uses a filter with specified cut-off frequency to separate the noise components that are present in the speech activity region. After that, we calculate the SNR as the ratio of the total energy in the speech activity region to that of the non-speech regions. The comparative analyses with two state-of-the-art techniques, viz, National Institute of Standards and Technology (NIST) SNR tool and Waveform Amplitude Distribution Analysis (WADA) SNR algorithm suggest that under different data conditions the proposed method outperforms the existing ones. Since the proposed model depends on the signal processing techniques and does not require any classifier or model to be developed, therefore it is found to be computationally efficient as compared to other two methods.
Bitte loggen Sie sich ein, um Zugang zu diesem Inhalt zu erhalten
Sie möchten Zugang zu diesem Inhalt erhalten? Dann informieren Sie sich jetzt über unsere Produkte:
Breithaupt, C., Gerkmann, T., & Martin, R. (2008). A novel a priori SNR estimation approach based on selective cepstro-temporal smoothing. In International Conference on Acoustics, Speech and Signal Processing. (pp. 4897–4900). IEEE.
Cohen, I. (2005a). Relaxed statistical model for speech enhancement and a priori SNR estimation. IEEE Transactions on Speech and Audio Processing, 13(5), 870–881. CrossRef
Cohen, I. (2005b). Speech enhancement using super-Gaussian speech models and noncausal a priori SNR estimation. Speech Communication, 47(3), 336–350. CrossRef
Elshamy, S., Madhu, N., Tirry, W., & Fingscheidt, T. (2015). An iterative speech model-based a priori SNR estimator. In Sixteenth Annual Conference of the International Speech Communication Association.
Ephraim, Y., & Malah, D. (1984). Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(6), 1109–1121. CrossRef
Ephraim, Y., & Malah, D. (1985). Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(2), 443–445. CrossRef
Fodor, B., & Fingscheidt, T. (2012). Reference-free SNR measurement for narrowband and wideband speech signals in car noise. In Proceedings of Speech Communication: ITG Symposium (pp. 1–4). VDE.
Furui, S. (1989). Digital speech processing synthesis, and recognition. New York: Marcel Dekker.
Garofolo, J. S., Lamel, L. F., Fisher, W. M., Fiscus, J. G., Pallett, D. S., Dahlgren, N. L., & Zue, V. (1993). Timit acoustic-phonetic continuous speech corpus. Philadelphia: Linguistic Data Consortium. CrossRef
Gerkmann, T., Breithaupt, C., & Martin, R. (2008). Improved a posteriori speech presence probability estimation based on a likelihood ratio with fixed priors. IEEE Transactions on Audio, Speech, and Language Processing, 16(5), 910–919. CrossRef
Hansen, J. H., & Pellom, B. L. (1998). An effective quality evaluation protocol for speech enhancement algorithms. In Fifth International Conference on Spoken Language Processing.
Hirsch, H. G., & Ehrlicher, C. (1995). Noise estimation techniques for robust speech recognition. In International Conference on Acoustics, Speech, and Signal Processing ICASSP- 95. (Vol. 1, pp. 153–156). IEEE.
Hu, G., & Wang, D. (2008). Segregation of unvoiced speech from nonspeech interference. The Journal of the Acoustical Society of America, 124(2), 1306–1319. CrossRef
Hu, Y., & Loizou, P. C. (2007). Subjective comparison and evaluation of speech enhancement algorithms. Speech Communication, 49(7–8), 588–601. CrossRef
Kim, C., & Stern, R. M. (2008). Robust signal-to-noise ratio estimation based on waveform amplitude distribution analysis. In Ninth Annual Conference of the International Speech Communication Association.
Kinnunen, T., & Li, H. (2010). An overview of text-independent speaker recognition: From features to supervectors. Speech Communication, 52(1), 12–40. CrossRef
Manam, A. B., Revanth, T. S., Das, R. K., & Prasanna, S. M. (2016). Speaker verification using acoustic factor analysis with phonetic content compensation in limited and degraded test conditions. In Region 10 Conference ( TENCON), 2016 IEEE (pp. 1402–1406). IEEE.
Martin, R. (1993). An efficient algorithm to estimate the instantaneous SNR of speech signals. In Third European Conference on Speech Communication and Technology.
Martin, R. (2001). Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Transactions on Speech and Audio Processing, 9(5), 504–512. CrossRef
Moazzeni, T., Amei, A., Ma, J., & Jiang, Y. (2012). Statistical model based SNR estimation method for speech signals. Electronics Letters, 48(12), 727–729. CrossRef
Morales-Cordovilla, J. A., Ma, N., Sánchez, V., Carmona, J. L., Peinado, A. M., & Barker, J. (2011). A pitch based noise estimation technique for robust speech recognition with missing data. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4808–4811). IEEE.
Murty, K. S. R., Yegnanarayana, B., & Joseph, M. A. (2009). Characterization of glottal activity from speech signals. IEEE Signal Processing Letters, 16(6), 469–472. CrossRef
Narayanan, A., & Wang, D. (2012). A CASA-based system for long-term SNR estimation. IEEE Transactions on Audio, Speech, and Language Processing, 20(9), 2518–2527. CrossRef
Paez, M., & Glisson, T. (1972). Minimum mean-squared-error quantization in speech PCM and DPCM systems. IEEE Transactions on Communications, 20(2), 225–230. CrossRef
Papadopoulos, P., Tsiartas, A., & Narayanan, S. (2016). Long-term SNR estimation of speech signals in known and unknown channel conditions. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(12), 2495–2506. CrossRef
Plapous, C., Marro, C., & Scalart, P. (2006). Improved signal-to-noise ratio estimation for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 14(6), 2098–2108. CrossRef
Pollak, P., & Vondrasek, M. (2005). Methods for speech SNR estimation&58: Evaluation tool and analysis of VAD dependency. Radioengineering, 14(1), 6–11.
Rabiner, L. R., & Schafer, R. W. (1978). Digital processing of speech signals. Upper Saddle River: Prentice Hall.
Ren, Y., & Johnson, M. T. (2008). An improved SNR estimator for speech enhancement. In IEEE International Conference on Acoustics, Speech and Signal Processing. ICASSP. (pp. 4901–4904). IEEE.
Saha, P., Baruah, U., Laskar, R. H., Mishra, S., Choudhury, S. P., & Das, T. K. (2016). Robust analysis for improvement of vowel onset point detection under noisy conditions. International Journal of Speech Technology, 19(3), 433–448. CrossRef
Scalart, P. (1996). Speech enhancement based on a priori signal to noise estimation. In IEEE International Conference on Acoustics, Speech, and Signal Processing (Vol. 2, pp. 629–632). IEEE.
Sohn, J., Kim, N. S., & Sung, W. (1999). A statistical model-based voice activity detection. IEEE Signal Processing Letters, 6(1), 1–3. CrossRef
Suhadi, S., Last, C., & Fingscheidt, T. (2011). A data-driven approach to a priori SNR estimation. IEEE Transactions on Audio, Speech, and Language Processing, 19(1), 186–195. CrossRef
Tchorz, J., & Kollmeier, B. (2003). SNR estimation based on amplitude modulation analysis with applications to noise suppression. IEEE Transactions on Speech and Audio Processing, 11(3), 184–192. CrossRef
The NIST Speech, SNR Measurement [Online]. http://www.nist.gov/smartspace/nistspeechsnrmeasurement.html.
Varga, A., Steenneken, H. J. M., Tomlinson, M., & Jones, D. (1992). The NOISEX-92 study on the effect of additive noise on automatic speech recognition, 1992. Documentation included in the NOISEX-92 CD-ROMs.
Wang, D. (2005). Speech separation by humans and machines (pp. 181–197). Boston: Springer. CrossRef
Zhao, X., Shao, Y., & Wang, D. (2011). Robust speaker identification using a CASA front-end. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (pp. 5468–5471). IEEE
- Reference free speech quality estimation for diverse data condition
R. H. Laskar
- Springer US
International Journal of Speech Technology
Print ISSN: 1381-2416
Elektronische ISSN: 1572-8110
Neuer Inhalt/© Filograph | Getty Images | iStock