nach oben

International Journal of Speech Technology

Erschienen in:

01.06.2015

Improving the self-adaptive voice activity detector for speaker verification using map adaptation and asymmetric tapers

verfasst von: Nassim Asbai, Messaoud Bengherabi, Abderrahmane Amrouche, Youcef Aklouf

Erschienen in: International Journal of Speech Technology | Ausgabe 2/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

This paper brings an improvement of voice activity detection, based on vector quantization and speech enhancement preprocessing (VQ-VAD) proposed recently, and applied to speaker verification system under noisy environment. VQ-VAD is based on computing the likelihood ratio on an utterance-by utterance basis from mel-frequency cepstral coefficients that train speech and non-speech models. Whereas the notion of speech and non-speech segments in speech signal is independent of the speaker. For this, a modified VQ-VAD technique is proposed in this paper, by creating two UBM’s for speech and non-speech models, trained from a long utterance-independence model. Then, an adaptation of UBM’s models to the short utterance of speaker is performed via MAP adaptation, instead of using VQ models. Mel-frequency cepstral coefficient’s were also extracted by using the recently proposed asymmetric tapers instead of the traditional Hamming windowing. Using the GMM–UBM as a baseline system for speaker verification, extensive simulation results were done by adding different noise levels to the clean TIMIT database, characterized by its short training and very short testing utterances. The obtained results show the superiority of the proposed GMM-MAP-VAD approach in adverse conditions. Furthermore a drastic reduction in the EER is observed when using asymmetric tapers.

Vorheriger Artikel Using bands of frequencies for vowel recognition for Polish language

Nächster Artikel Images compression techniques for wireless sensor network applications

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Amrouche, A., Debyeche, M., Taleb-Ahmed, A., Michel Rouvaen, J., & Yagoub, M. C. (2010). An efficient speech recognition system in adverse conditions using the nonparametric regression. Engineering Applications of Artificial Intelligence, 23(1), 85–94.CrossRef

Asbai, N., Bengherabi, M., Amrouche, A., & Harizi, F. (2013a). Improving speaker verification robustness by front-end diversity and score level fusion. In Proceedings of 2013 International Conference on Signal-Image Technology & Internet-Based Systems (SITIS) kyoto, Japan (pp. 136–142). IEEE.

Asbai, N., Bengherabi, M., Harizi, F., & Amrouche, A. (2013b). Improving the performance of speaker verification systems under noisy conditions using low level features and score level fusion. In International Conference on Signal Processing and Multimedia Applications (SIGMAP), Iceland (pp. 33–38). INSTICC.

Berouti, M., Schwartz, R., & Makhoul, J. (1979). Enhancement of speech corrupted by acoustic noise. In Acoustics, Speech, and Signal Processing, ICASSP’79 (Vol. 4, pp. 208–211). IEEE.

Dehak, N., Kenny, P., Dehak, R., Dumouchel, P., & Ouellet, P. (2011). Front-end factor analysis for speaker verification. Audio, Speech, and Language Processing, IEEE Transactions on, 19(4), 788–798.CrossRef

Do, M. N. (2003). Fast approximation of Kullback–Leibler distance for dependence trees and hidden Markov models. Signal Processing Letters, IEEE, 10(4), 115–118.CrossRefMathSciNet

Gauvain, J. L., & Lee, C. H. (1994). Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. Speech and Audio Processing, IEEE Transactions on, 2(2), 291–298.CrossRef

Gerkmann, T., & Hendriks, R. C. (2012). Unbiased MMSE-based noise power estimation with low complexity and low tracking delay. Audio, Speech, and Language Processing, IEEE Transactions on, 20(4), 1383–1393.CrossRef

Gonzalez-Rodriguez, J., Drygajlo, A., Ramos-Castro, D., Garcia-Gomar, M., & Ortega-Garcia, J. (2006). Robust estimation, interpretation and assessment of likelihood ratios in forensic speaker recognition. Computer Speech & Language, 20(2), 331–355.CrossRef

Kanungo, T., Mount, D. M., Netanyahu, N. S., Piatko, C. D., Silverman, R., & Wu, A. Y. (2002). An efficient k-means clustering algorithm: Analysis and implementation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 24(7), 881–892.CrossRef

Kenny, P., Boulianne, G., Ouellet, P., & Dumouchel, P. (2007). Joint factor analysis versus eigenchannels in speaker recognition. Audio, Speech, and Language Processing, IEEE Transactions on, 15(4), 1435–1447.CrossRef

Kinnunen, T., & Li, H. (2010). An overview of text-independent speaker recognition: From features to supervectors. Speech Communication, 52(1), 12–40.CrossRef

Kinnunen, T., & Rajan, P. (2013). A practical, self-adaptive voice activity detector for speaker verification with noisy telephone and microphone data. In Proceedings of Acoustics, Speech and Signal Processing, 2013. ICASSP 2013. Canada, (pp. 7229–7233). IEEE.

Linde, Y., Buzo, A., & Gray, R. M. (1980). An algorithm for vector quantizer design. Communications, IEEE Transactions on, 28(1), 84–95.CrossRef

Ma, B., Meng, H. M., Mak, M. W. (2007). Effects of device mismatch, language mismatch and environmental mismatch on speaker verification. In Acoustics, speech and signal processing, 2007. ICASSP 2007. IEEE International Conference on (Vol. 4, pp. IV-301).

Mak, M. W., & Yu, H. B. (2014). A study of voice activity detection techniques for NIST speaker recognition evaluations. Computer Speech & Language, 28(1), 295–313.CrossRef

Martin, R. (2001). Noise power spectral density estimation based on optimal smoothing and minimum statistics. Speech and Audio Processing, IEEE Transactions on, 9(5), 504–512.CrossRef

Morales-Cordovilla, J. A., Sánchez, V., Gómez, A. M., & Peinado, A. M. (2012). On the use of asymmetric windows for robust speech recognition. Circuits, Systems, and Signal Processing, 31(2), 727–736.CrossRefMathSciNet

Reynolds, D. A., Quatieri, T. F., & Dunn, R. B. (2000). Speaker verification using adapted Gaussian mixture models. Digital signal processing, 10(1), 19–41.CrossRef

Rozman, R., & Kodek, D. M. (2007). Using asymmetric windows in automatic speech recognition. Speech communication, 49(4), 268–276.CrossRef

Varga, A., & Steeneken, H. J. (1993). Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech communication, 12(3), 247–251.CrossRef

Titel: Improving the self-adaptive voice activity detector for speaker verification using map adaptation and asymmetric tapers
verfasst von: Nassim Asbai
Messaoud Bengherabi
Abderrahmane Amrouche
Youcef Aklouf
Publikationsdatum: 01.06.2015
Verlag: Springer US
Erschienen in: International Journal of Speech Technology / Ausgabe 2/2015
Print ISSN: 1381-2416
Elektronische ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-014-9260-6

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Weitere Artikel der Ausgabe 2/2015

A wavelet based method for removal of highly non-stationary noises from single-channel hindi speech patterns of low input SNR

Recognition of isolated words using Zernike and MFCC features for audio visual speech recognition

Images compression techniques for wireless sensor network applications

Automatic articulation error detection tool for Punjabi language with aid for hearing impaired people

Using bands of frequencies for vowel recognition for Polish language

Source and system features for phone recognition