
2024 | Original Paper | Book Chapter

Adapting to Noise in Forensic Speaker Verification Using GMM-UBM I-Vector Method in High-Noise Backgrounds

Authors: K. V. Aljinu Khadar, R. K. Sunil Kumar, N. S. Sreekanth

Published in: Computational Sciences and Sustainable Technologies

Publisher: Springer Nature Switzerland


Abstract

The performance of the GMM-UBM i-vector method in a forensic speaker verification system is examined on noisy speech samples. The analysis uses both standard Mel-frequency cepstral coefficients (MFCCs) and MFCCs computed from auto-correlated speech signals. A noisy signal's autocorrelation coefficients are concentrated around the lower lags, whereas the coefficients at higher lags are very small. Thus, in addition to retaining the periodic nature of the signal, autocorrelation-based MFCC is robust for analysing speech in intense background noise. The performance of both MFCC and autocorrelation-based MFCC depends heavily on sample quality: both work best on noise-free data but degrade on real-world, i.e. noisy, data. In the forensic speaker verification experiments, White Gaussian Noise, Red Noise, and Pink Noise were added at signal-to-noise ratios (SNRs) ranging from −20 dB to +20 dB. The performance of both methods degraded drastically in all cases, but autocorrelation-based MFCC consistently outperformed standard MFCC. Autocorrelation-based MFCC is therefore a valuable method for robust feature extraction in speaker verification under intense background noise. The verification accuracy of the proposed method improves on previously reported results even at very high noise levels (−20 dB).
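The two front-end ingredients the abstract describes can be sketched in code. The following is a minimal illustration, not the authors' exact pipeline: `add_noise_at_snr` mixes a noise recording into clean speech at a requested SNR (as in the −20 dB to +20 dB experiments), and `autocorr_mfcc` computes cepstral coefficients from the magnitude spectrum of a frame's autocorrelation instead of the frame itself, exploiting the fact that additive-noise energy concentrates at low lags. All function names, frame sizes, and filterbank parameters here are assumptions chosen for illustration.

```python
import numpy as np

def add_noise_at_snr(signal, noise, snr_db):
    """Scale noise so that signal + noise has the requested SNR in dB."""
    p_sig = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return signal + scale * noise

def autocorr_frame(frame):
    """One-sided biased autocorrelation of a frame (lags 0..N-1)."""
    n = len(frame)
    full = np.correlate(frame, frame, mode="full")
    return full[n - 1:] / n

def mel_filterbank(num_filters, nfft, sr):
    """Triangular mel filterbank (standard HTK-style construction)."""
    hz_to_mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel_to_hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), num_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((num_filters, nfft // 2 + 1))
    for i in range(1, num_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def autocorr_mfcc(frame, sr=16000, nfft=512, num_filters=26, num_ceps=13):
    """MFCCs computed from the magnitude spectrum of the frame's
    autocorrelation sequence rather than from the frame directly."""
    r = autocorr_frame(frame)
    spec = np.abs(np.fft.rfft(r * np.hamming(len(r)), nfft))
    fb = mel_filterbank(num_filters, nfft, sr)
    log_energies = np.log(fb @ spec + 1e-10)
    # DCT-II of the log filterbank energies yields the cepstral coefficients
    n = np.arange(num_filters)
    dct = np.cos(np.pi * np.outer(np.arange(num_ceps), 2 * n + 1)
                 / (2 * num_filters))
    return dct @ log_energies
```

Dropping standard MFCC in favour of this variant changes only the step before the FFT; the GMM-UBM and i-vector back end consumes the resulting feature vectors unchanged.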


Metadata
Title
Adapting to Noise in Forensic Speaker Verification Using GMM-UBM I-Vector Method in High-Noise Backgrounds
Authors
K. V. Aljinu Khadar
R. K. Sunil Kumar
N. S. Sreekanth
Copyright year
2024
DOI
https://doi.org/10.1007/978-3-031-50993-3_22
