Published in: International Journal of Speech Technology 1/2022

17.09.2021

Closed-set speaker identification using VQ and GMM based models

Authors: Bidhan Barai, Tapas Chakraborty, Nibaran Das, Subhadip Basu, Mita Nasipuri

Abstract

An array of features and methods has been developed over the past six decades for Speaker Identification (SI) and Speaker Verification (SV), jointly known as Speaker Recognition (SR). Mel Frequency Cepstral Coefficients (MFCCs) are used as feature vectors in most cases because they generally give higher accuracy than other features. This paper presents a comparative study of state-of-the-art SR techniques along with their design challenges, robustness issues and performance evaluation methods. Rigorous experiments have been performed using the Gaussian Mixture Model (GMM) with variations such as the Universal Background Model (UBM), Vector Quantization (VQ) and VQ-based UBM-GMM (VQ-UBM-GMM), alone or in combination, and are discussed in detail. Other popular methods, namely Linear Discriminant Analysis (LDA), Probabilistic LDA (PLDA), Gaussian PLDA (GPLDA), Multi-condition GPLDA (MGPLDA) and the Identity Vector (i-vector), are included for comparison only. Three popular audio datasets have been used in the experiments, namely IITG-MV SR, Hyke-2011 and ELSDSR. Hyke-2011 and ELSDSR contain clean speech, while IITG-MV SR contains noisy audio with variations in channel (device), environment and speaking style. We propose a new data mixing approach for SR to make the system independent of recording device, speaking style and environment. The accuracies obtained with VQ- and GMM-based methods on Hyke-2011 and ELSDSR range from \(99.6\%\) to \(100\%\), whereas the accuracy on IITG-MV SR is up to \(98\%\). However, in some cases the accuracy degrades drastically due to mismatch between training and testing data as well as the singularity problem of GMM. The experimental results serve as a benchmark for VQ/GMM/UBM-based methods on the IITG-MV SR database.
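The abstract does not give the authors' configuration (codebook size, number of mixtures, MFCC front end, UBM adaptation). The following is therefore only a minimal NumPy sketch of the two baseline ideas, VQ and GMM closed-set speaker identification, under stated assumptions. It assumes each speaker's features are already extracted as an `(n_frames, n_dims)` matrix. It trains an LBG-style codebook and a diagonal-covariance GMM per speaker, then assigns a test utterance to the enrolled speaker with the best score. All function names are hypothetical. The `var_floor` variance flooring is one common, illustrative guard against the GMM singularity problem mentioned above, not necessarily the remedy used in the paper.

```python
import numpy as np


def lbg_codebook(X, size, eps=0.01, iters=20):
    """LBG-style VQ codebook: grow by centroid splitting, refine with k-means.

    X is an (n_frames, n_dims) feature matrix; size should be a power of two.
    """
    codebook = X.mean(axis=0, keepdims=True)
    while codebook.shape[0] < size:
        # Split every centroid into a slightly perturbed pair.
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):
            # Assign each frame to its nearest codeword (squared Euclidean).
            d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            nearest = d.argmin(axis=1)
            for k in range(codebook.shape[0]):
                members = X[nearest == k]
                if len(members):  # leave empty cells where they are
                    codebook[k] = members.mean(axis=0)
    return codebook


def vq_score(X, codebook):
    """Negative mean quantization distortion (higher = better match)."""
    d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return -d.min(axis=1).mean()


def _logsumexp(a):
    # Numerically stable log(sum(exp(a))) over the last axis.
    m = a.max(axis=-1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=-1, keepdims=True)))[..., 0]


def _joint_loglik(X, weights, means, variances):
    # log w_k + log N(x_t | mu_k, diag(var_k)) for every frame t, component k.
    diff = X[:, None, :] - means[None, :, :]
    return (np.log(weights)[None, :]
            - 0.5 * np.log(2 * np.pi * variances).sum(axis=1)[None, :]
            - 0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2))


def fit_diag_gmm(X, n_mix, iters=50, var_floor=1e-3, seed=0):
    """Diagonal-covariance GMM trained with EM, with variance flooring."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), n_mix, replace=False)]
    variances = np.tile(X.var(axis=0), (n_mix, 1)) + var_floor
    weights = np.full(n_mix, 1.0 / n_mix)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each frame.
        ll = _joint_loglik(X, weights, means, variances)
        resp = np.exp(ll - _logsumexp(ll)[:, None])
        # M-step: re-estimate weights, means and diagonal variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = (resp.T @ X ** 2) / nk[:, None] - means ** 2
        # Flooring keeps a component from collapsing onto a few frames,
        # which would make its covariance singular.
        variances = np.maximum(variances, var_floor)
    return weights, means, variances


def gmm_score(X, gmm):
    """Average per-frame log-likelihood of X under a speaker GMM."""
    return _logsumexp(_joint_loglik(X, *gmm)).mean()


def identify(X_test, models, score):
    """Closed-set SI: return the enrolled speaker with the best score."""
    return max(models, key=lambda spk: score(X_test, models[spk]))
```

In use, one would build `models = {speaker: fit_diag_gmm(mfcc_frames, n_mix)}` (or `lbg_codebook(...)` with `vq_score`) from each speaker's training frames. One would then call `identify(test_frames, models, gmm_score)`. Closed-set identification always returns one of the enrolled speakers, so no rejection threshold is needed. The UBM-based variants studied in the paper typically derive speaker models by adapting a background model (e.g. MAP adaptation in the standard GMM-UBM recipe), which this sketch does not attempt.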

Zurück zum Zitat Novoselov, S., Pekhovsky, T., Shulipa, A., & Sholokhov, A. (2014, May). Text-dependent GMM-JFA system for password based speaker verification. In 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 729–737) IEEE. Novoselov, S., Pekhovsky, T., Shulipa, A., & Sholokhov, A. (2014, May). Text-dependent GMM-JFA system for password based speaker verification. In 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 729–737) IEEE.
Zurück zum Zitat Pal, S. K., & Majumder, D. D. (1977). Fuzzy sets and decision making approaches in vowel and speaker recognition. IEEE Transactions on Systems, Man, and Cybernetics, 7(8), 625–629.MATHCrossRef Pal, S. K., & Majumder, D. D. (1977). Fuzzy sets and decision making approaches in vowel and speaker recognition. IEEE Transactions on Systems, Man, and Cybernetics, 7(8), 625–629.MATHCrossRef
Zurück zum Zitat Pal, S. K., & Mitra, P. (2004). Pattern recognition algorithms for data mining. New York: Chapman and Hall/CRC.MATHCrossRef Pal, S. K., & Mitra, P. (2004). Pattern recognition algorithms for data mining. New York: Chapman and Hall/CRC.MATHCrossRef
Zurück zum Zitat Paliwal, K. K. (1999). Decorrelated and liftered filter-bank energies for robust speech recognition. In Sixth European conference on speech communication and technology. Paliwal, K. K. (1999). Decorrelated and liftered filter-bank energies for robust speech recognition. In Sixth European conference on speech communication and technology.
Zurück zum Zitat Qawaqneh, Z., Mallouh, A. A., & Barkana, B. D. (2017). Deep neural network framework and transformed MFCCs for speakers age and gender classification. Knowledge-Based Systems, 115, 5–14.CrossRef Qawaqneh, Z., Mallouh, A. A., & Barkana, B. D. (2017). Deep neural network framework and transformed MFCCs for speakers age and gender classification. Knowledge-Based Systems, 115, 5–14.CrossRef
Zurück zum Zitat Rabiner, L. R., & Schafer, R. W. (2011). Theory and applications of digital speech processing (Vol. 64). Upper Saddle River, NJ: Pearson. Rabiner, L. R., & Schafer, R. W. (2011). Theory and applications of digital speech processing (Vol. 64). Upper Saddle River, NJ: Pearson.
Zurück zum Zitat Rajan, P., Afanasyev, A., Hautamäki, V., & Kinnunen, T. (2014). From single to multiple enrollment i-vectors: Practical PLDA scoring variants for speaker verification. Digital Signal Processing, 31, 93–101.CrossRef Rajan, P., Afanasyev, A., Hautamäki, V., & Kinnunen, T. (2014). From single to multiple enrollment i-vectors: Practical PLDA scoring variants for speaker verification. Digital Signal Processing, 31, 93–101.CrossRef
Zurück zum Zitat Ram, R., & Mohanty, M. N. (2018). Performance analysis of adaptive variational mode decomposition approach for speech enhancement. International Journal of Speech Technology, 21(2), 369–381.CrossRef Ram, R., & Mohanty, M. N. (2018). Performance analysis of adaptive variational mode decomposition approach for speech enhancement. International Journal of Speech Technology, 21(2), 369–381.CrossRef
Zurück zum Zitat Rao, K. S., & Sarkar, S. (2014). Robust speaker recognition in noisy environments. Cham: Springer.CrossRef Rao, K. S., & Sarkar, S. (2014). Robust speaker recognition in noisy environments. Cham: Springer.CrossRef
Zurück zum Zitat Reda, A., Panjwani, S., & Cutrell, E. (2011). Hyke: A low-cost remote attendance tracking system for developing regions. In Proceedings of the 5th ACM workshop on networked systems for developing regions (pp. 15–20). New York: ACM. Reda, A., Panjwani, S., & Cutrell, E. (2011). Hyke: A low-cost remote attendance tracking system for developing regions. In Proceedings of the 5th ACM workshop on networked systems for developing regions (pp. 15–20). New York: ACM.
Zurück zum Zitat Reyes-Díaz, F. J., Hernández-Sierra, G., & de Lara, J. R. C. (2021). DNN and i-vector combined method for speaker recognition on multi-variability environments. International Journal of Speech Technology, 24(2), 409–418.CrossRef Reyes-Díaz, F. J., Hernández-Sierra, G., & de Lara, J. R. C. (2021). DNN and i-vector combined method for speaker recognition on multi-variability environments. International Journal of Speech Technology, 24(2), 409–418.CrossRef
Zurück zum Zitat Reynolds, D. A. (1995). Speaker identification and verification using Gaussian mixture speaker models. Speech Communication, 17(1–2), 91–108.CrossRef Reynolds, D. A. (1995). Speaker identification and verification using Gaussian mixture speaker models. Speech Communication, 17(1–2), 91–108.CrossRef
Zurück zum Zitat Richardson, F., Reynolds, D., & Dehak, N. (2015). A unified deep neural network for speaker and language recognition. arXiv preprint arXiv:1504.00923. Richardson, F., Reynolds, D., & Dehak, N. (2015). A unified deep neural network for speaker and language recognition. arXiv preprint arXiv:​1504.​00923.
Zurück zum Zitat Richardson, F., Reynolds, D., & Dehak, N. (2015). Deep neural network approaches to speaker and language recognition. IEEE Signal Processing Letters, 22(10), 1671–1675.CrossRef Richardson, F., Reynolds, D., & Dehak, N. (2015). Deep neural network approaches to speaker and language recognition. IEEE Signal Processing Letters, 22(10), 1671–1675.CrossRef
Zurück zum Zitat Rose, P. (2006). Technical forensic speaker recognition: Evaluation, types and testing of evidence. Computer Speech & Language, 20(2–3), 159–191.CrossRef Rose, P. (2006). Technical forensic speaker recognition: Evaluation, types and testing of evidence. Computer Speech & Language, 20(2–3), 159–191.CrossRef
Zurück zum Zitat Rouat, J. (2008). Computational auditory scene analysis: Principles, algorithms, and applications (wang, d. and brown, gj, eds.; 2006)[book review]. IEEE Transactions on Neural Networks, 19(1), 199.CrossRef Rouat, J. (2008). Computational auditory scene analysis: Principles, algorithms, and applications (wang, d. and brown, gj, eds.; 2006)[book review]. IEEE Transactions on Neural Networks, 19(1), 199.CrossRef
Zurück zum Zitat Sawada, H., Mukai, R., Araki, S., & Makino, S. (2004). A robust and precise method for solving the permutation problem of frequency-domain blind source separation. IEEE Transactions on Speech and Audio Processing, 12(5), 530–538.CrossRef Sawada, H., Mukai, R., Araki, S., & Makino, S. (2004). A robust and precise method for solving the permutation problem of frequency-domain blind source separation. IEEE Transactions on Speech and Audio Processing, 12(5), 530–538.CrossRef
Zurück zum Zitat Shahamiri, S. R., & Salim, S. S. B. (2014). Artificial neural networks as speech recognisers for dysarthric speech: Identifying the best-performing set of MFCC parameters and studying a speaker-independent approach. Advanced Engineering Informatics, 28(1), 102–110.CrossRef Shahamiri, S. R., & Salim, S. S. B. (2014). Artificial neural networks as speech recognisers for dysarthric speech: Identifying the best-performing set of MFCC parameters and studying a speaker-independent approach. Advanced Engineering Informatics, 28(1), 102–110.CrossRef
Zurück zum Zitat Shao, Y., & Wang, D. (2008, March). Robust speaker identification using auditory features and computational auditory scene analysis. In 2008 IEEE international conference on acoustics, speech and signal processing (pp. 1589–1592). IEEE. Shao, Y., & Wang, D. (2008, March). Robust speaker identification using auditory features and computational auditory scene analysis. In 2008 IEEE international conference on acoustics, speech and signal processing (pp. 1589–1592). IEEE.
Zurück zum Zitat Shao, Y., Srinivasan, S., & Wang, D. (2007, April). Incorporating auditory feature uncertainties in robust speaker identification. In 2007 IEEE international conference on acoustics, speech and signal processing-ICASSP07 (Vol. 4, pp. IV-277). IEEE. Shao, Y., Srinivasan, S., & Wang, D. (2007, April). Incorporating auditory feature uncertainties in robust speaker identification. In 2007 IEEE international conference on acoustics, speech and signal processing-ICASSP07 (Vol. 4, pp. IV-277). IEEE.
Zurück zum Zitat Shi, X., Yang, H., & Zhou, P. (2016, October). Robust speaker recognition based on improved GFCC. In 2016 2nd IEEE international conference on computer and communications (ICCC) (pp. 1927–1931) IEEE. Shi, X., Yang, H., & Zhou, P. (2016, October). Robust speaker recognition based on improved GFCC. In 2016 2nd IEEE international conference on computer and communications (ICCC) (pp. 1927–1931) IEEE.
Zurück zum Zitat Singh, N., Khan, R. A., & Shree, R. (2012). Applications of speaker recognition. Procedia Engineering, 38, 3122–3126.CrossRef Singh, N., Khan, R. A., & Shree, R. (2012). Applications of speaker recognition. Procedia Engineering, 38, 3122–3126.CrossRef
Zurück zum Zitat Susan, S., & Sharma, S. (2012, November). A fuzzy nearest neighbor classifier for speaker identification. In 2012 fourth international conference on computational intelligence and communication networks (pp. 842–845) IEEE. Susan, S., & Sharma, S. (2012, November). A fuzzy nearest neighbor classifier for speaker identification. In 2012 fourth international conference on computational intelligence and communication networks (pp. 842–845) IEEE.
Zurück zum Zitat Tirumala, S. S., Shahamiri, S. R., Garhwal, A. S., & Wang, R. (2017). Speaker identification features extraction methods: A systematic review. Expert Systems with Applications, 90, 250–271.CrossRef Tirumala, S. S., Shahamiri, S. R., Garhwal, A. S., & Wang, R. (2017). Speaker identification features extraction methods: A systematic review. Expert Systems with Applications, 90, 250–271.CrossRef
Zurück zum Zitat Togneri, R., & Pullella, D. (2011). An overview of speaker identification: Accuracy and robustness issues. IEEE Circuits and Systems Magazine, 11(2), 23–61.CrossRef Togneri, R., & Pullella, D. (2011). An overview of speaker identification: Accuracy and robustness issues. IEEE Circuits and Systems Magazine, 11(2), 23–61.CrossRef
Zurück zum Zitat Tsiakoulis, P., Potamianos, A., & Dimitriadis, D. (2013, May). Instantaneous frequency and bandwidth estimation using filterbank arrays. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 8032–8036) IEEE. Tsiakoulis, P., Potamianos, A., & Dimitriadis, D. (2013, May). Instantaneous frequency and bandwidth estimation using filterbank arrays. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 8032–8036) IEEE.
Zurück zum Zitat Vijayan, K., Kumar, V., & Murty, K. S. R. (2014). Feature extraction from analytic phase of speech signals for speaker verification. In Fifteenth annual conference of the international speech communication association. Vijayan, K., Kumar, V., & Murty, K. S. R. (2014). Feature extraction from analytic phase of speech signals for speaker verification. In Fifteenth annual conference of the international speech communication association.
Zurück zum Zitat Vijayan, K., Reddy, P. R., & Murty, K. S. R. (2016). Significance of analytic phase of speech signals in speaker verification. Speech Communication, 81, 54–71.CrossRef Vijayan, K., Reddy, P. R., & Murty, K. S. R. (2016). Significance of analytic phase of speech signals in speaker verification. Speech Communication, 81, 54–71.CrossRef
Zurück zum Zitat Vogt, R. J., Baker, B. J., & Sridharan, S. (2005). Modelling session variability in text independent speaker verification. Vogt, R. J., Baker, B. J., & Sridharan, S. (2005). Modelling session variability in text independent speaker verification.
Zurück zum Zitat Wang, N., Ching, P. C., Zheng, N. H., & Lee, T. (2007). Robust speaker recognition using both vocal source and vocal tract features estimated from noisy input utterances. In 2007 IEEE international symposium on signal processing and information technology (pp. 772–777). Wang, N., Ching, P. C., Zheng, N. H., & Lee, T. (2007). Robust speaker recognition using both vocal source and vocal tract features estimated from noisy input utterances. In 2007 IEEE international symposium on signal processing and information technology (pp. 772–777).
Zurück zum Zitat Wang, L., Minami, K., Yamamoto, K., & Nakagawa, S. (2010). Speaker recognition by combining MFCC and phase information in noisy conditions. IEICE Transactions on Information and Systems, 93(9), 2397–2406.CrossRef Wang, L., Minami, K., Yamamoto, K., & Nakagawa, S. (2010). Speaker recognition by combining MFCC and phase information in noisy conditions. IEICE Transactions on Information and Systems, 93(9), 2397–2406.CrossRef
Zurück zum Zitat Yaman, S., Pelecanos, J., & Sarikaya, R. (2012). Bottleneck features for speaker recognition. In Odyssey 2012-the speaker and language recognition workshop. Yaman, S., Pelecanos, J., & Sarikaya, R. (2012). Bottleneck features for speaker recognition. In Odyssey 2012-the speaker and language recognition workshop.
Zurück zum Zitat You, C. H., Lee, K. A., & Li, H. (2009). GMM-SVM kernel with a Bhattacharyya-based distance for speaker recognition. IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1300–1312. You, C. H., Lee, K. A., & Li, H. (2009). GMM-SVM kernel with a Bhattacharyya-based distance for speaker recognition. IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1300–1312.
Zurück zum Zitat Zeinali, H., Sameti, H., & Burget, L. (2017). HMM-based phrase-independent i-vector extractor for text-dependent speaker verification. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(7), 1421–1435.CrossRef Zeinali, H., Sameti, H., & Burget, L. (2017). HMM-based phrase-independent i-vector extractor for text-dependent speaker verification. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(7), 1421–1435.CrossRef
Zurück zum Zitat Zeinali, H., Sameti, H., & Burget, L. (2017). Text-dependent speaker verification based on i-vectors, Neural Networks and Hidden Markov Models. Computer Speech & Language, 46, 53–71.CrossRef Zeinali, H., Sameti, H., & Burget, L. (2017). Text-dependent speaker verification based on i-vectors, Neural Networks and Hidden Markov Models. Computer Speech & Language, 46, 53–71.CrossRef
Zurück zum Zitat Zhang, Y., & Abdulla, W. H. (2006). Gammatone auditory filterbank and independent component analysis for speaker identification. In Ninth international conference on spoken language processing. Zhang, Y., & Abdulla, W. H. (2006). Gammatone auditory filterbank and independent component analysis for speaker identification. In Ninth international conference on spoken language processing.
Zurück zum Zitat Zhao, X., & Wang, D. (2013, May). Analyzing noise robustness of MFCC and GFCC features in speaker identification. In 2013 IEEE international conference on acoustics, speech and signal processing (pp. 7204–7208) IEEE. Zhao, X., & Wang, D. (2013, May). Analyzing noise robustness of MFCC and GFCC features in speaker identification. In 2013 IEEE international conference on acoustics, speech and signal processing (pp. 7204–7208) IEEE.
Zurück zum Zitat Zhao, X., Shao, Y., & Wang, D. (2012). CASA-based robust speaker identification. IEEE Transactions on Audio, Speech, and Language Processing, 20(5), 1608–1616.CrossRef Zhao, X., Shao, Y., & Wang, D. (2012). CASA-based robust speaker identification. IEEE Transactions on Audio, Speech, and Language Processing, 20(5), 1608–1616.CrossRef
Zurück zum Zitat Zheng, R., Zhang, S., & Xu, B. (2006, January). A comparative study of feature and score normalization for speaker verification. In International conference on biometrics (pp. 531–538). Berlin: Springer. Zheng, R., Zhang, S., & Xu, B. (2006, January). A comparative study of feature and score normalization for speaker verification. In International conference on biometrics (pp. 531–538). Berlin: Springer.
Metadata
Title
Closed-set speaker identification using VQ and GMM based models
Authors
Bidhan Barai
Tapas Chakraborty
Nibaran Das
Subhadip Basu
Mita Nasipuri
Publication date
17 September 2021
Publisher
Springer US
Published in
International Journal of Speech Technology, Issue 1/2022
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-021-09899-9