International Journal of Speech Technology

Published: 17.09.2021

Closed-set speaker identification using VQ and GMM based models

Authors: Bidhan Barai, Tapas Chakraborty, Nibaran Das, Subhadip Basu, Mita Nasipuri

Published in: International Journal of Speech Technology | Issue 1/2022

Abstract

An array of features and methods has been developed over the past six decades for Speaker Identification (SI) and Speaker Verification (SV), jointly known as Speaker Recognition (SR). Mel Frequency Cepstral Coefficients (MFCC) are generally used as feature vectors in most cases because they give higher accuracy than other features. This paper presents a comparative study of state-of-the-art SR techniques along with their design challenges, robustness issues and performance evaluation methods. Rigorous experiments have been performed using the Gaussian Mixture Model (GMM) with variations such as the Universal Background Model (UBM), Vector Quantization (VQ) and VQ-based UBM-GMM (VQ-UBM-GMM), with detailed discussion. Other popular methods, namely Linear Discriminant Analysis (LDA), Probabilistic LDA (PLDA), Gaussian PLDA (GPLDA), Multi-condition GPLDA (MGPLDA) and Identity Vector (i-vector), are included for comparative study only. Three popular audio data-sets are used in the experiments: IITG-MV SR, Hyke-2011 and ELSDSR. Hyke-2011 and ELSDSR contain clean speech, while IITG-MV SR contains noisy audio data with variations in channel (device), environment and speaking style. We propose a new data mixing approach for SR to make the system independent of recording device, speaking style and environment. The accuracy obtained with VQ and GMM based methods varies from \(99.6\%\) to \(100\%\) on the Hyke-2011 and ELSDSR databases, whereas accuracy on IITG-MV SR is up to \(98\%\). In some cases, however, the accuracies degrade drastically due to mismatch between training and testing data as well as the singularity problem of GMM. The experimental results serve as a benchmark for VQ/GMM/UBM based methods on the IITG-MV SR database.
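To make the closed-set VQ idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each speaker is modelled by a small codebook trained with plain k-means (used here as a stand-in for an LBG-style algorithm), and that a test utterance is assigned to the enrolled speaker whose codebook yields the lowest average distortion. Synthetic 13-dimensional Gaussian vectors stand in for MFCC frames; the function names, the codebook size of 8 and the speaker labels are all hypothetical choices made for illustration.

```python
import numpy as np


def train_codebook(features, codebook_size=8, n_iter=20, seed=0):
    """Fit a VQ codebook to one speaker's feature vectors with plain k-means."""
    rng = np.random.default_rng(seed)
    # Initialise the codewords with randomly chosen training vectors.
    start = rng.choice(len(features), size=codebook_size, replace=False)
    codebook = features[start].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every vector to every codeword.
        dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = dists.argmin(axis=1)
        for k in range(codebook_size):
            members = features[nearest == k]
            if len(members) > 0:  # keep the old codeword if its cell is empty
                codebook[k] = members.mean(axis=0)
    return codebook


def average_distortion(features, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.min(axis=1).mean()


def identify(features, codebooks):
    """Closed-set decision: the enrolled speaker with the lowest distortion."""
    return min(codebooks, key=lambda spk: average_distortion(features, codebooks[spk]))


# Synthetic stand-in for MFCC frames (assumption: 13 coefficients per frame,
# three well-separated hypothetical speakers).
rng = np.random.default_rng(42)
DIM = 13
offsets = {"spk_a": -5.0, "spk_b": 0.0, "spk_c": 5.0}

codebooks = {}
test_sets = {}
for name, offset in offsets.items():
    train = rng.normal(loc=offset, scale=1.0, size=(200, DIM))
    test_sets[name] = rng.normal(loc=offset, scale=1.0, size=(50, DIM))
    codebooks[name] = train_codebook(train)

predictions = {name: identify(feats, codebooks) for name, feats in test_sets.items()}
accuracy = np.mean([pred == true for true, pred in predictions.items()])
print(predictions, accuracy)
```

Because the decision is a minimum over the enrolled speakers only, every test utterance is attributed to one of them, which is what "closed-set" means here. A GMM-based variant would replace the distortion with a per-speaker log-likelihood and pick the maximum instead.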

Title: Closed-set speaker identification using VQ and GMM based models
Authors: Bidhan Barai, Tapas Chakraborty, Nibaran Das, Subhadip Basu, Mita Nasipuri
Publication date: 17.09.2021
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 1/2022
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-021-09899-9
