Published in: Neural Computing and Applications 17/2020

10-03-2020 | Original Article

Robust features for text-independent speaker recognition with short utterances

Authors: Rania Chakroun, Mondher Frikha

Abstract

Speaker recognition systems achieve good performance under controlled conditions. In real-world conditions, however, performance degrades drastically. The principal cause is the limited amount of speech data available; background noise is another major source of degradation. Despite major advances in the field of speaker recognition, the effects of noise and of limited speech data remain open problems, and no optimal solution has yet been found. In this paper, we propose a new system that uses new enhanced and reduced gammatone coefficients to improve robustness when speech data duration is limited. We demonstrate the usefulness of these coefficients in comparison with well-known features, using speakers taken from different databases recorded under different conditions.

Metadata
Title
Robust features for text-independent speaker recognition with short utterances
Authors
Rania Chakroun
Mondher Frikha
Publication date
10-03-2020
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 17/2020
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-020-04793-y
