Top

Neural Computing and Applications

Published in:

10-03-2020 | Original Article

Robust features for text-independent speaker recognition with short utterances

Authors: Rania Chakroun, Mondher Frikha

Published in: Neural Computing and Applications | Issue 17/2020

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Speaker recognition systems achieve good performance under controlled conditions. However, in real-world conditions, the performance degrades drastically. The principal cause being when limited data are presented. The presence of background noise is another main factor of performance distortion. In spite of the major advances in speaker recognition field, the effect of noise and the limitation of the amount of available speech data are still open problems, and no optimal solution has been found yet to cope with them. In this paper, we propose a new system using new enhanced and reduced gammatone coefficients in order to improve robustness with limited speech data duration. We demonstrate the usefulness of these coefficients compared to the well-known features with speakers taken from different databases recorded under different conditions.

previous article SulSite-GTB: identification of protein S-sulfenylation sites by fusing multiple feature information and gradient tree boosting

next article Coping with opponents: multi-objective evolutionary neural networks for fighting games

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Liu JC, Leu FY, Lin GL, Susanto H (2018) An MFCC-based text-independent speaker identification system for access control. Concur Comput Pract Exp 30(2):e4255CrossRef

Togneri R, Pullella D (2011) An overview of speaker identification: accuracy and robustness issues. IEEE Circuits Syst Mag 11(2):23–61CrossRef

Dişken G, Tüfekçi Z, Saribulut L, Çevik U (2017) A review on feature extraction for speaker recognition under degraded conditions. IETE Tech Rev 34(3):321–332CrossRef

Larcher A, Bonastre JF, Mason JS (2008) Short utterance-based video aided speaker recognition. In: 2008 IEEE 10th workshop on multimedia signal processing, pp 897–901. IEEE

Chang J, Wang D (2017) Robust speaker recognition based on DNN/i-vectors and speech separation. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 5415–5419. IEEE

Ranjan S, Misra A, Hansen JH (2017) Curriculum learning based probabilistic linear discriminant analysis for noise robust speaker recognition. Proc Interspeech 2017:3717–3721CrossRef

Krishnamoorthy P, Jayanna HS, Prasanna SM (2011) Speaker recognition under limited data condition by noise addition. Expert Syst Appl 38(10):13487–13490CrossRef

Jayanna HS, Mahadeva SR (2009) Multiple frame size and rate analysis for speaker recognition under limited data condition. IET Signal Process 3(3):189–204CrossRef

Chakroun R, Frikha M, Zouari LB (2018) New approach for short utterance speaker identification. IET Signal Process 12(7):873–880CrossRef

10.

Fatima N, Zheng TF (2012) Short utterance speaker recognition a research agenda. In: International conference on systems and informatics (ICSAI)

11.

Liu Z, Wu Z, Li T, Li J, Shen C (2018) GMM and CNN hybrid method for short utterance speaker recognition. IEEE Trans Ind Inf 14(7):3244–3252CrossRef

12.

Park SJ, Yeung G, Kreiman J, Keating PA, Alwan A (2017) Using voice quality features to improve short-utterance, text-independent speaker verification systems. Proc Interspeech 2017:1522–1526CrossRef

13.

Khosravani A, Homayounpour MM (2018) Nonparametrically trained PLDA for short duration i-vector speaker verification. Comput Speech Lang 52:105–122CrossRef

14.

Matza A, Bistritz Y (2014) Skew Gaussian mixture models for speaker recognition. IET Signal Process 8(8):860–867CrossRef

15.

Motlicek P, Dey S, Madikeri S, Burget L (2015) Employment of subspace gaussian mixture models in speaker recognition. In: IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 4445–4449

16.

Li ZY, Zhang WQ, Liu J (2015) Multi-resolution time frequency feature and complementary combination for short utterance speaker recognition. Multimed Tools Appl 74(3):937–953CrossRef

17.

Saeidi R, Alku P (2015) Accounting for uncertainty of i-vectors in speaker recognition using uncertainty propagation and modified imputation. In: Proceedings of Interspeech, vol 2015

18.

Sholokhov A, Sahidullah M, Kinnunen T (2018) Semi-supervised speech activity detection with an application to automatic speaker verification. Comput Speech Lang 47:132–156CrossRef

19.

Li L, Wang D, Zhang C, Zheng TF (2016) Improving short utterance speaker recognition by modeling speech unit classes. IEEE/ACM Trans Audio Speech Lang Process (TASLP) 24(6):1129–1139CrossRef

20.

Reynolds D, Quatieri T, Dunn R (2000) Speaker verification using adapted Gaussian mixture models. Digit Signal Process 10(1–3):19–41CrossRef

21.

Li S, Karatzoglou A, Gentile C (2016) Collaborative filtering bandits. In: Proceedings of the 39th international ACM SIGIR conference on research and development in information retrieval. ACM, pp 539–548

22.

Korda N, Szörényi B, Shuai L (2016) Distributed clustering of linear bandits in peer to peer networks. In: Journal of machine learning research workshop and conference proceedings, vol 48. International Machine Learning Societ, pp 1301–1309

23.

Li S (2016) The art of clustering bandits. Doctoral dissertation, Università degli Studi dell’Insubria

24.

Dehak N, Kenny P, Dehak R, Dumouchel P, Ouellet P (2011) Front-end factor analysis for speaker verification. IEEE Trans Audio Speech Lang Process 19(99):788–798CrossRef

25.

Sarkar A, Matrouf D, Bousquet P, Bonastre J (2012) Study of the effect of i-vector modeling on short and mismatch utterance duration for speaker verification. In: Thirteenth annual conference of the international speech communication association, INTERSPEECH, pp 2662–2665

26.

Kanagasundaram A, Vogt R, Dean D, Sridharan S, Mason M (2011) I-vector based speaker recognition on short utterances. In: Proceedings of Interspeech, Florence, Italy, 2011, pp 2341–2344

27.

Mandasari MI, McLaren M, van Leeuwen DA (2011) Evaluation of i-vector speaker recognition systems for forensic application. In: Proceedings of Interspeech. ISCA, Firenze

28.

Hasan T, Saeidi R, Hansen JHL, van Leeuwen DA (2013) Duration mismatch compensation for i-vector based speaker recognition systems. In: Proceedings of IEEE ICASSP, Vancouver, Canada

29.

The NIST year 2012 speaker recognition evaluation plan (2012). [online] Available: http://www.nist.gov/itl/iad/mig/upload/NIST_SRE12_evalplan-v17-r1.pdf

30.

Kinnunen T, Li H (2010) An overview of text-independent speaker recognition: from features to supervectors. Speech Commun 52(1):12–40CrossRef

31.

Zhang WQ, Zhao J, Zhang WL, Liu J (2014). Multi-scale kernels for short utterance speaker recognition. In: The 9th international symposium on Chinese spoken language processing. IEEE, pp 414–417

32.

Fauve B, Evans N, Mason J (2008) Improving the performance of text-independent short duration SVM-and GMM-based speaker verification. In: Proceedings of Odyssey, Stellenbosch, South Africa

33.

McLaren M, Vogt R, Baker B, Sridharan S (2010) Experiments in SVM-based speaker verification using short utterances. In: Proceedings of Odyssey workshop 2010

34.

Lan Y, Hu Z, Soh YC, Huang GB (2013) An extreme learning machine approach for speaker recognition. Neural Comput Appl 22(3–4):417–425CrossRef

35.

Heigold G, Moreno I, Bengio S, Shazeer N (2016) End-to-end text-dependent speaker verification. In: 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 5115–5119

36.

Zhang SX, Chen Z, Zhao Y, Li J, Gong Y (2017) End-to-end attention based text-dependent speaker verification. arXiv preprint arXiv:1701.00562

37.

Variani E, Lei X, McDermott E, Moreno IL, Gonzalez-Dominguez J (2014) Deep neural networks for small footprint text-dependent speaker verification. In: 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 4052–4056

38.

Heigold G, Moreno I, Bengio S, Shazeer N (2016) End-to-endtext-dependent speaker verification. In: 2016 IEEE international conference on Acoustics, speech and signal processing (ICASSP). IEEE, pp 5115–5119

39.

Zhang C, Koishida K (2017) End-to-end text-independent speaker verification with triplet loss on short utterances. In: Interspeech, Copyright © 2017 ISCA, August 20–24, Stockholm, Sweden, pp 1487–1491. https://doi.org/10.21437/Interspeech.2017-1608

40.

Snyder D, Ghahremani P, Povey D, Garcia-Romero D, Carmiel Y, Khudanpur S (2016) Deep neural network-based speaker embeddings for end-to-end speaker verification. In: 2016 IEEE spoken language technology workshop (SLT), IEEE, pp 165–170

41.

Bhattacharya G, Alam MJ, Kenny P (2017) Deep speaker embeddings for short-duration speaker verification. In: Interspeech, Copyright © 2017 ISCA, August 20–24, Stockholm, Sweden, pp 1517–1521. https://doi.org/10.21437/Interspeech.2017-1575

42.

Kanagasundaram A, Vogt R, Dean D, Sridharan S (2012) PLDA based speaker recognition on short utterances. In: The speaker and language recognition workshop (Odyssey 2012), ISCA, 2012

43.

Kanagasundaram A, Dean D, Sridharan S (2014) Improving PLDA speaker verification with limited development data. In: IEEE international conference on acoustics, speech and signal processing

44.

Rahman MH, Kanagasundaram A, Himawan I, Dean D, Sridharan S (2018) Improving PLDA speaker verification performance using domain mismatch compensation techniques. Comput Speech Lang 47:240–258CrossRef

45.

Cumani S, Plchot O, Laface P (2014) On the use of i-vector posterior distributions in probabilistic linear discriminant analysis. IEEE Trans Audio Speech Lang Process 22(4):846–857CrossRef

46.

Ganapathy S, Mallidi SH, Hermansky H (2014) Robust feature extraction using modulation filtering of autoregressive models. IEEE Trans Audio Speech Lang Process 22(8):1285–1295CrossRef

47.

Zhao X, Wang Y, Wang D (2014) Robust speaker identificat ion in noisy and reverberant conditions. IEEE Trans Audio Speech Lang Process 22(4):836–845CrossRef

48.

Yu C, Liu G, Hahm S, Hansen JHL (2014) Uncertainty propagation in front end factor analysis for noise robust speaker recognition. In: Proceedings of the 39th ICASSP, Florence, Italy, pp 4017–4021

49.

Hurmalainen A, Saeidi R, Virtanen T (2015) Noise robust speaker recognition with convolutive sparse coding. In: Sixteenth annual conference of the international speech communication association

50.

Lei Y, McLaren M, Ferrer L, Scheffer N (2014) Simplified VTS-based i-vector extraction in noise-robust speaker recognition. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 4037–4041. IEEE

51.

Kheder WB, Matrouf D, Bousquet PM, Bonastre JF, Ajili M (2017) Fast i-vector denoising using MAP estimation and a noise distributions database for robust speaker recognition. Comput Speech Lang 45:104–122CrossRef

52.

Ming J, Hazen TJ, Glass JR, Reynolds DA (2007) Robust speaker recognition in noisy conditions. IEEE Trans Audio Speech Lang Process 15(5):1711–1723CrossRef

53.

Lei Y, Burget L, Scheffer N (2013)A noise robust i-vector extractor using vector Taylor series for speaker recognition. In: Proceedings of the 38th ICASSP, Vancouver, BC, Canada, 2013, pp 6788–6791

54.

Alku P, Saeidi R (2017) The linear predictive modeling of speech from higher-lag autocorrelation coefficients applied to noise-robust speaker recognition. IEEE/ACM Trans Audio Speech Lang Process 25:1606–1617CrossRef

55.

Liu X, Sadeghian R, Zahorian SA (2017) A modulation feature set for robust automatic speech recognition in additive noise and reverberation. In: IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 5230–5234

56.

Zhao X, Shao Y, Wang DL (2012) CASA based robust speaker identification. IEEE Trans Audio Speech Lang Process 20(51):608–1616

57.

Venkatesan R, Ganesh AB (2018) Binaural classification-based speech segregation and robust speaker recognition system. Circuits Syst Signal Process 37(8):3383–3411MathSciNetCrossRef

58.

Fedila, M, Bengherabi M, Amrouche A (2018) Gammatone filterbank and symbiotic combination of amplitude and phase-based spectra for robust speaker verification under noisy conditions and compression artifacts. Multimedia Tools Appl 77(13):16721–16739CrossRef

59.

Atal B (1974) Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. J Acoustic Soc Am 55:1304CrossRef

60.

Mammone R, Zhang X, Ramachandran R (1996) Robust speaker recognition: a feature-based approach. IEEE Signal Process Mag 13(5):58–71CrossRef

61.

Reynolds D (1994) Experimental evaluation of features for robust speaker identification. IEEE Trans Speech Audio Process 2(4):639–643CrossRef

62.

Sheikhan M, Gharavian D, Ashoftedel F (2012) Using DTW neural–based MFCC warping to improve emotional speech recognition. Neural Comput Appl 21(7):1765–1773CrossRef

63.

Turner C, Joseph A (2015) A wavelet packet and mel-frequency cepstral coefficients-based feature extraction method for speaker identification. Procedia Comput Sci 61:416–421CrossRef

64.

Shahamiri SR, Salim SSB (2014) Artificial neural networks as speech recognisers for dysarthric speech: identifying the best-performing set of MFCC parameters and studying a speaker-independent approach. Adv Eng Inf 28(1):102–110CrossRef

65.

Ali H, Tran SN, Benetos E, Garcez ASDA (2018) Speaker recognition with hybrid features from a deep belief network. Neural Comput Appl 29(6):13–19CrossRef

66.

Young S, Kershaw D, Odell J, Ollason D, Valtchev V, Woodland P (2002) Hidden Markov model toolkit (HTK) version 3.4 user’s guide

67.

Zhang X, Zou X, Sun M, Zheng TF, Jia C, Wang Y (2019) Noise robust speaker recognition based on adaptive frame weighting in GMM for I-vector extraction. IEEE Access 7:27874–27882CrossRef

68.

Islam MA, Jassim WA, Cheok NS, Zilany MSA (2016) A robust speaker identification system using the responses from a model of the auditory periphery. PLoS ONE 11(7):e0158520CrossRef

69.

Zhao X, Shao Y, Wang D (2012) CASA-based robust speaker identification. Audio Speech Lang Process IEEE Trans 20(5):1608–1616CrossRef

70.

Zhao X, Wang D (2013) Analyzing noise robustness of MFCC and GFCC features in speaker identification. In: 2013 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 7204–7208

71.

Shao Y, Wang D (2008) Robust speaker identification using auditory features and computational auditory scene analysis. In: IEEE international conference on acoustics, speech and signal processing, 2008. ICASSP 2008. IEEE, pp 1589–1592

72.

Kenny P (2010) Bayesian speaker verification with heavy-tailed priors. In: Proceedings of odyssey speaker and language recognition workshop

73.

Garofolo JS, Lamel LF, Fisher WM, Fiscus JG, Pallett DS, Dahlgren NL (1993) DARPA TIMIT acoustic phonetic continuous speech corpus CDROM. NIST

74.

Feng L, Hansen LK (2005) A new database for speaker recognition. Informatics and mathematical modeling. Technical University of Denmark, DTU

75.

Reynolds DA (1995) Automatic speaker recognition using gaussian mixture speaker models. Linc Lab J 8(2):173–192

76.

Jankowski C, Kalyanswamy A, Basson S, Spitz J (1990) NTIMIT: a phonetically balanced, continuous speech, telephone bandwidth speech database. ICASSP

77.

The NIST Year 2010 Speaker Recognition Evaluation Plan (2010). http://www.nist.gov/itl/iad/mig/upload/NIST_SRE10_evalplan-r6.pdf

Title: Robust features for text-independent speaker recognition with short utterances
Authors: Rania Chakroun
Mondher Frikha
Publication date: 10-03-2020
Publisher: Springer London
Published in: Neural Computing and Applications / Issue 17/2020
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-020-04793-y

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Springer Professional "Wirtschaft+Technik"

Other articles of this Issue 17/2020

Impact of finite wavy wall thickness on entropy generation and natural convection of nanofluid in cavity partially filled with non-Darcy porous layer

AMSOM: artificial metaplasticity in SOM neural networks—application to MIT-BIH arrhythmias database

UAM-RDE: an uncertainty analysis method for RSSI-based distance estimation in wireless sensor networks

Wajsberg algebras of order

Multimodal feature fusion for CNN-based gait recognition: an empirical comparison

Using eye-tracking into decision makers evaluation in evolutionary interactive UA-FLP algorithms

Premium Partner