Arabian Journal for Science and Engineering

19-08-2019 | Research Article - Computer Engineering and Computer Science

Filterbank Optimization for Text-Dependent Speaker Verification by Evolutionary Algorithm Using Spline-Defined Design Parameters

Authors: Mohammad Azharuddin Laskar, Rabul Hussain Laskar

Published in: Arabian Journal for Science and Engineering | Issue 11/2019

Abstract

Mel frequency cepstral coefficients (MFCCs) are the most widely used spectral features in speech-based applications. They were introduced primarily for speech recognition and were later adopted for other applications such as speaker recognition and emotion recognition. Several recent findings suggest that the Mel-scale filterbank, which is inspired primarily by human perception, may not be optimal for speaker recognition. Working in the same direction, this study attempts to optimize the filterbank design for text-dependent speaker verification. Motivated by the success of evolutionary computation in related fields, an evolutionary algorithm is used to carry out the optimization. This enables data-driven learning of the design parameters and is hypothesized to yield filterbanks suited to the specific task of speaker-phrase discrimination. The filterbanks have been optimized for text-dependent speaker verification in general, and also for specific speakers and phrases. The proposed filterbank yields a relative equal error rate reduction of up to 39.41% with respect to the baseline MFCCs.
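The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the encoding, operators, or objective, so everything below is an assumption. A few control points define a monotone frequency-warping curve (piecewise-linear interpolation stands in for the spline), filter centre frequencies are read off that curve, and a simple elitist evolutionary loop searches the control points. The fitness is a hypothetical placeholder; in a real system it would be a verification score such as the equal error rate on development data. All function names, parameter values, and the 4000 Hz upper band edge are illustrative.

```python
import random

def relative_eer_reduction(baseline_eer, proposed_eer):
    """Relative EER reduction (%) of a proposed system over a baseline."""
    return 100.0 * (baseline_eer - proposed_eer) / baseline_eer

def warp(ctrl, x):
    """Monotone piecewise-linear curve on [0, 1] built from positive increments.
    Stand-in for a spline (assumption); ctrl[k] is the rise over segment k."""
    total = sum(ctrl)
    k = min(int(x * len(ctrl)), len(ctrl) - 1)   # segment containing x
    frac = x * len(ctrl) - k                     # position inside that segment
    return (sum(ctrl[:k]) + frac * ctrl[k]) / total

def center_frequencies(ctrl, n_filters, f_max):
    """Filter centre frequencies (Hz) read off the warping curve."""
    return [f_max * warp(ctrl, i / (n_filters + 1))
            for i in range(1, n_filters + 1)]

def toy_fitness(ctrl, n_filters=20, f_max=4000.0):
    """Placeholder objective (hypothetical): closeness to linear spacing.
    A real system would score verification performance instead."""
    target = [f_max * i / (n_filters + 1) for i in range(1, n_filters + 1)]
    centers = center_frequencies(ctrl, n_filters, f_max)
    return -sum((c - t) ** 2 for c, t in zip(centers, target))

def evolve(fitness, n_ctrl=4, pop_size=20, generations=30, sigma=0.1, seed=0):
    """Elitist evolutionary loop: tournament selection plus Gaussian mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(0.1, 1.0) for _ in range(n_ctrl)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        children = [pop[0]]                      # elitism: keep the best as is
        while len(children) < pop_size:
            parent = max(rng.sample(pop, 3), key=fitness)  # tournament of 3
            # Mutate and clamp so every increment stays positive (monotone curve).
            children.append([max(1e-3, g + rng.gauss(0.0, sigma))
                             for g in parent])
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history
```

Calling `evolve(toy_fitness)` returns the best control points and the best fitness per generation, which never decreases because the best individual is carried over unchanged. The helper `relative_eer_reduction` shows how a figure such as the reported 39.41% is computed from two error rates; for hypothetical rates of 5.0 and 3.0 it gives 40.0.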

Title: Filterbank Optimization for Text-Dependent Speaker Verification by Evolutionary Algorithm Using Spline-Defined Design Parameters
Authors: Mohammad Azharuddin Laskar, Rabul Hussain Laskar
Publication date: 19-08-2019
Publisher: Springer Berlin Heidelberg
Published in: Arabian Journal for Science and Engineering / Issue 11/2019
Print ISSN: 2193-567X
Electronic ISSN: 2191-4281
DOI: https://doi.org/10.1007/s13369-019-04090-4
