Published in: Arabian Journal for Science and Engineering 11/2019

19-08-2019 | Research Article - Computer Engineering and Computer Science

Filterbank Optimization for Text-Dependent Speaker Verification by Evolutionary Algorithm Using Spline-Defined Design Parameters

Authors: Mohammad Azharuddin Laskar, Rabul Hussain Laskar

Abstract

Mel frequency cepstral coefficients (MFCCs) have been the predominant spectral features in many speech-based applications. They were introduced primarily for speech recognition and were later adopted for other applications such as speaker recognition and emotion recognition. Several recent findings suggest that the Mel-scale filterbank, which is inspired primarily by human perception, may not be optimal for speaker recognition. Working in the same direction, this study attempts to optimize the filterbank design for text-dependent speaker verification. Motivated by the success of evolutionary computation in related fields, an evolutionary algorithm is used to carry out this optimization. This enables data-driven learning of the design parameters and is hypothesized to yield filterbanks suited to the specific task of speaker-phrase discrimination. The filterbanks have been optimized for text-dependent speaker verification in general, and also for specific speakers and phrases. The proposed filterbank yields a relative equal error rate reduction of up to 39.41% with respect to the baseline MFCCs.
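
To make the notion of a filterbank "design parameter" concrete, the following is a minimal sketch, not the authors' implementation. It builds a triangular filterbank from a list of edge frequencies and uses Mel-spaced edges as the baseline. The function names, the 20-filter setting, the 512-point FFT, and the 16 kHz sampling rate are all assumptions chosen for illustration; the abstract does not state them. An optimizer such as an evolutionary algorithm could, in principle, search over the edge frequencies (or over a curve that generates them) instead of fixing them to the Mel scale.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Common Mel-scale formula (assumed here; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_edges(n_filters, f_min, f_max):
    """Return n_filters + 2 edge frequencies equally spaced on the Mel scale."""
    mels = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    return mel_to_hz(mels)

def triangular_filterbank(edges_hz, n_fft, sample_rate):
    """Build triangular filters; filter i spans edges i..i+2 and peaks at edge i+1."""
    edges_hz = np.asarray(edges_hz, dtype=float)
    # Centre frequency of each non-negative FFT bin.
    bin_hz = np.arange(n_fft // 2 + 1) * sample_rate / n_fft
    n_filters = len(edges_hz) - 2
    bank = np.zeros((n_filters, bin_hz.size))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_hz - lo) / (mid - lo)    # slope up from lo to mid
        falling = (hi - bin_hz) / (hi - mid)   # slope down from mid to hi
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

# Baseline: Mel-spaced edges (illustrative settings, not taken from the paper).
edges = mel_edges(n_filters=20, f_min=0.0, f_max=8000.0)
bank = triangular_filterbank(edges, n_fft=512, sample_rate=16000)
```

Any strictly increasing array of edge frequencies can be passed to `triangular_filterbank` in place of `edges`, which is what makes the spacing a tunable quantity rather than a fixed perceptual choice.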

Metadata
Title
Filterbank Optimization for Text-Dependent Speaker Verification by Evolutionary Algorithm Using Spline-Defined Design Parameters
Authors
Mohammad Azharuddin Laskar
Rabul Hussain Laskar
Publication date
19-08-2019
Publisher
Springer Berlin Heidelberg
Published in
Arabian Journal for Science and Engineering / Issue 11/2019
Print ISSN: 2193-567X
Electronic ISSN: 2191-4281
DOI
https://doi.org/10.1007/s13369-019-04090-4
