2019 | OriginalPaper | Chapter

Speaker Recognition Based on Lightweight Neural Network for Smart Home Solutions

Authors: Haojun Ai, Wuyang Xia, Quanxin Zhang

Published in: Cyberspace Safety and Security

Publisher: Springer International Publishing

Abstract

As smart home devices have advanced, they have gradually changed how people live, and speaker recognition is now available in almost all of them. Currently, mainstream speaker recognition services are provided by very deep neural networks that are trained on cloud servers. However, such networks are not suitable for deployment and operation on smart home devices. To address this problem, we propose a packet bottleneck method to improve SqueezeNet, which has been widely used in speaker recognition tasks, and we design a lightweight structure named TrimNet. In addition, we adopt a model updating strategy based on transfer learning to keep the model from deteriorating on cold speech, that is, speech from a speaker who has a cold. The experimental results demonstrate that TrimNet is superior to SqueezeNet in classification accuracy, number of parameters, and amount of computation. Moreover, the model updating method increases the recognition rate for cold speech without reducing the recognition rate for other speakers.

Metadata

Title: Speaker Recognition Based on Lightweight Neural Network for Smart Home Solutions
Authors: Haojun Ai, Wuyang Xia, Quanxin Zhang
Copyright Year: 2019
DOI: https://doi.org/10.1007/978-3-030-37352-8_37