Published in: International Journal of Speech Technology 2/2020

07-05-2020

Speaker recognition based on short utterance compensation method of generative adversarial networks

Authors: Zhangfang Hu, Yaqin Fu, Yuan Luo, Xuan Xu, Zhiguang Xia, Hongwei Zhang



Abstract

Building on the Gaussian mixture model–universal background model (GMM–UBM) speaker recognition system, this paper proposes a short-utterance sample compensation method based on the generative adversarial network (GAN). The method addresses the shortage of corpus data in short utterances, which seriously reduces the recognition rate. Through adversarial training of a generator network and a discriminator network, it compensates short-utterance samples into speech samples that carry sufficient speaker identity information. To avoid model collapse and gradient instability during GAN training, the paper uses the condition information of the conditional GAN to guide the generator's compensation process, and proposes two additional training tasks to stabilize training: a task that measures the generator's compensation performance, and a feature-tag task for the discriminator. Finally, the proposed compensation method is evaluated on a GMM–UBM speaker recognition system. The experimental results indicate that the method effectively reduces the equal error rate of the speaker recognition system under short-utterance conditions.
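The abstract reports results as an equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER can be approximated from verification scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the score lists, and the threshold sweep over observed scores are all assumptions made for illustration.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by trying every observed score as a threshold.

    Hypothetical helper for illustration; higher scores are assumed to mean
    "more likely the claimed speaker".
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    if not target_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")

    best_gap, best_eer, best_thr = None, None, None
    for thr in sorted(set(target_scores) | set(impostor_scores)):
        # False rejection: a genuine trial scored below the threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # With finitely many trials FAR and FRR rarely match exactly,
            # so report their mean at the closest crossing.
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


if __name__ == "__main__":
    # Made-up scores, not data from the paper.
    genuine = [0.9, 0.8, 0.4, 0.7]
    impostor = [0.1, 0.5, 0.3, 0.2]
    eer, thr = equal_error_rate(genuine, impostor)
    print(f"EER = {eer:.2%} at threshold {thr}")
```

In a GMM–UBM system the scores would typically be log-likelihood ratios between the speaker model and the background model. A lower EER on short-utterance trials is the kind of improvement the abstract describes.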


Metadata
Title
Speaker recognition based on short utterance compensation method of generative adversarial networks
Authors
Zhangfang Hu
Yaqin Fu
Yuan Luo
Xuan Xu
Zhiguang Xia
Hongwei Zhang
Publication date
07-05-2020
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2020
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-020-09711-0
