2018 | Original Paper | Book Chapter

Voice Conversion from Arbitrary Speakers Based on Deep Neural Networks with Adversarial Learning

Authors: Sou Miyamoto, Takashi Nose, Suzunosuke Ito, Harunori Koike, Yuya Chiba, Akinori Ito, Takahiro Shinozaki

Published in: Advances in Intelligent Information Hiding and Multimedia Signal Processing

Publisher: Springer International Publishing

Abstract

In this study, we propose a technique for voice conversion from arbitrary speakers based on deep neural networks, realized by introducing adversarial learning into conventional voice conversion. In addition to a generative model, adversarial learning uses a discriminative model that classifies input speech as either natural or converted, which is expected to enable more natural voice conversion. Experiments showed that the proposed method was effective in enhancing the global variance (GV) of the mel-cepstrum, but the naturalness of the converted speech was slightly lower than that of speech produced with the conventional variance compensation technique.
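For readers unfamiliar with the GV measure mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes GV is taken as the per-dimension variance over time of a mel-cepstral sequence within one utterance. The function names `global_variance` and `smooth`, the array sizes, and the synthetic random data are all hypothetical and chosen only for illustration. The moving average is merely a stand-in for the over-smoothing that statistical conversion models tend to produce.

```python
import numpy as np


def global_variance(mcep):
    """Per-dimension variance over time of a (frames, dims) mel-cepstrum array."""
    mcep = np.asarray(mcep, dtype=float)
    return mcep.var(axis=0)


def smooth(mcep, width=5):
    """Moving average along time; a stand-in for over-smoothed converted features."""
    kernel = np.ones(width) / width
    columns = [np.convolve(mcep[:, d], kernel, mode="same")
               for d in range(mcep.shape[1])]
    return np.stack(columns, axis=1)


# Synthetic stand-in data (assumption): 200 frames, 4 coefficients, not real speech.
rng = np.random.default_rng(0)
natural = rng.normal(size=(200, 4))
converted = smooth(natural)

gv_natural = global_variance(natural)
gv_converted = global_variance(converted)

# The smoothed trajectory has a smaller GV in every dimension.
print("GV natural:  ", gv_natural)
print("GV converted:", gv_converted)
```

In this toy setting, a method that "enhances GV" would move `gv_converted` back toward `gv_natural`. Whether that also improves perceived naturalness is a separate question, as the abstract's results indicate.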

Metadata
Copyright year: 2018
DOI: https://doi.org/10.1007/978-3-319-63859-1_13