A Novel Attention-Guided Generative Adversarial Network for Whisper-to-Normal Speech Conversion

Authors: Teng Gao, Qing Pan, Jian Zhou, Huabin Wang, Liang Tao, Hon Keung Kwan

Published in: Cognitive Computation | Issue 2/2023 | 16-01-2023


Abstract

Whispered speech is a special voicing style that is used in public settings to keep speech information private. It is also the primary mode of oral communication for aphonic individuals after laryngectomy. Converting whispered speech to normal-voiced speech can significantly improve speech quality and/or intelligibility for whisper perception or recognition. Owing to the large difference in voicing style between normal and whispered speech, estimating normal-voiced speech from its whispered counterpart remains a major challenge. Existing whisper-to-normal conversion methods learn a nonlinear mapping between the features of whispered speech and those of its normal counterpart, and the converted normal speech is reconstructed from features selected by the learned mapping from the training data space. Such methods may produce a discontinuous spectrum across successive frames, degrading the quality and/or intelligibility of the converted normal speech. This paper proposes a novel generative model (AGAN-W2SC) for whisper-to-normal speech conversion. Unlike feature-mapping models, the proposed AGAN-W2SC model generates a normal speech spectrum directly from a whispered spectrum. To make the generated spectrum more similar to the reference normal speech, the model captures both the inner-feature coherence of whispered speech and the inter-feature coherence between whispered speech and its normal counterpart. Specifically, a self-attention mechanism is introduced to capture the inner-spectrum structure, while a Siamese neural network is adopted to capture the inter-spectrum structure across domains. Additionally, the model uses identity mapping to preserve linguistic information. The proposed AGAN-W2SC is parallel-data-free and can be trained at the frame level. Experimental results on whisper-to-normal speech conversion demonstrate that the proposed AGAN-W2SC method outperforms all compared methods in terms of both speech quality and intelligibility.
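The self-attention idea described in the abstract can be illustrated with a minimal numpy sketch. This is not the paper's actual architecture (which is a full GAN with a Siamese network and identity mapping); the function names, projection matrices, and shapes below are hypothetical, chosen only to show how frame-to-frame attention weights let each output spectral frame draw on the whole utterance, which is what encourages spectral continuity across successive frames.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Illustrative self-attention over spectral frames.
    X: (T, F) spectrogram, T time frames x F frequency bins.
    Wq, Wk, Wv: (F, d) hypothetical learned projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # A[i, j] = how much frame i attends to frame j; rows sum to 1.
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))
    return A @ V, A

rng = np.random.default_rng(0)
T, F, d = 5, 8, 4                       # toy sizes, not from the paper
X = rng.standard_normal((T, F))
Wq, Wk, Wv = (rng.standard_normal((F, d)) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)
print(out.shape, A.shape)               # (5, 4) (5, 5)
```

In the full model, such an attention map would sit inside the generator so that every generated frame is conditioned on the global spectral structure rather than on its local neighborhood alone.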


Metadata
Publisher: Springer US
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI: https://doi.org/10.1007/s12559-023-10108-9
