Published: 16.01.2023

A Novel Attention-Guided Generative Adversarial Network for Whisper-to-Normal Speech Conversion

Authors: Teng Gao, Qing Pan, Jian Zhou, Huabin Wang, Liang Tao, Hon Keung Kwan

Published in: Cognitive Computation | Issue 2/2023


Abstract

Whispered speech is a special voicing style used in public settings to keep speech content private. It is also the primary means of oral communication for aphonic individuals after laryngectomy. Converting whispered speech to normal-voiced speech can significantly improve speech quality and intelligibility for whisper perception and recognition. Because of the large difference in voicing style between normal and whispered speech, estimating normal-voiced speech from its whispered counterpart remains a major challenge. Existing whisper-to-normal conversion methods learn a nonlinear mapping between features of whispered speech and features of its normal counterpart, and reconstruct the converted normal speech from features that the learned mapping selects from the training-data space. Such methods may produce discontinuous spectra across successive frames, degrading the quality and intelligibility of the converted speech. This paper proposes a novel generative model, AGAN-W2SC, for whisper-to-normal speech conversion. Unlike feature-mapping models, AGAN-W2SC generates a normal speech spectrum directly from a whispered spectrum. To make the generated spectrum closer to the reference normal speech, the model captures both the inner-feature coherence within a whisper and the inter-feature coherence between whispered speech and its normal counterpart. Specifically, a self-attention mechanism captures the inner-spectrum structure, while a Siamese neural network captures the inter-spectrum structure across the two domains. The model additionally adopts identity mapping to preserve linguistic information. AGAN-W2SC requires no parallel data and can be trained at the frame level. Experimental results on whisper-to-normal speech conversion demonstrate that AGAN-W2SC outperforms all compared methods in terms of speech quality and intelligibility.
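The abstract names three mechanisms: self-attention to capture inner-spectrum structure, a Siamese network to relate whispered and converted frames across domains, and an identity-mapping loss to preserve linguistic content. The paper's implementation is not given on this page, so the following is a minimal PyTorch sketch of how these pieces are commonly realized (following the standard SAGAN self-attention of Zhang et al. and the TraVeLGAN Siamese loss of Amodio and Krishnaswamy); the module sizes, frame shapes, and loss forms are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention1d(nn.Module):
    """SAGAN-style self-attention along the frequency axis of a spectral
    frame, so each bin can attend to every other bin (inner-spectrum)."""
    def __init__(self, channels):
        super().__init__()
        inner = max(1, channels // 8)              # reduced dim, as in SAGAN
        self.query = nn.Conv1d(channels, inner, 1)
        self.key   = nn.Conv1d(channels, inner, 1)
        self.value = nn.Conv1d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):                          # x: (batch, channels, bins)
        q = self.query(x).transpose(1, 2)          # (B, bins, inner)
        k = self.key(x)                            # (B, inner, bins)
        attn = F.softmax(torch.bmm(q, k), dim=-1)  # (B, bins, bins)
        v = self.value(x)                          # (B, channels, bins)
        out = torch.bmm(v, attn.transpose(1, 2))   # mix values over bins
        return self.gamma * out + x                # residual connection

class SiameseEmbedder(nn.Module):
    """Maps a spectral frame to a vector so that whispered and converted
    frames can be compared in one embedding space (inter-spectrum)."""
    def __init__(self, n_bins, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bins, 256), nn.ReLU(),
            nn.Linear(256, dim),
        )

    def forward(self, x):                          # x: (batch, n_bins)
        return self.net(x)

def travel_loss(siamese, x1, x2, g1, g2):
    """TraVeLGAN-style constraint: the embedding-space vector between two
    whispered frames (x1, x2) should be preserved between their converted
    counterparts (g1, g2), enforcing cross-domain structural coherence."""
    return F.mse_loss(siamese(g1) - siamese(g2),
                      siamese(x1) - siamese(x2))

def identity_loss(generator, normal_frames):
    """Identity mapping: normal speech fed to the generator should pass
    through unchanged, discouraging it from altering linguistic content."""
    return F.l1_loss(generator(normal_frames), normal_frames)
```

Two design points worth noting: gamma is initialized to zero, so the attention block starts as the identity and the network learns gradually how much inner-spectrum structure to mix in; and in the full TraVeLGAN formulation the Siamese network is trained jointly with the generator (with an additional margin term to prevent collapsed embeddings), which this sketch omits for brevity.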


Metadata
Title
A Novel Attention-Guided Generative Adversarial Network for Whisper-to-Normal Speech Conversion
Authors
Teng Gao
Qing Pan
Jian Zhou
Huabin Wang
Liang Tao
Hon Keung Kwan
Publication date
16.01.2023
Publisher
Springer US
Published in
Cognitive Computation / Issue 2/2023
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-023-10108-9