Published in: International Journal of Speech Technology 1/2019

12.12.2018

Enhancement of esophageal speech obtained by a voice conversion technique using time dilated Fourier cepstra

Authors: Imen Ben Othmane, Joseph Di Martino, Kaïs Ouni


Abstract

This paper presents a novel speaking-aid system for enhancing esophageal speech (ES). The proposed method improves the quality of esophageal speech by combining a voice conversion technique with a time dilation algorithm. In the proposed system, a deep neural network (DNN) serves as a nonlinear mapping function for vocal tract vector transformation. The converted frames are then used to determine realistic excitation and phase vectors from the target training space by means of a frame selection algorithm. Next, to preserve the identity of the esophageal speaker, we retain the source vocal tract features and apply a time dilation algorithm to them in order to reduce the unpleasant esophageal noises. Finally, the converted speech is reconstructed from the dilated source vocal tract frames and the predicted excitation and phase. DNN- and Gaussian mixture model (GMM)-based voice conversion systems were evaluated using objective and subjective measures; these evaluations also assess the changes in speech quality and intelligibility of the transformed signals. Experimental results demonstrate that the proposed methods provide considerable improvement in the intelligibility and naturalness of the converted esophageal speech.
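The frame selection step described above (picking realistic excitation/phase vectors from the target training space for each converted vocal-tract frame) can be sketched as a nearest-neighbor lookup. This is a minimal illustrative sketch, not the authors' implementation: the function and variable names are hypothetical, and a plain Euclidean distance over cepstral vectors is assumed.

```python
import numpy as np

def select_excitation_frames(converted_cepstra, target_cepstra, target_excitations):
    """For each converted vocal-tract frame, return the excitation vector of the
    nearest target training frame (Euclidean distance in cepstral space)."""
    # Pairwise squared distances: (n_converted, n_target) via broadcasting.
    d = ((converted_cepstra[:, None, :] - target_cepstra[None, :, :]) ** 2).sum(axis=2)
    nearest = d.argmin(axis=1)          # index of closest target frame per converted frame
    return target_excitations[nearest]  # realistic excitation vectors from the target space

# Toy usage: two converted frames matched against three target training frames.
converted = np.array([[0.0, 0.0], [1.0, 1.0]])
targets = np.array([[0.1, 0.0], [0.9, 1.1], [5.0, 5.0]])
excitations = np.array([[10.0], [20.0], [30.0]])
selected = select_excitation_frames(converted, targets, excitations)
```

In practice, approximate nearest-neighbor structures (e.g. k-d trees, as in the cited Arya work) would replace the brute-force distance matrix for large training spaces.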


Metadata
Title
Enhancement of esophageal speech obtained by a voice conversion technique using time dilated Fourier cepstra
Authors
Imen Ben Othmane
Joseph Di Martino
Kaïs Ouni
Publication date
12.12.2018
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2019
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-09579-1
