Published in: International Journal of Speech Technology 1/2019

03.12.2018

Speech synthesis for glottal activity region processing

Authors: Nagaraj Adiga, S. R. M. Prasanna


Abstract

The objective of this paper is to demonstrate the significance of combining different features present in the glottal activity region for statistical parametric speech synthesis (SPSS). The features present in glottal activity regions are broadly categorized as F0, system, and source features, which together characterize the quality of speech. The F0 feature is computed using the zero frequency filter, and the system feature using a 2-D Riesz transform. The source features comprise an aperiodicity component and a phase component: the aperiodicity component, representing the amount of aperiodic component present in a frame, is also computed from the Riesz transform, whereas the phase component is obtained by modeling the integrated linear prediction residual. The combined features yield better quality than STRAIGHT-based SPSS in both objective and subjective evaluations. Further, the proposed method is extended to two Indian languages, Assamese and Manipuri, which show similar improvements in quality.
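The F0 estimation step mentioned in the abstract, zero frequency filtering, can be sketched as follows. This is a minimal illustrative sketch of the standard formulation (signal differencing, a cascade of two zero-frequency resonators realized as repeated integration, and repeated local-mean trend removal), not the authors' implementation: the function name `zff_f0`, the 10 ms trend-removal window, and the use of the median epoch interval are assumptions made here for illustration.

```python
import numpy as np

def moving_average(y, w):
    # Centered moving average, used below for trend removal.
    return np.convolve(y, np.ones(w) / w, mode="same")

def zff_f0(s, fs, win_ms=10.0):
    """Estimate an average F0 (Hz) via zero frequency filtering."""
    # Difference the signal to remove any DC / low-frequency offset.
    x = np.diff(np.asarray(s, dtype=np.float64), prepend=0.0)
    # Cascade of two zero-frequency resonators: each resonator is an
    # ideal double integrator, so apply four cumulative sums in total.
    y = x
    for _ in range(4):
        y = np.cumsum(y)
    # The integrations leave a large polynomial trend; remove it by
    # repeated local-mean subtraction over roughly one pitch period.
    w = max(3, int(fs * win_ms / 1000.0))
    for _ in range(3):
        y = y - moving_average(y, w)
    # Epochs are the negative-to-positive zero crossings of the
    # trend-removed (ZFF) signal.
    crossings = np.where((y[:-1] < 0) & (y[1:] >= 0))[0]
    if len(crossings) < 2:
        return 0.0
    # The median epoch interval is robust to edge artefacts.
    return fs / np.median(np.diff(crossings))
```

On a synthetic impulse train with an 80-sample period at 8 kHz, for example, the estimate should come out close to 100 Hz; the repeated integration attenuates harmonics so strongly that the fundamental dominates the filtered signal.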


Metadata
Title
Speech synthesis for glottal activity region processing
Authors
Nagaraj Adiga
S. R. M. Prasanna
Publication date
03.12.2018
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2019
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-09583-5
