Published in: International Journal of Speech Technology 3/2018

15.05.2018

Speech analysis and synthesis with a refined adaptive sinusoidal representation

Authors: Youcef Tabet, Mohamed Boughazi, Saddek Afifi


Abstract

This paper reviews common speech signal representations and briefly describes their corresponding analysis–synthesis stages. The main focus is on adaptive sinusoidal representations, for which a refined model of speech is proposed: the Refined adaptive Sinusoidal Representation (R_aSR). Building on the performance of recently proposed adaptive Sinusoidal Models of speech, significant refinements are introduced at both the analysis and adaptive stages. First, in the analysis stage, a quasi-harmonic representation of speech is used to obtain an initial estimate of the instantaneous model parameters. Next, in the adaptive stage, an adaptive scheme combined with an iterative frequency correction mechanism provides a robust estimate of the model parameters (amplitudes, frequencies, and phases). Finally, the speech signal is reconstructed as the sum of its estimated time-varying instantaneous components, obtained through an interpolation scheme. Objective evaluation tests show that the proposed R_aSR achieves high-quality reconstruction of voiced speech signals compared with state-of-the-art models. Moreover, listening tests indicate that the R_aSR attains transparent perceived quality.
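
The final reconstruction step described in the abstract, summing time-varying sinusoidal components after interpolation, can be illustrated with a generic sketch. This is not the authors' R_aSR: the function names (`interp_linear`, `synthesize`), the linear interpolation of amplitude and frequency tracks, and the phase obtained by accumulating the instantaneous frequency are all assumptions chosen for illustration.

```python
import math

def interp_linear(times, values, t):
    """Piecewise-linear interpolation of frame values at time t (clamped at the ends)."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    for k in range(len(times) - 1):
        if times[k] <= t <= times[k + 1]:
            w = (t - times[k]) / (times[k + 1] - times[k])
            return (1 - w) * values[k] + w * values[k + 1]

def synthesize(frame_times, amps, freqs, phase0, fs, n_samples):
    """Sum of sinusoids with interpolated amplitude and frequency tracks.

    amps[k][i], freqs[k][i]: amplitude and frequency (Hz) of component k at frame i.
    phase0[k]: initial phase of component k in radians (hypothetical input layout).
    """
    out = [0.0] * n_samples
    for a_track, f_track, phi0 in zip(amps, freqs, phase0):
        phase = phi0
        for n in range(n_samples):
            t = n / fs
            a = interp_linear(frame_times, a_track, t)
            f = interp_linear(frame_times, f_track, t)
            out[n] += a * math.cos(phase)
            # Accumulate the instantaneous frequency to advance the phase.
            phase += 2 * math.pi * f / fs
    return out
```

With constant tracks the sketch reduces to a stationary sinusoid; time-varying tracks give each component a smoothly changing amplitude and frequency between analysis frames.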

Metadata
Title
Speech analysis and synthesis with a refined adaptive sinusoidal representation
Authors
Youcef Tabet
Mohamed Boughazi
Saddek Afifi
Publication date
15.05.2018
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 3/2018
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-9519-4
