2015 | OriginalPaper | Book chapter

Speech Driven by Artificial Larynx: Potential Advancement Using Synthetic Pitch Contours

Abstract

Despite a long history of development, the speech quality achieved with artificial larynx devices remains limited. This paper explores recent advances in prosodic speech processing and technology and assesses their potential for improving the quality of speech produced with an artificial larynx, in particular tone and intonation through pitch variation. Three approaches are discussed: manual pitch control, automatic pitch control, and re-synthesized speech.

Metadata
Title
Speech Driven by Artificial Larynx: Potential Advancement Using Synthetic Pitch Contours
Author
Hua-Li Jian
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-20684-4_30
