2016 | OriginalPaper | Chapter

Designing High-Coverage Multi-level Text Corpus for Non-professional-voice Conservation

Authors : Markéta Jůzová, Daniel Tihelka, Jindřich Matoušek

Published in: Speech and Computer

Publisher: Springer International Publishing

Abstract

The paper focuses on building a text corpus suitable for conserving the voices of non-professional speakers who are losing their voices due to serious health problems. Since we do not know in advance how many sentences a speaker will be able to record, we propose a multi-level greedy algorithm that ensures the selected texts cover various phonetic and prosodic units. The resulting coverage is presented for various corpus sizes and compared with that of a generic TTS corpus recorded by a healthy professional speaker.
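The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one way a multi-level greedy selection could work, not the authors' implementation. It assumes that each sentence is already transcribed into a list of phone symbols and that the levels are processed in priority order (here phones first, then diphones); the function names, the toy sentences, and the tie-breaking by sentence id are all hypothetical. Because sentences are appended in order of usefulness, any prefix of the returned list is a usable recording script, which matters when the speaker may stop recording early.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Set

Extractor = Callable[[List[str]], Set[Hashable]]


def phone_units(phones: List[str]) -> Set[Hashable]:
    """Level 1 (assumed): the set of distinct phones in a sentence."""
    return set(phones)


def diphone_units(phones: List[str]) -> Set[Hashable]:
    """Level 2 (assumed): the set of distinct neighbouring phone pairs."""
    return set(zip(phones, phones[1:]))


def greedy_multilevel_select(
    sentences: Dict[str, List[str]],
    levels: Sequence[Extractor],
) -> List[str]:
    """Return sentence ids in a suggested recording order.

    For each level in turn, repeatedly pick the sentence that adds the
    most not-yet-covered units of that level; move on to the next level
    once no remaining sentence adds anything new.
    """
    order: List[str] = []
    remaining = set(sentences)
    for extract in levels:
        units = {sid: extract(seq) for sid, seq in sentences.items()}
        # Units of this level already covered by earlier selections.
        covered: Set[Hashable] = set().union(*(units[sid] for sid in order))
        while remaining:
            # Sorting makes ties deterministic: the smallest id wins.
            best = max(sorted(remaining), key=lambda sid: len(units[sid] - covered))
            if not units[best] - covered:
                break  # this level cannot be improved any further
            order.append(best)
            remaining.discard(best)
            covered |= units[best]
    return order


# Toy data: hypothetical phone transcriptions keyed by sentence id.
toy_sentences = {
    "s1": ["a", "b", "a"],
    "s2": ["a", "b", "c"],
    "s3": ["c", "a"],
    "s4": ["b", "c", "a", "b"],
}
order = greedy_multilevel_select(toy_sentences, [phone_units, diphone_units])
print(order)
```

In this toy run the first selected sentence already covers every phone, and the later ones are chosen only for the diphones they add. A real design would also have to handle the prosodic units mentioned in the abstract, which this sketch leaves out.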

Here we consider a diphone as a TTS system unit, i.e. the signal from the middle of one phone to the middle of the next phone. Nevertheless, the numbers presented would be the same if a diphone were considered as the join of any two neighbouring phones.
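As a small illustration of why the two definitions give the same numbers: in either case a diphone is identified by an ordered pair of neighbouring phones, so counting at the text level reduces to counting adjacent pairs. The sketch below assumes a sentence is given as a list of phone symbols; the function name and the toy transcription are hypothetical.

```python
from collections import Counter
from typing import Counter as CounterType, List, Tuple


def diphone_counts(phones: List[str]) -> CounterType[Tuple[str, str]]:
    """Count each ordered pair of neighbouring phones in one sentence."""
    return Counter(zip(phones, phones[1:]))


# Hypothetical transcription of a four-phone utterance.
counts = diphone_counts(["m", "a", "m", "a"])
print(counts)
```

A sequence of n phones always yields n - 1 diphone tokens, whichever of the two definitions is used.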

Let us emphasize that we are working with text at this stage, so we can only predict how a sentence will, or should, be pronounced.

Let us note that the real occurrence of unnatural artefacts will be much lower, and that it will be influenced by other factors such as recording style, speech unit segmentation, unit selection criteria, etc. Those factors are not influenced by the speech corpus per se.

Title: Designing High-Coverage Multi-level Text Corpus for Non-professional-voice Conservation
Authors: Markéta Jůzová, Daniel Tihelka, Jindřich Matoušek
Publisher: Springer International Publishing
Book: Speech and Computer
Print ISBN: 978-3-319-43957-0

Electronic ISBN: 978-3-319-43958-7

Copyright Year: 2016
DOI: https://doi.org/10.1007/978-3-319-43958-7_24
