International Journal of Speech Technology

Published: 7 October 2017

A waveform concatenation technique for text-to-speech synthesis

Authors: Soumya Priyadarsini Panda, Ajit Kumar Nayak

Published in: International Journal of Speech Technology | Issue 4/2017

Abstract

Designing text-to-speech systems that produce natural-sounding speech in different Indian languages is a challenging and ongoing problem. Because Indian languages have a large number of possible pronunciations, a concatenative speech synthesis technique must store a large number of speech segments in its speech database to achieve highly natural speech. However, the resulting database size makes such systems unusable on small handheld devices or human-computer interactive systems with limited storage resources. In this paper, we propose a fraction-based waveform concatenation technique that produces intelligible speech segments from a small-footprint speech database. The results of all the experiments performed show that the proposed technique produces intelligible speech in different Indian languages with much lower storage and computation overhead than the existing syllable-based technique.
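
The abstract does not describe how the fraction-based technique selects or joins its units, so the following is only a generic sketch of waveform concatenation, not the authors' method. It assumes a hypothetical in-memory database that maps unit labels to lists of samples, and it smooths each join with a short linear crossfade; the names `crossfade_concat`, `synthesize`, and the `overlap` parameter are illustrative assumptions.

```python
from typing import Dict, List, Sequence


def crossfade_concat(units: Sequence[Sequence[float]], overlap: int) -> List[float]:
    """Join waveform units end to end, blending `overlap` samples at each join.

    Generic illustration only; the paper's fraction-based rules are not
    reproduced here because the abstract does not specify them.
    """
    out: List[float] = []
    for unit in units:
        samples = list(unit)
        # Blend only as many samples as both sides actually contain.
        n = min(overlap, len(out), len(samples))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # fade-in weight of the incoming unit
            out[start + i] = (1 - w) * out[start + i] + w * samples[i]
        # Append the part of the incoming unit that was not blended.
        out.extend(samples[n:])
    return out


def synthesize(labels: Sequence[str],
               database: Dict[str, Sequence[float]],
               overlap: int = 2) -> List[float]:
    """Look up each unit label in the (hypothetical) database and join the waveforms."""
    return crossfade_concat([database[label] for label in labels], overlap)


if __name__ == "__main__":
    # Toy database: two constant "waveforms" standing in for recorded units.
    db = {"ka": [1.0, 1.0, 1.0, 1.0], "ma": [0.0, 0.0, 0.0, 0.0]}
    print(synthesize(["ka", "ma"], db, overlap=2))
```

A smaller unit inventory shrinks the database but produces more joins per utterance, which is why the smoothing at each join matters in any scheme of this kind.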

Title: A waveform concatenation technique for text-to-speech synthesis
Authors: Soumya Priyadarsini Panda, Ajit Kumar Nayak
Publication date: 7 October 2017
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 4/2017
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-017-9463-8
