2018 | Book

Epoch Synchronous Overlap Add (ESOLA)

A Concatenative Synthesis Procedure for Speech

Written by: Prof. Asoke Kumar Datta

Publisher: Springer Singapore

Book series: Signals and Communication Technology

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"

About this book

This book presents the details of a text-to-speech synthesis procedure based on epoch synchronous overlap add (ESOLA) and offers a way to develop a text-to-speech system with fewer data resources than existing solutions require. It also examines natural speech signals, including the use of random perturbation in synthesis. The book is intended for students, researchers and industrial practitioners in the field of text-to-speech synthesis.

Table of contents

Frontmatter

Chapter 1. Introduction to ESOLA

Abstract

Speech is the primary, most common and most efficient mode of communication between human beings. It is the real natural language. For educated people, the other mode of communication is writing, which uses textual language. It is not only desirable but essential, at least for the illiterate masses of developing countries, that information be disseminated through speech, the most natural mode of human communication. With the unprecedented expansion of Information Technology (IT) into the life of the common man, it is only natural that an IT solution should be available for this. Speech synthesis, the automatic and artificial generation of the speech signal by a machine, is this solution.

Asoke Kumar Datta

Chapter 2. Epoch Synchronous Overlap Add (Esola) Algorithm

Abstract

Concatenative speech synthesis, the recent trend for synthesizing speech, uses real speech signal units for constructing utterances. Since speech is a continuous process, the dynamics of that continuity must be preserved so that it is not jeopardised at the junction points. One of the core novelties of the new concatenative TTS (Text-To-Speech) system for SCB (Standard Colloquial Bengali) presented in this book is a new signal unit at the sub-phonetic level, the ‘partneme’. Partnemes, i.e., parts of phones, are the smallest units used so far as signal units for concatenation in ESOLA.

Asoke Kumar Datta
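
The Chapter 2 abstract stresses that continuity must survive at the junction points between concatenated units. As a rough illustration only, the sketch below joins two sampled units with a linear cross-fade whose overlap region starts at a given epoch index in each unit. This is a generic overlap-add join, not the ESOLA algorithm from the book; the function names, the linear weighting and the sample values are all assumptions made for the example.

```python
def crossfade_join(left, right, overlap):
    """Join two sample lists, mixing the last `overlap` samples of `left`
    with the first `overlap` samples of `right` (linear weights, assumed)."""
    if overlap < 1 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must fit inside both units")
    split = len(left) - overlap
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of `right`, rising toward 1
        mixed.append((1.0 - w) * left[split + i] + w * right[i])
    return left[:split] + mixed + right[overlap:]


def join_at_epochs(left, left_epoch, right, right_epoch, overlap):
    """Trim both units so the overlap region starts at an epoch index in
    each of them, then cross-fade. Epoch indices are assumed to be known."""
    if left_epoch + overlap > len(left) or right_epoch + overlap > len(right):
        raise ValueError("not enough samples after the epoch for the overlap")
    return crossfade_join(left[:left_epoch + overlap], right[right_epoch:], overlap)


# Hypothetical units: two short constant-amplitude segments.
unit_a = [1.0] * 10
unit_b = [1.0] * 10
joined = join_at_epochs(unit_a, 4, unit_b, 2, 3)
print(len(joined))
```

Because the two weights always sum to one, a signal that is identical on both sides of the junction passes through the join unchanged, which is the minimum a junction treatment should guarantee.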

Chapter 3. State Phase Analysis: PDA/VDA Algorithm

Abstract

This chapter presents a method for analysing speech signals using a state-phase approach. The method directly segments the speech signal into its basic classes, namely quasi-periodic, quasi-random and quiescent, and at the same time detects pitch in the quasi-periodic segments. When developing a speech synthesis system for a particular language, detecting the pitch period from continuous speech signals is necessary for the synthesis procedure, as seen in the previous chapter.

Asoke Kumar Datta

Chapter 4. Phonological Rules for TTS

Abstract

In speech synthesis, it is necessary to identify the graphemes in every word to be converted to speech. This chapter deals with this process, normally referred to as text-to-phoneme or grapheme-to-phoneme conversion. Many rules for such conversion, known as phonology in linguistic parlance, have been proposed by eminent linguists for this dialect, namely SCB. Unfortunately, these rules are not in a computer-implementable form. This chapter presents the development of a rule-based G2P (Grapheme-To-Phoneme) conversion system for SCB.

Asoke Kumar Datta

Chapter 5. Intonation Rules for Text Reading

Abstract

Intonation is the cognitive aspect of the ensemble of pitch variations in the course of an utterance. This perceptual impression of speech melody correlates, to a first approximation, with changes in the fundamental frequency (F0) of the signal. This chapter presents a study of intonation patterns for text reading in Standard Colloquial Bengali, aimed at developing rules and appropriate methods for using them in a text-to-speech synthesis system. In the model presented here, the pitch movements at the syllabic level are considered to be basic. Syllabic stylization uses the closest linear match obtained by linear regression, and the pitch movements are expressed in semitones per second. The sentence-level intonation pattern is the sequence of the word-level patterns constituting the sentence. This chapter also presents the statistical method for implementing the resulting rules in TTS. The model is tested by synthesizing several sentences, and the perceptual results are satisfactory.

Asoke Kumar Datta
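
The Chapter 5 abstract describes stylizing each syllable's pitch movement with a linear regression and expressing it in semitones per second. A minimal sketch of that idea follows, assuming F0 samples in Hz with time stamps in seconds and the standard conversion of 12 semitones per octave. The function names, the 100 Hz reference and the sample contour are assumptions made for illustration, not values from the book.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert a frequency to semitones relative to `ref_hz` (assumed reference)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def syllable_slope(times_s, f0_hz, ref_hz=100.0):
    """Ordinary least-squares line through the semitone contour of one syllable.

    Returns (slope in semitones per second, intercept in semitones)."""
    if len(times_s) != len(f0_hz) or len(times_s) < 2:
        raise ValueError("need at least two matching (time, F0) samples")
    st = [hz_to_semitones(f, ref_hz) for f in f0_hz]
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_st = sum(st) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    if sxx == 0:
        raise ValueError("time stamps must not all be equal")
    sxy = sum((t - mean_t) * (s - mean_st) for t, s in zip(times_s, st))
    slope = sxy / sxx
    return slope, mean_st - slope * mean_t


# Hypothetical syllable: F0 rising over 200 ms.
times = [0.00, 0.05, 0.10, 0.15, 0.20]
f0 = [180.0, 184.0, 190.0, 195.0, 201.0]
slope, intercept = syllable_slope(times, f0)
print(f"pitch movement: {slope:.1f} semitones/s")
```

Working in semitones rather than Hz makes the slope comparable across speakers with different pitch ranges, since equal musical intervals map to equal differences.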

Chapter 6. Shimmer, Jitter and Complexity Perturbation

Abstract

Normal human voice is not perfectly periodic; it is said to be quasi-periodic. Two successive pitch cycles do not produce exactly the same pressure waves. The variations are random in nature and occur for pitch, amplitude and complexity, referred to as jitter, shimmer and complexity perturbation (CP) respectively. The perceptual manifestation of these is the quality of sound. An excess of them makes speech harsh, and their absence produces a mechanical, unnatural timbre. A study to find the values of shimmer, jitter and CP in natural speech is thus necessary to make the synthesized speech signal sound more natural. This chapter includes a comprehensive study of these, carried out for Bangla. The goal of the studies in this chapter is to obtain the optimum values of these three parameters so that including them in the synthesized speech increases its quality, particularly its naturalness. Signals of nonsense utterances in CVC form, adequate in number, are collected from the native SCB female speaker whose voice is to be used for speech synthesis.

Asoke Kumar Datta
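
The Chapter 6 abstract defines jitter and shimmer as random cycle-to-cycle variations in pitch and amplitude. One common way to quantify them, assumed here and not necessarily the measure used in the book, is the mean absolute difference between consecutive cycles divided by the mean value. The sketch below applies that ratio to pitch periods (jitter) and peak amplitudes (shimmer). The sample measurements are hypothetical, and complexity perturbation is left out because the abstract does not define it.

```python
def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    mean_value = sum(values) / len(values)
    if mean_value == 0:
        raise ValueError("mean value must be non-zero")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / mean_value


def jitter(periods_s):
    """Relative perturbation of successive pitch periods (in seconds)."""
    return local_perturbation(periods_s)


def shimmer(peak_amplitudes):
    """Relative perturbation of successive per-cycle peak amplitudes."""
    return local_perturbation(peak_amplitudes)


# Hypothetical per-cycle measurements from a voiced segment.
periods = [0.0050, 0.0051, 0.0049, 0.0050]
amplitudes = [0.80, 0.78, 0.82, 0.79]
print(f"jitter  = {100 * jitter(periods):.2f} %")
print(f"shimmer = {100 * shimmer(amplitudes):.2f} %")
```

A perfectly periodic signal gives zero for both measures, which corresponds to the mechanical timbre the abstract warns about.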

Backmatter

Title: Epoch Synchronous Overlap Add (ESOLA)
Written by: Prof. Asoke Kumar Datta
Publisher: Springer Singapore
Electronic ISBN: 978-981-10-7016-7
Print ISBN: 978-981-10-7015-0
DOI: https://doi.org/10.1007/978-981-10-7016-7
