2018 | Book

Epoch Synchronous Overlap Add (ESOLA)

A Concatenative Synthesis Procedure for Speech

About this book

This book presents details of a text-to-speech synthesis procedure using epoch synchronous overlap add (ESOLA), and provides a solution for developing a text-to-speech system that needs fewer data resources than existing solutions. It also examines how the random perturbations found in natural speech signals can be included in synthesis. The book is intended for students, researchers and industrial practitioners in the field of text-to-speech synthesis.

Table of Contents

Frontmatter
Chapter 1. Introduction to ESOLA
Abstract
Speech is the primary, most common and most efficient mode of communication between human beings. It is the truly natural language. For educated people, the other mode of communication is writing, in textual language. For the large illiterate population of developing countries in particular, it is not only desirable but essential that information be disseminated through speech, the most natural mode of human communication. With the unprecedented expansion of Information Technology (IT) into the life of the common man, it is only natural to expect an IT solution for this. Speech synthesis, the automatic and artificial generation of the speech signal by a machine, is that solution.
Asoke Kumar Datta
Chapter 2. Epoch Synchronous Overlap Add (ESOLA) Algorithm
Abstract
Concatenative speech synthesis, a recent trend in speech synthesis, uses units of real speech signal to construct utterances. Since speech is a continuous process, the dynamics of that continuity must be preserved at the junction points. One of the core novelties of the new concatenative TTS (Text-To-Speech) system for SCB (Standard Colloquial Bengali) presented in this book is a new signal unit at the sub-phonetic level, the ‘partneme’. Partnemes, i.e., parts of phones, are the smallest units used so far as signal units for concatenation in ESOLA.
Asoke Kumar Datta
Chapter 3. State Phase Analysis: PDA/VDA Algorithm
Abstract
This chapter presents a method for analysing speech signals using the state-phase approach. The method directly segments the speech signal into its basic classes, namely quasi-periodic, quasi-random and quiescent, and at the same time detects pitch in the quasi-periodic segments. When developing a speech synthesis system for a particular language, detecting the pitch period in continuous speech signals is necessary for the synthesis procedure, as seen in the previous chapter.
Asoke Kumar Datta
Chapter 4. Phonological Rules for TTS
Abstract
In speech synthesis, it is necessary to identify the graphemes in every word to be converted to speech. This chapter deals with this process, normally referred to as text-to-phoneme or grapheme-to-phoneme conversion. Many rules for such conversion, known as phonology in linguistic parlance, have been proposed by eminent linguists for this dialect, namely SCB. Unfortunately, these rules are not in a computer-implementable form. This chapter presents the development of a rule-based G2P (Grapheme-To-Phoneme) conversion system for SCB.
Asoke Kumar Datta
Chapter 5. Intonation Rules for Text Reading
Abstract
Intonation is the cognitive aspect of the ensemble of pitch variations in the course of an utterance. This perceptual impression of speech melody correlates, to a first approximation, with changes in the fundamental frequency (F0) of the signal. This chapter presents a study of intonation patterns for text reading in Standard Colloquial Bengali, aimed at developing rules and appropriate methods for using them in a text-to-speech synthesis system. In the model presented here, the pitch movements at the syllabic level are considered to be basic. Syllabic stylization uses the closest linear match obtained by linear regression, and the pitch movements are expressed in semitones per second. The sentence-level intonation pattern is the sequence of the word-level patterns constituting the sentence. This chapter also presents the statistical method for implementing the resulting rules in TTS. The model is tested by synthesizing several sentences, and the perceptual results are satisfactory.
Asoke Kumar Datta
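The syllabic stylization mentioned in the Chapter 5 abstract (a linear fit to the pitch movement, expressed in semitones per second) can be illustrated with a minimal sketch. It is not the book's implementation: the function names, the 100 Hz reference frequency and the sample F0 values are assumptions chosen for illustration, and the fit is an ordinary least-squares line.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert a frequency in Hz to semitones relative to ref_hz.

    ref_hz is an assumed reference; it only shifts the intercept,
    not the slope of the fitted line.
    """
    return 12.0 * math.log2(f0_hz / ref_hz)


def syllable_slope(times_s, f0_hz, ref_hz=100.0):
    """Least-squares line through (time, semitone) points of one syllable.

    Returns (slope, intercept), with the slope in semitones per second.
    """
    if len(times_s) != len(f0_hz) or len(times_s) < 2:
        raise ValueError("need at least two matching (time, F0) samples")
    st = [hz_to_semitones(f, ref_hz) for f in f0_hz]
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_st = sum(st) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    if sxx == 0:
        raise ValueError("time stamps must not all be identical")
    sxy = sum((t - mean_t) * (s - mean_st) for t, s in zip(times_s, st))
    slope = sxy / sxx
    intercept = mean_st - slope * mean_t
    return slope, intercept


# Hypothetical syllable: F0 rises by one octave (12 semitones) in 0.2 s.
times = [0.0, 0.1, 0.2]
f0 = [100.0, 100.0 * 2 ** 0.5, 200.0]
slope, intercept = syllable_slope(times, f0)
print(f"slope = {slope:.1f} semitones/s")
```

Working in semitones rather than Hz makes the slope independent of the speaker's absolute pitch range, which is one reason a logarithmic scale is a common choice for describing pitch movements.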
Chapter 6. Shimmer, Jitter and Complexity Perturbation
Abstract
Normal human voice is not perfectly periodic; it is said to be quasi-periodic. Two successive pitch cycles do not produce exactly the same pressure waves. The variations are random in nature and occur in pitch, amplitude and complexity, referred to as jitter, shimmer and complexity perturbation (CP) respectively. Their perceptual manifestation is the quality of the sound. An excess of them makes speech harsh, and their absence produces a mechanical, unnatural timbre. Finding the values of shimmer, jitter and CP in natural speech is thus necessary to make the synthesized speech signal sound more natural. This chapter includes a comprehensive study of these perturbations for Bangla. The goal of the studies in this chapter is to obtain optimum values of the three parameters, so that including them in the synthesized speech increases its quality, particularly its naturalness. The signals, an adequate number of nonsense utterances in CVC form, were collected from the native SCB female speaker whose voice is to be used for speech synthesis.
Asoke Kumar Datta
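To make the jitter and shimmer described in the Chapter 6 abstract concrete, the sketch below computes one commonly used cycle-to-cycle measure: the mean absolute difference between consecutive values divided by the mean value. This is an assumed, widely used "local" definition, not necessarily the one adopted in the book, and the function names and sample values are hypothetical. Complexity perturbation is left out because the abstract does not define it.

```python
def relative_perturbation(values):
    """Mean absolute difference between consecutive values,
    normalised by the mean value (returned as a fraction)."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_val = sum(values) / len(values)
    if mean_val == 0:
        raise ValueError("mean value must be non-zero")
    return mean_diff / mean_val


def jitter_local(periods_s):
    """Cycle-to-cycle variation of the pitch period."""
    return relative_perturbation(periods_s)


def shimmer_local(peak_amplitudes):
    """Cycle-to-cycle variation of the per-cycle peak amplitude."""
    return relative_perturbation(peak_amplitudes)


# Hypothetical measurements for four successive pitch cycles.
periods = [0.0100, 0.0101, 0.0099, 0.0100]   # seconds
amplitudes = [0.80, 0.82, 0.79, 0.81]        # linear amplitude

print(f"jitter  = {100 * jitter_local(periods):.2f} %")
print(f"shimmer = {100 * shimmer_local(amplitudes):.2f} %")
```

A perfectly periodic signal gives zero for both measures, which corresponds to the mechanical timbre the abstract warns about; in synthesis, small random deviations of this size would be applied to successive cycles.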
Backmatter
Metadata
Title
Epoch Synchronous Overlap Add (ESOLA)
Written by
Prof. Asoke Kumar Datta
Copyright Year
2018
Publisher
Springer Singapore
Electronic ISBN
978-981-10-7016-7
Print ISBN
978-981-10-7015-0
DOI
https://doi.org/10.1007/978-981-10-7016-7