
An Efficient Speech Synthesizer: A Hybrid Monotonic Architecture for Text-to-speech via VAE & LPC-Net with Independent Sentence Length

25-03-2025


Abstract

The article explores the potential of text-to-speech (TTS) technology, which converts written text into spoken language and so makes information accessible to a broader audience. It examines the three key components of a "talking computer's brain" (text analysis, acoustic modeling, and vocoding), each of which plays a crucial role in generating natural-sounding speech. It traces the evolution of TTS synthesis from early concatenative methods to neural network-based approaches, highlighting the challenges and advances in achieving naturalness and expressiveness. A significant portion of the article is devoted to a proposed hybrid architecture that combines autoregressive and non-autoregressive models, aiming to generate high-quality speech with low latency and reduced complexity. The architecture uses variational autoencoders (VAE) for prosody and phoneme extraction and LPC-Net for efficient vocoding, and the authors report superior performance in real-time speech synthesis. The article also addresses the ethical considerations and future prospects of TTS technology, emphasizing responsible deployment and continued innovation in the field.
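LPC-Net, named above as the vocoder, builds on classical linear prediction: each speech sample is first estimated as a weighted sum of the previous p samples, and the neural network only has to model the remaining excitation. The Python sketch below illustrates just that linear-prediction step, assuming the textbook autocorrelation method solved with the Levinson-Durbin recursion. The function names (`autocorrelation`, `levinson_durbin`, `lpc_residual`), the prediction order, and the test signal are hypothetical choices for illustration, not details taken from the article; LPC-Net itself derives its coefficients from its acoustic features rather than directly from the waveform as done here.

```python
import math


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[k] = sum_n x[n] * x[n - k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations with the Levinson-Durbin recursion.

    Returns (a, err): a[k - 1] weights x[n - k] in the prediction
    x_hat[n] = sum_k a[k - 1] * x[n - k]; err is the final prediction-error energy.
    """
    if r[0] <= 0:
        raise ValueError("signal has zero energy")
    a = [0.0] * (order + 1)  # a[0] unused; a[1..order] hold the coefficients
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        # Update lower-order coefficients from the previous stage's values
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # error energy shrinks at every stage
    return a[1:], err


def lpc_residual(x, a):
    """Excitation e[n] = x[n] - sum_k a[k - 1] * x[n - k]; samples before x[0] count as zero."""
    p = len(a)
    e = []
    for n in range(len(x)):
        pred = sum(a[k] * x[n - 1 - k] for k in range(p) if n - 1 - k >= 0)
        e.append(x[n] - pred)
    return e


if __name__ == "__main__":
    # A pure sinusoid satisfies x[n] = 2*cos(0.3)*x[n-1] - x[n-2],
    # so an order-2 predictor should leave very little residual energy.
    x = [math.sin(0.3 * n) for n in range(2000)]
    a, err = levinson_durbin(autocorrelation(x, 2), 2)
    e = lpc_residual(x, a)
    print("coefficients:", a)
    print("residual/signal energy:", sum(v * v for v in e) / sum(v * v for v in x))
```

The residual returned by `lpc_residual` is the part an excitation model would still have to predict. This split is the usual motivation for LPC-based neural vocoders: the linear predictor captures the spectral envelope cheaply, leaving the network a simpler signal to model.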

Authors: Naveen Kumar Nallabala, Balamurugan Souprayen, Maruthamuthu Ramasamy, Seshu Kumar Penumarti, Nuthan Chandu Navuluri
Publication date: 25-03-2025
Publisher: Springer US
Published in: Circuits, Systems, and Signal Processing, Issue 8/2025
Print ISSN: 0278-081X
Electronic ISSN: 1531-5878
DOI: https://doi.org/10.1007/s00034-025-03045-5