2021 | OriginalPaper | Chapter

Cross-Gender and Age Speech Conversion Using Hidden Markov Model Based on Cepstral Coefficients Conversion

Authors : Meisi Aristia H. Gultom, Raditiana Patmasari, Inung Wijayanto, Sugondo Hadiyoso

Published in: Proceedings of the 1st International Conference on Electronics, Biomedical Engineering, and Health Informatics

Publisher: Springer Singapore

Abstract

Animated movies often feature child characters, which requires children aged 5–10 to perform the dubbing. For cost efficiency, speech conversion can be used to support the dubbing of children’s speech. In this research, we propose a method for converting an adult’s speech into a child’s speech. The contribution of this study is the design of a signal processing algorithm that performs the conversion. We propose a conversion method using the Hidden Markov Model (HMM) based on cepstral coefficient conversion. The input is the speech of source speakers and target speakers uttering similar sentences. Feature extraction obtains the pitch (f₀) and the cepstral coefficients used in the conversion process, and the modeling method is HMM. The system output is a converted speech signal with characteristics similar to those of the target speech signal. In testing, the optimal HMM configuration used 4 states. The highest increase in cepstral Root Mean Square Error (RMSE) between before and after conversion is 32.35%, with an average of 25.83%, obtained from 400 samples. The Mean Opinion Score (MOS) was rated on a scale from 1 (converted speech is very dissimilar to the target speech) to 5 (converted speech is very similar to the target speech). It yielded an average of 2.505 for similarity and 2.805 for quality, obtained from 30 respondents. The proposed method is expected to be used in the animation film industry to simplify the dubbing process and make it more efficient.
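The abstract reports a percentage change in cepstral RMSE before and after conversion but does not give the formula. The following is a minimal sketch of one plausible way to compute such a figure, not the authors' implementation. It assumes the two cepstral sequences are already frame-aligned and that the error is averaged over all coefficients of all frames. The function names `cepstral_rmse` and `relative_change_percent` and the toy values are hypothetical.

```python
import math


def cepstral_rmse(a, b):
    """RMSE between two frame-aligned sequences of cepstral frames.

    Each sequence is a list of frames; each frame is a list of coefficients.
    Assumption: the error is averaged over every coefficient of every frame.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and frame-aligned")
    total, count = 0.0, 0
    for frame_a, frame_b in zip(a, b):
        if len(frame_a) != len(frame_b):
            raise ValueError("frames must have the same number of coefficients")
        for x, y in zip(frame_a, frame_b):
            total += (x - y) ** 2
            count += 1
    return math.sqrt(total / count)


def relative_change_percent(before, after):
    """Change from `before` to `after`, as a percentage of `before`."""
    return 100.0 * (after - before) / before


# Toy values only; these are not data from the paper.
target = [[0.0, 0.0], [1.0, 1.0]]
source = [[2.0, 2.0], [3.0, 3.0]]      # source speech before conversion
converted = [[1.0, 1.0], [2.0, 2.0]]   # hypothetical converted speech

rmse_before = cepstral_rmse(source, target)
rmse_after = cepstral_rmse(converted, target)
change = relative_change_percent(rmse_before, rmse_after)
```

With this definition, a negative `change` means the converted speech is closer to the target than the source was. The sign convention behind the percentages in the abstract is not stated, so this is an assumption as well.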

Title: Cross-Gender and Age Speech Conversion Using Hidden Markov Model Based on Cepstral Coefficients Conversion
Authors: Meisi Aristia H. Gultom, Raditiana Patmasari, Inung Wijayanto, Sugondo Hadiyoso
Publisher: Springer Singapore
Book: Proceedings of the 1st International Conference on Electronics, Biomedical Engineering, and Health Informatics
Print ISBN: 978-981-336-925-2
Electronic ISBN: 978-981-336-926-9
Copyright Year: 2021
DOI: https://doi.org/10.1007/978-981-33-6926-9_13
