2021 | OriginalPaper | Chapter

Cross-Gender and Age Speech Conversion Using Hidden Markov Model Based on Cepstral Coefficients Conversion

Authors: Meisi Aristia H. Gultom, Raditiana Patmasari, Inung Wijayanto, Sugondo Hadiyoso

Published in: Proceedings of the 1st International Conference on Electronics, Biomedical Engineering, and Health Informatics

Publisher: Springer Singapore

Abstract

Animated films often feature child characters, whose voices require child voice actors aged 5–10 for dubbing. To reduce cost, speech conversion can be used to support the dubbing of children's speech. In this research, we propose a method for converting an adult's speech into children's speech. The contribution of this study is the design of a signal processing algorithm that performs this conversion. The proposed method uses a Hidden Markov Model (HMM) based on cepstral coefficient conversion. The inputs are recordings of source speakers and target speakers uttering similar sentences. The features extracted for the conversion process are the pitch (f0) and the cepstral coefficients, and the modeling method is an HMM. The system outputs converted speech signals whose characteristics are similar to those of the target speech signal. The test results show that a 4-state HMM gives the best performance. The largest change in cepstral Root Mean Square Error (RMSE) between before and after conversion is 32.35%, and the average change is 25.83%, obtained from 400 samples. A Mean Opinion Score (MOS) test was rated on a scale from 1 (converted speech is very dissimilar to the target speech) to 5 (converted speech is very similar to the target speech). Across 30 respondents, it yielded an average of 2.505 for similarity and 2.805 for quality. The proposed method is intended for use in the animated film industry to simplify the dubbing process and make it more efficient.
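The cepstral RMSE figures above compare the cepstral features of a speech signal with those of the target speaker. As an illustration only, the following Python/NumPy sketch shows one common way to compute such a metric; it is not the authors' implementation. Assumptions (none of them stated in the abstract): the cepstrum is the FFT-based real cepstrum, frames are 512 samples with a hop of 256, 24 coefficients are kept, the two cepstral sequences are already time-aligned (a real system would typically align them, for example with dynamic time warping), and the reported percentage is read as the relative RMSE reduction with respect to the unconverted source. All function names are hypothetical.

```python
import numpy as np


def real_cepstrum(frame: np.ndarray, n_coeffs: int = 24) -> np.ndarray:
    """First n_coeffs coefficients of the real cepstrum of one frame."""
    windowed = frame * np.hanning(len(frame))                 # taper frame edges
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-10)   # floor avoids log(0)
    return np.fft.irfft(log_mag, n=len(frame))[:n_coeffs]


def cepstral_features(signal: np.ndarray, frame_len: int = 512, hop: int = 256,
                      n_coeffs: int = 24) -> np.ndarray:
    """Frame the signal and stack per-frame cepstra into a (frames, n_coeffs) array."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array([real_cepstrum(signal[s:s + frame_len], n_coeffs) for s in starts])


def cepstral_rmse(c_a: np.ndarray, c_b: np.ndarray) -> float:
    """RMSE between two cepstral sequences that are assumed to be frame-aligned."""
    n = min(len(c_a), len(c_b))  # hypothetical simplification: truncate, no DTW
    return float(np.sqrt(np.mean((c_a[:n] - c_b[:n]) ** 2)))


def rmse_reduction_percent(before: float, after: float) -> float:
    """Relative RMSE reduction in percent; positive means 'after' is closer to the target."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr                     # one second of synthetic audio
    source = np.sin(2 * np.pi * 120 * t)       # stand-in for an adult voice (low f0)
    target = np.sin(2 * np.pi * 280 * t)       # stand-in for a child voice (high f0)
    converted = np.sin(2 * np.pi * 260 * t)    # hypothetical output of some conversion
    tgt = cepstral_features(target)
    before = cepstral_rmse(cepstral_features(source), tgt)
    after = cepstral_rmse(cepstral_features(converted), tgt)
    print(f"RMSE before={before:.3f}, after={after:.3f}, "
          f"reduction={rmse_reduction_percent(before, after):.2f}%")
```

The synthetic tones only exercise the code path; a meaningful evaluation would use recorded source, target, and converted utterances of the same sentences, averaged over many samples as in the 400-sample test described above.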

Metadata
Copyright Year
2021
DOI
https://doi.org/10.1007/978-981-33-6926-9_13