
2021 | OriginalPaper | Chapter

A Modular Approach for Romanian-English Speech Translation

Authors: Andrei-Marius Avram, Vasile Păiş, Dan Tufiş

Published in: Natural Language Processing and Information Systems

Publisher: Springer International Publishing


Abstract

Automatic speech-to-speech translation lets people communicate directly when they do not share a common language. This work presents a modular system for Romanian-to-English and English-to-Romanian speech translation, built by integrating four families of components in a cascade: (1) automatic speech recognition, (2) transcription correction, (3) machine translation and (4) text-to-speech. We experimented with several models for each component and report several indicators of the system's performance. Because the system is modular, additional models can be added to each of the four component families. The resulting system is currently deployed on RELATE and is publicly available through the platform's web interface.
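The cascade described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `CascadedTranslator`, the stage signatures, and the toy stand-in functions are assumptions made only for illustration, and real stages would wrap trained speech recognition, correction, translation and synthesis models.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class CascadedTranslator:
    """Hypothetical container for the four stage families of a cascade."""
    asr: Callable[[bytes], str]        # (1) automatic speech recognition
    correct: Callable[[str], str]      # (2) transcription correction
    translate: Callable[[str], str]    # (3) machine translation
    tts: Callable[[str], bytes]        # (4) text-to-speech

    def run(self, audio: bytes) -> bytes:
        # Each stage consumes the output of the previous one.
        transcript = self.asr(audio)
        cleaned = self.correct(transcript)
        translated = self.translate(cleaned)
        return self.tts(translated)


# Toy stand-ins (assumptions): they only mimic the input/output types.
def toy_asr(audio: bytes) -> str:
    # Pretend the audio bytes already hold the lowercase transcript.
    return audio.decode("utf-8").lower()


def toy_correct(text: str) -> str:
    # Restore capitalization and final punctuation.
    text = text.strip()
    return text[:1].upper() + text[1:] + "."


TOY_LEXICON = {"Buna ziua.": "Good day."}


def toy_translate(text: str) -> str:
    # Look the sentence up in a one-entry lexicon; pass through if unknown.
    return TOY_LEXICON.get(text, text)


def toy_tts(text: str) -> bytes:
    # Pretend the encoded text is the synthesized waveform.
    return text.encode("utf-8")


pipeline = CascadedTranslator(toy_asr, toy_correct, toy_translate, toy_tts)

# Modularity: swap a single stage without touching the other three.
no_correction = replace(pipeline, correct=str.strip)
```

With this layout, trying a different model for one family amounts to passing a different callable for that field, which is one simple way to get the kind of extensibility the abstract describes.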


Metadata
Title
A Modular Approach for Romanian-English Speech Translation
Authors
Andrei-Marius Avram
Vasile Păiş
Dan Tufiş
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-80599-9_6
