2021 | OriginalPaper | Chapter

Design of Text and Voice Machine Translation Tool for Presentations

Authors : Thi-My-Thanh Nguyen, Xuan-Dung Phan, Ngoc-Bich Le, Xuan-Quy Dao

Published in: Recent Challenges in Intelligent Information and Database Systems

Publisher: Springer Singapore

Abstract

This paper presents a machine translation tool for presentations. This virtual translation tool takes a novel approach to generating text or voice output in other languages, and is intended to help audiences understand foreign-language content in live presentations. In this study, neural machine translation replaces the conventional translator, and text-to-speech and speech recognition significantly improve human-machine interaction. Experimental results on the Vietnamese-English language pair show the effectiveness of the proposed system design and deployment approach.
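As a rough illustration of how the three components named in the abstract (speech recognition, neural machine translation, text-to-speech) could be chained, the sketch below wires them together as interchangeable stages. It is not the authors' implementation: the class `PresentationTranslator`, its method names, and the stand-in stage functions are all hypothetical, and a real system would plug actual recognition, translation, and synthesis engines into the same slots.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (assumptions, not taken from the paper).
Recognizer = Callable[[bytes], str]    # audio -> source-language text
Translator = Callable[[str], str]      # source-language text -> target-language text
Synthesizer = Callable[[str], bytes]   # target-language text -> audio


@dataclass
class PresentationTranslator:
    """Chains speech recognition, translation, and speech synthesis."""
    recognize: Recognizer
    translate: Translator
    synthesize: Synthesizer

    def text_to_text(self, source_text: str) -> str:
        # Text mode: only the translation stage is needed.
        return self.translate(source_text)

    def voice_to_text(self, audio: bytes) -> str:
        # Subtitle mode: recognize the speech, then translate the transcript.
        return self.translate(self.recognize(audio))

    def voice_to_voice(self, audio: bytes) -> bytes:
        # Voice mode: recognize, translate, then synthesize the translation.
        return self.synthesize(self.voice_to_text(audio))


# Stand-in stages so the sketch runs without any external service.
def fake_recognize(audio: bytes) -> str:
    # Pretends the audio bytes are already a UTF-8 transcript.
    return audio.decode("utf-8")


PHRASES = {"xin chào": "hello"}  # toy Vietnamese-English lookup table


def fake_translate(text: str) -> str:
    # Looks the phrase up; unknown phrases pass through unchanged.
    return PHRASES.get(text.lower(), text)


def fake_synthesize(text: str) -> bytes:
    # Pretends the UTF-8 bytes of the text are the synthesized audio.
    return text.encode("utf-8")


tool = PresentationTranslator(fake_recognize, fake_translate, fake_synthesize)
```

Keeping each stage behind a plain callable means any one engine can be swapped or evaluated in isolation without touching the other two, which suits a design that compares several recognition or synthesis back ends.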

Metadata
DOI
https://doi.org/10.1007/978-981-16-1685-3_11
