2021 | OriginalPaper | Book chapter

Design of Text and Voice Machine Translation Tool for Presentations

Written by: Thi-My-Thanh Nguyen, Xuan-Dung Phan, Ngoc-Bich Le, Xuan-Quy Dao

Published in: Recent Challenges in Intelligent Information and Database Systems

Publisher: Springer Singapore

Abstract

This paper presents a machine translation tool for presentations. This virtual translation tool is a novel approach to generating text or voice in other languages. The proposed system is expected to assist audiences in understanding foreign-language content in live presentations. In this study, the conventional translator was replaced by neural machine translation, and human-machine interaction was improved significantly by using text-to-speech and speech recognition. Experimental results on the Vietnamese-English language pair showed the effectiveness of the proposed system design and deployment approach.
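The abstract names three stages (speech recognition, neural machine translation, text-to-speech) without saying how they are wired together. The following is a minimal sketch of one plausible arrangement, assuming the stages are chained sequentially. All names here (`translate_utterance`, `TranslationResult`, and the `fake_*` stubs) are hypothetical and stand in for real engines; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; the paper's actual components are not given here.
Recognizer = Callable[[bytes], str]    # audio -> source-language text
Translator = Callable[[str], str]      # source-language text -> target-language text
Synthesizer = Callable[[str], bytes]   # target-language text -> audio


@dataclass
class TranslationResult:
    source_text: str     # what the recognizer heard
    target_text: str     # translated text (the "text" output)
    target_audio: bytes  # synthesized speech (the "voice" output)


def translate_utterance(
    audio: bytes,
    recognize: Recognizer,
    translate: Translator,
    synthesize: Synthesizer,
) -> TranslationResult:
    """Chain the three stages: speech recognition, translation, text-to-speech."""
    source_text = recognize(audio)
    target_text = translate(source_text)
    target_audio = synthesize(target_text)
    return TranslationResult(source_text, target_text, target_audio)


# Stub stages for illustration only; real systems would call actual engines.
def fake_recognize(audio: bytes) -> str:
    # Pretend the audio bytes are already UTF-8 text.
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    # Toy Vietnamese-to-English lookup with a single entry.
    lookup = {"xin chào": "hello"}
    return lookup.get(text, text)


def fake_synthesize(text: str) -> bytes:
    # Pretend the encoded text is the synthesized audio.
    return text.encode("utf-8")


result = translate_utterance(
    "xin chào".encode("utf-8"), fake_recognize, fake_translate, fake_synthesize
)
print(result.target_text)
```

Passing the stages as callables keeps the sketch independent of any particular recognizer, translator, or synthesizer, so each could be swapped without touching the pipeline function.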

Metadata
Title
Design of Text and Voice Machine Translation Tool for Presentations
Written by
Thi-My-Thanh Nguyen
Xuan-Dung Phan
Ngoc-Bich Le
Xuan-Quy Dao
Copyright year
2021
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-16-1685-3_11