
2021 | Original Paper | Book chapter

Design of Text and Voice Machine Translation Tool for Presentations

Authors: Thi-My-Thanh Nguyen, Xuan-Dung Phan, Ngoc-Bich Le, Xuan-Quy Dao

Published in: Recent Challenges in Intelligent Information and Database Systems

Publisher: Springer Singapore


Abstract

This paper presents a machine translation tool for presentations. This virtual translation tool takes a novel approach to generating text or voice output in other languages, and is intended to help audiences understand foreign-language content in live presentations. In this study, neural machine translation replaces the conventional translator, and text-to-speech and speech recognition significantly improve human-machine interaction. Experimental results on the Vietnamese-English language pair showed the effectiveness of the proposed system design and deployment approach.
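The abstract names three components (speech recognition, neural machine translation, text-to-speech) but does not say how they are wired together. The following is a minimal sketch of one plausible arrangement, assuming the stages are chained in that order. Every name here (`PresentationTranslator`, `fake_recognize`, `fake_translate`, `fake_synthesize`, `TOY_DICTIONARY`) is hypothetical and does not come from the chapter; the stand-in functions only mimic the shape of real services so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; none of these names come from the chapter.
Recognizer = Callable[[bytes], str]    # audio -> source-language text
Translator = Callable[[str], str]      # source-language text -> target-language text
Synthesizer = Callable[[str], bytes]   # target-language text -> audio


@dataclass
class PresentationTranslator:
    """Chains speech recognition, translation and speech synthesis."""
    recognize: Recognizer
    translate: Translator
    synthesize: Synthesizer

    def text_output(self, audio: bytes) -> str:
        # Speech in, translated text out (e.g. for subtitles).
        return self.translate(self.recognize(audio))

    def voice_output(self, audio: bytes) -> bytes:
        # Speech in, translated speech out.
        return self.synthesize(self.text_output(audio))


# Toy stand-ins so the sketch runs without any external service.
# A real system would call a speech recognizer, an NMT model and a TTS engine.
TOY_DICTIONARY = {"xin chào": "hello"}


def fake_recognize(audio: bytes) -> str:
    # Pretends the audio bytes are already UTF-8 text.
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    # Dictionary lookup; falls back to the untranslated input.
    return TOY_DICTIONARY.get(text.lower(), text)


def fake_synthesize(text: str) -> bytes:
    # Pretends UTF-8 bytes are synthesized audio.
    return text.encode("utf-8")


tool = PresentationTranslator(fake_recognize, fake_translate, fake_synthesize)
subtitle = tool.text_output("Xin chào".encode("utf-8"))
speech = tool.voice_output("Xin chào".encode("utf-8"))
```

Passing the stages in as plain callables keeps the text path and the voice path sharing the same recognition and translation steps, so any one component could be swapped without touching the others.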



Title: Design of Text and Voice Machine Translation Tool for Presentations
Authors: Thi-My-Thanh Nguyen, Xuan-Dung Phan, Ngoc-Bich Le, Xuan-Quy Dao
Publisher: Springer Singapore
Book: Recent Challenges in Intelligent Information and Database Systems
Print ISBN: 978-981-16-1684-6
Electronic ISBN: 978-981-16-1685-3
Copyright year: 2021
DOI: https://doi.org/10.1007/978-981-16-1685-3_11
