
International Journal of Speech Technology

Published: 8 February 2022

To deploy trained speech with DNN-LSTM framework for controlling a smart wheeled-robot in limited learning circumstance

Authors: Zong-Peng Kuo, Joy Iong-Zong Chen

Published in: International Journal of Speech Technology | Issue 4/2022


Abstract

This article assesses a trained speech controller, built on a deep neural network long short-term memory (DNN-LSTM) framework, that serves as the commander for a smart wheeled robot. The controller is deployed to recognize speech and remotely control a smart wheeled robot that was completed in an earlier project by the authors’ research group. Using machine learning, a framework built on the DNN-LSTM model is embedded into the smart wheeled-robot prototype. The control commands are designed for a limited learning circumstance: the speech is constrained to single-track (ST) and double-track (DT) recordings and comprises only four Chinese speech commands, “forward” (前進), “backward” (後退), “turn left” (左轉), and “turn right” (右轉). Although only four simple commands were collected as training data, accuracy is investigated across training by three separate persons, each trained 1000 to 5000 times. Only three parameters are considered as the dominant factors in evaluating the performance of the speech-controlled wheeled robot, which is why “limited learning circumstance” appears in the article title. The test cases clearly show that the DT set achieves higher accuracy than the ST set. The best testing and validation performance occurs with the DT channel, where the accuracy and loss are 0.673 and 0.018, respectively, with 50% dropout. The dropout ratio was nevertheless found to dominate the accuracy and loss when applied during training. Finally, after the accuracy analysis, the trained model of the speech command sets is uploaded to a microcontroller and embedded in the smart wheeled robot to serve as its remote piloting scheme.
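As a rough illustration of the final step described in the abstract, the sketch below shows how the output of a four-class speech model might be turned into a drive command. It is not the authors' implementation: the function names, the differential-drive wheel speeds, and the confidence threshold are all assumptions made for this example, and the DNN-LSTM model itself is represented only by the list of raw scores it would produce.

```python
import math

# The four commands named in the abstract, in an assumed class order.
COMMANDS = ["forward", "backward", "turn left", "turn right"]

# Hypothetical (left, right) wheel speeds for a differential-drive base.
# The abstract does not specify the drive geometry or any speed values.
WHEEL_SPEEDS = {
    "forward": (1.0, 1.0),
    "backward": (-1.0, -1.0),
    "turn left": (-0.5, 0.5),
    "turn right": (0.5, -0.5),
}


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_command(scores, min_confidence=0.6):
    """Pick the most likely command, or stop the robot when unsure.

    `scores` stands in for the four raw outputs of the trained model.
    `min_confidence` is an assumed safety threshold, not a value from
    the article.
    """
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        return None, (0.0, 0.0)
    command = COMMANDS[best]
    return command, WHEEL_SPEEDS[command]
```

Calling `decode_command([0.1, 0.2, 3.0, 0.1])` selects "turn left", while four equal scores fall below the threshold and return a stop. Rejecting low-confidence predictions is one plausible design choice for a robot whose recognizer, per the abstract, reaches an accuracy of 0.673 at best.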



Title: To deploy trained speech with DNN-LSTM framework for controlling a smart wheeled-robot in limited learning circumstance
Authors: Zong-Peng Kuo, Joy Iong-Zong Chen
Publication date: 8 February 2022
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 4/2022
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-022-09962-z
