Published in: International Journal of Speech Technology 4/2022

8 February 2022

To deploy trained speech with DNN-LSTM framework for controlling a smart wheeled-robot in limited learning circumstance

Authors: Zong-Peng Kuo, Joy Iong-Zong Chen

Abstract

This article assesses a trained speech controller, built on a deep neural network long short-term memory (DNN-LSTM) framework, that serves as the commander of a smart wheeled robot. The speech recognizer is deployed to remotely control a smart wheeled robot that was completed in an earlier project by the authors' research group. Using machine learning, a framework built on the DNN-LSTM model is embedded in the smart wheeled-robot prototype. The control commands are designed under a limited learning circumstance: the speech is constrained to single-track (ST) and double-track (DT) recordings and comprises only four Chinese speech commands, "forward" (前進), "backward" (後退), "turn left" (左轉), and "turn right" (右轉). Although only four simple utterances are collected as training data, accuracy is investigated across training by three separate speakers, each trained 1000 to 5000 times. Only three parameters are treated as dominant in evaluating the performance of the speech-controlled wheeled robot, which is why "limited learning circumstance" appears in the article title. The test cases clearly show that the DT set achieves higher accuracy than the ST set. The best testing and validation performance occurs with the DT channel, where the accuracy and loss are 0.673 and 0.018, respectively, with 50% dropout. The dropout ratio, however, is found to dominate the accuracy and loss when it is applied during training. Finally, after the accuracy analysis, the trained model of the speech command set is uploaded to a microcontroller and embedded in the smart wheeled robot as its remote piloting scheme.
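
The pipeline the abstract describes (an LSTM layer feeding a dense classifier over the four commands, with 50% dropout during training) can be illustrated with a minimal sketch. This is not the authors' implementation: the weights are random and untrained, and the feature size (13 values per frame), the hidden size (8), the sequence length (20 frames), and all function names are assumptions chosen only for illustration.

```python
import math
import random

# The four speech commands named in the abstract.
COMMANDS = ["forward", "backward", "turn left", "turn right"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a trained model would load learned values instead."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def lstm_step(x, h, c, params):
    """One LSTM time step. params maps gate name -> (W, b) applied to [x, h]."""
    z = x + h  # concatenate the input frame and the previous hidden state
    pre = {name: [a + b for a, b in zip(matvec(W, z), bias)]
           for name, (W, bias) in params.items()}
    i = [sigmoid(v) for v in pre["i"]]    # input gate
    f = [sigmoid(v) for v in pre["f"]]    # forget gate
    o = [sigmoid(v) for v in pre["o"]]    # output gate
    g = [math.tanh(v) for v in pre["g"]]  # candidate cell state
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def dropout(v, rate, rng, training):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(v)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in v]


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def classify(frames, lstm_params, W_out, b_out, hidden,
             rate=0.5, rng=None, training=False):
    """Run the LSTM over all frames, then a dense softmax layer over the commands."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in frames:
        h, c = lstm_step(x, h, c, lstm_params)
    h = dropout(h, rate, rng, training)  # active only while training
    logits = [a + b for a, b in zip(matvec(W_out, h), b_out)]
    return softmax(logits)


# Hypothetical sizes: 13 features per frame, 8 hidden units, 20 frames.
N_FEATURES, HIDDEN = 13, 8
rng = random.Random(0)
lstm_params = {gate: (rand_matrix(HIDDEN, N_FEATURES + HIDDEN, rng), [0.0] * HIDDEN)
               for gate in "ifog"}
W_out = rand_matrix(len(COMMANDS), HIDDEN, rng)
b_out = [0.0] * len(COMMANDS)

# Stand-in for the acoustic features of one utterance (random, not real audio).
frames = [[rng.uniform(-1.0, 1.0) for _ in range(N_FEATURES)] for _ in range(20)]
probs = classify(frames, lstm_params, W_out, b_out, HIDDEN)
predicted = COMMANDS[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

Because the weights are untrained, the printed command carries no meaning; the sketch shows only the data flow. Dropout is switched off at inference (training=False), so the 50% rate affects only the training phase, which is consistent with the abstract's remark that the dropout ratio matters when it is applied during training.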

Metadata
Title
To deploy trained speech with DNN-LSTM framework for controlling a smart wheeled-robot in limited learning circumstance
Authors
Zong-Peng Kuo
Joy Iong-Zong Chen
Publication date
8 February 2022
Publisher
Springer US
Published in
International Journal of Speech Technology, Issue 4/2022
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-022-09962-z
