Skip to main content

Neural Network Control Interface of the Speaker Dependent Computer System «Deep Interactive Voice Assistant DIVA» to Help People with Speech Impairments

  • Conference paper
  • First Online:
Proceedings of the Third International Scientific Conference “Intelligent Information Technologies for Industry” (IITI’18) (IITI'18 2018)

Abstract

With the development of modern informational communication systems, voice control interface and speech recognition systems find application in various fields of activity. One application of such systems is for people with special needs who have speech impairments, and thus find using speech-dependent voice interfaces challenging. Our research team is developing a speaker dependent computer system «Deep Interactive Voice Assistant» (DIVA), which allows recognizing an arbitrary set of commands to control the computing system. The article presents the results of testing various artificial neural networks to train the machine to recognize vocal inputs. We examine such architectures as associative memory, multilayer perceptron and convolutional network. The research justifies the use of multilayer perceptron for the speaker dependent computer system DIVA as a training solution that demonstrated high results on a small selection. DIVA will be implemented in voice-user interface of such systems as «Smart House», mobile applications and IT-based assistive systems.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 169.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 219.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Convention on the Rights of Persons with Disabilities (CRPD): http://www.un.org/development/desa/disabilities/convention-on-the-rights-of-persons-with-disabilities.html. Accessed 01 May 2018

  2. Gaida, C.: Comparing open-source speech recognition toolkits. http://suendermann.com/su/pdf/oasis2014.pdf. Accessed 01 May 2018

  3. Gazetić, E.: Comparison Between Cloud-based and Offline Speech Recognition Systems. https://mediatum.ub.tum.de/doc/1399984/1399984.pdf. Accessed 01 May 2018

  4. Rybka, J., Janicki, A.: Comparison of speaker dependent and speaker independent emotion recognition. Appl. Math. Comput. Sci. 4(23), 797–808 (2013)

    Google Scholar 

  5. Lee, K., Huang, X.: On speaker-independent, speaker-dependent, and speaker-adaptive speech recognition. IEEE Trans. Speech Audio Process. 2(1), 150–157 (1993)

    Google Scholar 

  6. Senkevich, G.: Computer for People with Disabilities. BHV-Petersburg, St. Petersburg (2014)

    Google Scholar 

  7. Center of Speech Technologies: https://www.speechpro.ru/. Accessed 01 May 2018

  8. El Amrania, M., Hafizur Rahmanb, M., Wahiddinb, M., Shahb, A.: Building CMU Sphinx language model for the Holy Quran using simplified Arabic phonemes. Egypt. Inform. J. 3(17), 305–314 (2016)

    Article  Google Scholar 

  9. Tampel, I.: Automatic speech recognition - the main stages of 50 years. Sci. Tech. Her. Inf. Technol. Mech. Opt. 6(15), 957–968 (2015)

    Google Scholar 

  10. Roebuck, K.: Speech Recognition: High-Impact Emerging Technology - What You Need To Know: Definitions, Adoptions, Impact, Benefits, Maturity, Vendors. Emereo Publishing, Australia (2012)

    Google Scholar 

  11. Povey, D.: The Kaldi speech recognition toolkit. In: IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, pp. 1–4 (2011)

    Google Scholar 

  12. Lange, P., Suendermann-Oeft, D.: Tuning Sphinx to outperform Google’s speech API. In: Proceedings of the ESSV 2014, Conference on Electronic Speech Signal Processing, Dresden, Germany (2014)

    Google Scholar 

  13. Simon, O.: Haykin Neural Networks and Learning Machines, 3rd edn. Pearson, Upper Saddle River (2009)

    Google Scholar 

  14. Zhang, Y., Pezeshki, M., Brakel, P., Zhang, S., Bengio, C.L.Y., Courville, A.: Towards end-to-end speech recognition with deep convolutional neural networks, CoRR, vol. abs/1701.02720. http://arxiv.org/abs/1701.02720 (2017)

  15. Vazquez, R.A., Sossa, H.: Associative Memories Applied to Image Categorization. In: Martínez-Trinidad, J.F., Carrasco Ochoa, J.A., Kittler, J. (eds.) CIARP 2006. LNCS, vol. 4225, pp. 549–558. Springer, Heidelberg (2006)

    Google Scholar 

  16. Hopfield, J.J.: Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. 79, 2554–2558 (1982)

    Article  MathSciNet  Google Scholar 

  17. Vaishnavi, Y., Shreyas, R., Suhas, S., Surya, U.N., Ladwani V.M., Ramasubramanian, V.: Associative memory framework for speech recognition: adaptation of Hopfield network. In: IEEE Annual India Conference (INDICON), Bangalore, pp. 1–6 (2016)

    Google Scholar 

  18. Ladwani, V.M., Vaishnavi, Y., Shreyas, R., Vinay Kumar, B.R., Harisha, N., Yogesh, S., Shivaganga, P., Ramasubramanian, V.: Hopfield net framework for audio search. In: Communications (NCC), pp. 1–6. https://doi.org/10.1109/ncc.2017.8077074 (2017)

  19. Barra, A., Beccaria, M., Fachechi, A.: A relativistic extension of Hopfield neural networks via the mechanical analogy. arXiv:1801.01743v1 (2018)

  20. Hamming, R.: Coding and Information Theory. Prentice-Hall, Englewood Cliffs (1968)

    MATH  Google Scholar 

  21. Kosko, B.: Adaptive bidirectional associative memories. Appl. Opt. 26(23), 4947–4960 (1987)

    Article  Google Scholar 

  22. Willshaw, D.J., Buneman, O.P., Longuet-Higgins, H.C.: Non-holographic associative memory. Nature 222, 960–962 (1969)

    Article  Google Scholar 

  23. Stöckel, A.: Design Space Exploration of Associative Memories Using Spiking Neurons with Respect to Neuromorphic Hardware Implementations. Universität Bielefeld, Bielefeld (2016)

    Google Scholar 

  24. Vázquez, A.: New associative model with dynamical synapses. Neural Process. Lett. 28(3), 189–207 (2008)

    Article  Google Scholar 

  25. Vázquez, R. Sossa, H.: Voice translator based on associative memories. In: Advances in Neural Networks, pp. 341–350 (2008)

    Google Scholar 

  26. Minghu, J., Biqin, L., Baozong, Y.: Speech recognition by using the extended associative memory neural network (EAMNN). In: IEEE International Conference on Intelligent Processing Systems, vol. 2, pp. 1777–1780 (1997)

    Google Scholar 

  27. Krotov, D., Hopfield, J.: Dense associative memory for pattern recognition. In: Advances in Neural Information Processing Systems 29, pp. 1172–1180 (2016)

    Google Scholar 

  28. Giovanni, C.: Design of associative memory for gray-scale images by multilayer Hopfield neural networks. In: Proceedings of the 10th WSEAS International Conference on CIRCUITS, Vouliagmeni, Athens, Greece, pp. 376–379 (2006)

    Google Scholar 

  29. Sussner, P., Esmi, E., Villaverde, I., Graña, M.: The Kosko subsethood fuzzy associative memory (KS-FAM): mathematical background and applications in computer vision. J. Math. Imaging Vis. 42, 134–149 (2012)

    Article  MathSciNet  Google Scholar 

  30. Kohonen, T.: Self-organizing Maps, 3rd Extended edn. Springer, New York/Heidelberg (2001)

    Book  Google Scholar 

  31. Furao, S., Ouyang, Q., Kasai, W., Hasegawa, O.: A general associative memory based on self-organizing incremental neural network. Neurocomputing 104, 57–71 (2013)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Tatiana Khorosheva .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Khorosheva, T., Novoseltseva, M., Geidarov, N., Krivosheev, N., Chernenko, S. (2019). Neural Network Control Interface of the Speaker Dependent Computer System «Deep Interactive Voice Assistant DIVA» to Help People with Speech Impairments. In: Abraham, A., Kovalev, S., Tarassov, V., Snasel, V., Sukhanov, A. (eds) Proceedings of the Third International Scientific Conference “Intelligent Information Technologies for Industry” (IITI’18). IITI'18 2018. Advances in Intelligent Systems and Computing, vol 874. Springer, Cham. https://doi.org/10.1007/978-3-030-01818-4_44

Download citation

Publish with us

Policies and ethics