Neural Network Control Interface of the Speaker Dependent Computer System «Deep Interactive Voice Assistant DIVA» to Help People with Speech Impairments

Khorosheva, Tatiana; Novoseltseva, Marina; Geidarov, Nazim; Krivosheev, Nikolay; Chernenko, Sergey

doi:10.1007/978-3-030-01818-4_44


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 874)

Included in the following conference series:

International Conference on Intelligent Information Technologies for Industry

Abstract

As modern information and communication systems develop, voice control interfaces and speech recognition systems are finding application in many fields. One such application is for people with special needs who have speech impairments and therefore find speech-based voice interfaces challenging to use. Our research team is developing a speaker-dependent computer system, «Deep Interactive Voice Assistant» (DIVA), which recognizes an arbitrary set of commands for controlling a computing system. The article presents the results of testing several artificial neural networks for training the system to recognize voice input. We examine three architectures: associative memory, the multilayer perceptron, and the convolutional network. The results justify the use of the multilayer perceptron in the speaker-dependent system DIVA, since it performed well when trained on a small sample. DIVA will be implemented in the voice user interfaces of systems such as «Smart House», mobile applications, and IT-based assistive systems.
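To make the idea of a multilayer perceptron trained on a small per-speaker sample concrete, the following is a minimal sketch and not the authors' implementation. It assumes NumPy is available, and it uses synthetic feature vectors as a stand-in for real acoustic features. The function names (`make_toy_commands`, `train_mlp`, `predict`), the layer sizes, and the learning rate are all hypothetical choices made for illustration.

```python
import numpy as np

def make_toy_commands(n_classes=3, per_class=8, dim=16, seed=0):
    """Synthetic stand-in for per-command feature vectors (not real speech)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(0.0, 3.0, size=(n_classes, dim))
    X = np.concatenate(
        [c + rng.normal(0.0, 0.5, size=(per_class, dim)) for c in centers]
    )
    y = np.repeat(np.arange(n_classes), per_class)
    return X, y

def train_mlp(X, y, n_classes, hidden=12, lr=0.1, epochs=300, seed=0):
    """One-hidden-layer perceptron trained with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, size=(hidden, n_classes))
    b2 = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        # Forward pass: tanh hidden layer, softmax output.
        h = np.tanh(X @ W1 + b1)
        logits = h @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        # Backward pass for the mean cross-entropy loss.
        d_logits = (p - onehot) / len(X)
        d_h = (d_logits @ W2.T) * (1.0 - h ** 2)
        W2 -= lr * (h.T @ d_logits)
        b2 -= lr * d_logits.sum(axis=0)
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    """Return the index of the most likely command for each row of X."""
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)

if __name__ == "__main__":
    X, y = make_toy_commands()
    params = train_mlp(X, y, n_classes=3)
    print("training accuracy:", (predict(params, X) == y).mean())
```

The sketch only illustrates the workflow of fitting a small network to a handful of examples per command. It reports accuracy on the training data itself, and the toy clusters are far easier to separate than real impaired-speech recordings would be.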

Author information

Authors and Affiliations

Kemerovo State University, Kemerovo, Russia
Tatiana Khorosheva, Marina Novoseltseva, Nazim Geidarov, Nikolay Krivosheev & Sergey Chernenko

Corresponding author

Correspondence to Tatiana Khorosheva.

Editor information

Editors and Affiliations

Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Ajith Abraham
Rostov State Transport University, Rostov-on-Don, Russia
Sergey Kovalev
Bauman Moscow State Technical University, Moscow, Russia
Valery Tarassov
VSB-Technical University of Ostrava, Ostrava, Czech Republic
Vaclav Snasel
Rostov State Transport University, Rostov-on-Don, Russia
Andrey Sukhanov

About this paper

Cite this paper

Khorosheva, T., Novoseltseva, M., Geidarov, N., Krivosheev, N., Chernenko, S. (2019). Neural Network Control Interface of the Speaker Dependent Computer System «Deep Interactive Voice Assistant DIVA» to Help People with Speech Impairments. In: Abraham, A., Kovalev, S., Tarassov, V., Snasel, V., Sukhanov, A. (eds) Proceedings of the Third International Scientific Conference “Intelligent Information Technologies for Industry” (IITI’18). IITI'18 2018. Advances in Intelligent Systems and Computing, vol 874. Springer, Cham. https://doi.org/10.1007/978-3-030-01818-4_44

DOI: https://doi.org/10.1007/978-3-030-01818-4_44
Published: 07 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01817-7
Online ISBN: 978-3-030-01818-4
eBook Packages: Intelligent Technologies and Robotics (R0)
