
2020 | OriginalPaper | Chapter

Set-Top Box Automated Lip-Reading Controller Based on Convolutional Neural Network

Authors: Yuanyao Lu, Haiyang Jiang

Published in: Advances in Human Factors and Systems Interaction

Publisher: Springer International Publishing


Abstract

Automated lip reading (ALR) is a new mode of human-computer interaction that can recognize speech content directly from image sequences of the speaker's lip motion. Using ALR based on a convolutional neural network to control a set-top box, i.e. to power it on/off, change the channel, and adjust the volume, is a challenging and useful task. This paper uses a coupled three-dimensional convolutional neural network (3D-CNN) architecture that, after training, computes the similarity between the captured lip-command feature images and standard reference images. We identify the control commands for powering on/off, changing the channel, and adjusting the volume, achieving a new control mode for set-top boxes. The experimental accuracy of lip-reading recognition is clearly improved compared to other similar methods. This is a meaningful exploration toward the practical use of ALR, and our results and application experience can be easily extended to other machines and smart home systems.
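The matching step described above, comparing a captured lip-command feature representation against stored reference representations, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the single fixed 3D-convolution kernel, the random "clips", the command names, and the cosine-similarity decision rule are all assumptions chosen to make the idea concrete.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv3d_valid(clip, kernel):
    """Single-channel 3D convolution (valid padding) over a (T, H, W) clip."""
    windows = sliding_window_view(clip, kernel.shape)  # (T', H', W', kt, kh, kw)
    return np.tensordot(windows, kernel, axes=([3, 4, 5], [0, 1, 2]))

def embed(clip, kernel):
    """Flatten the 3D feature map into an L2-normalised embedding vector."""
    feat = conv3d_valid(clip, kernel).ravel()
    return feat / (np.linalg.norm(feat) + 1e-12)

def recognise(query_clip, reference_embeddings, kernel):
    """Return the command whose stored embedding is most similar (cosine)."""
    q = embed(query_clip, kernel)
    return max(reference_embeddings, key=lambda cmd: float(q @ reference_embeddings[cmd]))

# Hypothetical setup: random stand-ins for lip-motion clips of each command.
rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3, 3))
commands = ["power_on", "power_off", "channel_up", "channel_down",
            "volume_up", "volume_down"]
refs = {cmd: rng.standard_normal((16, 24, 32)) for cmd in commands}
ref_emb = {cmd: embed(clip, kernel) for cmd, clip in refs.items()}

# A noisy re-capture of the "volume_up" gesture should match its reference.
query = refs["volume_up"] + 0.01 * rng.standard_normal((16, 24, 32))
print(recognise(query, ref_emb, kernel))
```

In the paper's system the embeddings would come from a trained coupled 3D-CNN rather than a fixed kernel, but the decision rule, picking the stored command whose representation is most similar to the captured one, has the same shape.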


Metadata
Copyright Year: 2020
DOI: https://doi.org/10.1007/978-3-030-20040-4_38
