
2021 | OriginalPaper | Chapter

Correlation of Visual Perceptions and Extraction of Visual Articulators for Kannada Lip Reading

Authors : M. S. Nandini, Nagappa U. Bhajantri, Trisiladevi C. Nagavi

Published in: Progress in Advanced Computing and Intelligent Engineering

Publisher: Springer Singapore


Abstract

Visual articulators such as the teeth, lips, and tongue move in correlation with one another, and these correlations are extracted as visual perception features. Here, visual perceptions denote the features used as parameters for representation learning and for describing visual information. The extracted features are classified into different classes of Kannada words. The movements of the lips, tongue, and teeth are captured by analysing the inner and outer lip contours together with the motion of the tongue and teeth. These articulators are used jointly for feature extraction because their movements are correlated with vocal resonances; this resonance information is obtained from every frame by analysing the correlations that hold across the frame sequence of a video. The proposed visual-perception method yields an accuracy of 82.83% on a dataset containing several benchmark challenges, including facial tilt, under which the correlation among teeth, tongue, and lips may weaken. We thus establish a new methodology for analysing and understanding visual features. The Kannada word spoken by a person is indicated by assigning labels to the video's frame sequence in a specific pattern; once this pattern is extracted and visualised, the system recognises the lip movements as different classes of spoken words.
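The pipeline the abstract outlines — per-frame articulator features, pairwise correlations among them, and classification of the resulting pattern into word classes — can be illustrated with a minimal sketch. All names, the toy feature tracks, and the nearest-prototype classifier below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def articulator_correlations(lip, teeth, tongue):
    """Pairwise Pearson correlations among per-frame articulator tracks."""
    tracks = np.vstack([lip, teeth, tongue])
    c = np.corrcoef(tracks)  # 3x3 correlation matrix
    # Keep the three unique off-diagonal entries:
    # lip-teeth, lip-tongue, teeth-tongue.
    return np.array([c[0, 1], c[0, 2], c[1, 2]])

def classify(features, prototypes):
    """Assign the word class whose prototype correlation vector is nearest."""
    labels = list(prototypes)
    dists = [np.linalg.norm(features - prototypes[w]) for w in labels]
    return labels[int(np.argmin(dists))]

# Toy per-frame opening measures for a 10-frame clip (synthetic data).
t = np.linspace(0, 1, 10)
lip = np.sin(2 * np.pi * t)
teeth = np.sin(2 * np.pi * t + 0.1)   # nearly in phase with the lips
tongue = np.cos(2 * np.pi * t)        # roughly uncorrelated with the lips
feats = articulator_correlations(lip, teeth, tongue)

# Hypothetical per-word prototype correlation patterns.
prototypes = {"word_a": np.array([0.99, 0.0, 0.0]),
              "word_b": np.array([0.0, 0.99, 0.0])}
print(classify(feats, prototypes))  # → word_a
```

In a real system the per-frame tracks would come from detected lip, teeth, and tongue regions, and the classifier would be learned rather than a fixed nearest-prototype rule; the sketch only shows how inter-articulator correlation can serve as the feature pattern that separates word classes.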


Metadata
Title
Correlation of Visual Perceptions and Extraction of Visual Articulators for Kannada Lip Reading
Authors
M. S. Nandini
Nagappa U. Bhajantri
Trisiladevi C. Nagavi
Copyright Year
2021
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-15-6353-9_23