
2021 | OriginalPaper | Chapter

Correlation of Visual Perceptions and Extraction of Visual Articulators for Kannada Lip Reading

Authors : M. S. Nandini, Nagappa U. Bhajantri, Trisiladevi C. Nagavi

Published in: Progress in Advanced Computing and Intelligent Engineering

Publisher: Springer Singapore


Abstract

Visual articulators such as the teeth, lips, and tongue move in correlation with one another, and these correlations are extracted as visual perception features. Here, visual perceptions denote the features used as parameters for representation learning and for describing visual information. The extracted features are classified into different classes of Kannada words. The movements of the lips, tongue, and teeth are captured by analysing the inner and outer lip contours together with the motion of the tongue and teeth. These articulators are used jointly for feature extraction because their movements are correlated with vocal resonances; this resonance information is obtained from every frame by analysing the correlations that hold across the frame sequence of a video. The proposed visual-perception method yields an accuracy of 82.83% on a dataset containing several benchmark challenges, including facial tilt, under which the correlation among teeth, tongue, and lips may weaken. We thus establish a new methodology for analysing and understanding visual features. The Kannada word spoken by a person is indicated by assigning labels to the video's frame sequence in a specific pattern; once this pattern is extracted and visualised, the system recognises the lip movements as different classes of spoken words.
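The pipeline the abstract outlines — per-frame articulator features, pairwise correlations among them, and classification of the resulting pattern into word classes — can be illustrated with a minimal sketch. All names, the toy feature tracks, and the nearest-prototype classifier below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def articulator_correlations(lip, teeth, tongue):
    """Pairwise Pearson correlations among per-frame articulator tracks."""
    tracks = np.vstack([lip, teeth, tongue])
    c = np.corrcoef(tracks)  # 3x3 correlation matrix
    # Keep the three unique off-diagonal entries:
    # lip-teeth, lip-tongue, teeth-tongue.
    return np.array([c[0, 1], c[0, 2], c[1, 2]])

def classify(features, prototypes):
    """Assign the word class whose prototype correlation vector is nearest."""
    labels = list(prototypes)
    dists = [np.linalg.norm(features - prototypes[w]) for w in labels]
    return labels[int(np.argmin(dists))]

# Toy per-frame opening measures for a 10-frame clip (synthetic data).
t = np.linspace(0, 1, 10)
lip = np.sin(2 * np.pi * t)
teeth = np.sin(2 * np.pi * t + 0.1)   # nearly in phase with the lips
tongue = np.cos(2 * np.pi * t)        # roughly uncorrelated with the lips
feats = articulator_correlations(lip, teeth, tongue)

# Hypothetical per-word prototype correlation patterns.
prototypes = {"word_a": np.array([0.99, 0.0, 0.0]),
              "word_b": np.array([0.0, 0.99, 0.0])}
print(classify(feats, prototypes))  # → word_a
```

In a real system the per-frame tracks would come from detected lip, teeth, and tongue regions, and the classifier would be learned rather than a fixed nearest-prototype rule; the sketch only shows how inter-articulator correlation can serve as the feature pattern that separates word classes.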


Metadata
Title
Correlation of Visual Perceptions and Extraction of Visual Articulators for Kannada Lip Reading
Authors
M. S. Nandini
Nagappa U. Bhajantri
Trisiladevi C. Nagavi
Copyright Year
2021
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-15-6353-9_23