2019 | OriginalPaper | Book chapter

Temporal and Spatial Features for Visual Speech Recognition

Authors: Ali Jafari Sheshpoli, Ali Nadian-Ghomsheh

Published in: Fundamental Research in Electrical Engineering

Publisher: Springer Singapore

Abstract

Speech recognition from visual data is an important step towards communication when audio is not available. This paper considers several hand-crafted features, including HOG, MBH, DCT, LBP, MTC, and their combinations, for recognizing speech from a sequence of images. Several classifiers, including SVM, decision trees, the K-nearest neighbor algorithm, and the sub-space K-nearest neighbor algorithm, were tested for feature evaluation. The application of PCA for dimensionality reduction was also considered. Two sets of tests were carried out: lip pose recognition and recognition of isolated words. The MIRACL-VC1 data set was used for evaluation. Self-dependent tests reached an accuracy of over 95%, while in the self-independent tests the maximum recognition accuracy was about 52%.
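To make the feature-plus-classifier idea concrete, the following is a minimal sketch of one possible combination named in the abstract: LBP features fed to a K-nearest neighbor classifier. It is not the authors' implementation. The function names, the basic 3×3 eight-neighbour LBP variant, the concatenation of per-frame histograms into one sequence vector, and the squared Euclidean distance are all assumptions made for illustration; frames are assumed to be grayscale images of at least 3×3 pixels given as nested lists.

```python
from collections import Counter

# Neighbour offsets in clockwise order, starting at the top-left pixel.
# (Assumed ordering; any fixed ordering yields a valid LBP code.)
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """Return the 8-bit LBP code of pixel (r, c).

    Bit i is set when the i-th neighbour is at least as bright as the centre.
    """
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Return a normalised 256-bin histogram of LBP codes over interior pixels."""
    hist = [0.0] * 256
    rows, cols = len(img), len(img[0])
    count = 0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[lbp_code(img, r, c)] += 1.0
            count += 1
    return [h / count for h in hist]


def sequence_feature(frames):
    """Concatenate per-frame LBP histograms into one vector for the sequence."""
    feature = []
    for frame in frames:
        feature.extend(lbp_histogram(frame))
    return feature


def knn_predict(train_feats, train_labels, query, k=1):
    """Predict a label by majority vote among the k nearest training vectors."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(feat, query)), label)
        for feat, label in zip(train_feats, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

In use, each training sequence would be passed through `sequence_feature`, and `knn_predict` would then label a new sequence of the same length. A dimensionality-reduction step such as PCA would sit between the two and is omitted here.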

Metadata
Title
Temporal and Spatial Features for Visual Speech Recognition
Authors
Ali Jafari Sheshpoli
Ali Nadian-Ghomsheh
Copyright year
2019
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-10-8672-4_10
