
2019 | OriginalPaper | Book chapter

Lip-Reading Based on Deep Learning Model

Authors: Mei-li Zhu, Qing-qing Wang, Jiang-lin Luo

Published in: Transactions on Edutainment XV

Publisher: Springer Berlin Heidelberg

Abstract

With the rapid growth of computing power, deep learning plays an increasingly important role in fields such as autonomous driving, medical research and industrial automation. To improve the accuracy of lip-reading recognition, this paper proposes a lip-reading algorithm based on a deep learning model. The binary images of a lip-contour motion sequence are projected into a spatio-temporal energy image, producing a lip dynamic grayscale image. This representation reduces noise interference during recognition, and the strong feature-learning ability of deep learning is then used to improve the recognition result. The experimental results show that a deep learning model can extract effective features of dynamic lip changes from the lip dynamic grayscale image and achieves better recognition results.
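The abstract does not state exactly how the binary contour sequence is projected into the spatio-temporal energy image. A common construction for this kind of energy image (the same idea behind gait energy images) is a per-pixel temporal average of aligned, size-normalized binary frames. The sketch below assumes that construction; the function names `energy_image` and `to_uint8` are hypothetical and not taken from the chapter.

```python
from typing import List

# Hypothetical representation: a binary lip-contour mask, 1 = lip pixel, 0 = background.
Frame = List[List[int]]


def energy_image(frames: List[Frame]) -> List[List[float]]:
    """Project a sequence of aligned binary frames onto one grayscale image.

    Assumption: the projection is a per-pixel temporal average. Each output
    pixel is the fraction of frames in which that pixel is set, so values lie
    in [0, 1]. A pixel that switches on in a single noisy frame gets a low
    intensity, while regions the lips occupy consistently get a high one.
    """
    if not frames or not frames[0]:
        raise ValueError("need at least one non-empty frame")
    height, width = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must have the same size")
    n = len(frames)
    return [
        [sum(frame[y][x] for frame in frames) / n for x in range(width)]
        for y in range(height)
    ]


def to_uint8(image: List[List[float]]) -> List[List[int]]:
    """Scale a [0, 1] energy image to 8-bit grayscale values (0..255)."""
    return [[round(255 * value) for value in row] for row in image]


if __name__ == "__main__":
    # Three tiny 2x3 frames standing in for a lip-contour motion sequence.
    frames = [
        [[0, 1, 0], [1, 1, 1]],
        [[0, 1, 0], [0, 1, 1]],
        [[1, 1, 0], [0, 1, 0]],
    ]
    print(to_uint8(energy_image(frames)))  # [[85, 255, 0], [85, 255, 170]]
```

In a full pipeline the resulting grayscale image would serve as the input to the deep network; the chapter's network architecture is not described in the abstract, so it is not sketched here.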

Metadata
Title
Lip-Reading Based on Deep Learning Model
Authors
Mei-li Zhu
Qing-qing Wang
Jiang-lin Luo
Copyright year
2019
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-662-59351-6_4