
International Journal of Speech Technology

23 January 2016

Multiple cameras audio visual speech recognition using active appearance model visual features in car environment

Authors: Astik Biswas, P. K. Sahu, Mahesh Chandra

Published in: International Journal of Speech Technology | Issue 1/2016

Abstract

Considering visual speech features along with traditional acoustic features has shown decent performance in uncontrolled auditory environments. However, most existing audio-visual speech recognition (AVSR) systems have been developed under laboratory conditions and have rarely addressed visual-domain problems. This paper presents an active appearance model (AAM) based multiple-camera AVSR experiment. Shape and appearance information is extracted from the jaw and lip region to enhance performance in vehicle environments. First, a series of visual speech recognition (VSR) experiments is carried out to study the impact of each camera on multi-stream VSR. A four-camera in-car audio-visual corpus is used for the experiments. The individual camera streams are fused into a four-stream synchronous hidden Markov model (HMM) visual speech recognizer. Finally, the optimum four-stream VSR is combined with a single-stream acoustic HMM to build a five-stream AVSR system. The dual-modality AVSR system is more robust than the acoustic speech recognizer across all driving conditions.
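The stream fusion described in the abstract can be illustrated with a minimal sketch. It assumes the common multi-stream HMM formulation in which the log-likelihood of a state is a weighted sum of per-stream log-likelihoods. The abstract does not give the paper's emission models, feature dimensions, or stream weights, so the single diagonal Gaussian per stream, the function names, and all numeric values below are hypothetical and chosen only for illustration.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def multistream_log_likelihood(observations, state_models, weights):
    """Weighted sum of per-stream log-likelihoods for one HMM state.

    observations: one feature vector per stream, all from the same frame
                  (the streams are assumed to be synchronous).
    state_models: one (mean, var) pair per stream for this state.
    weights:      one non-negative stream exponent per stream.
    """
    if not (len(observations) == len(state_models) == len(weights)):
        raise ValueError("need one observation, model, and weight per stream")
    return sum(
        w * gaussian_log_pdf(obs, mean, var)
        for obs, (mean, var), w in zip(observations, state_models, weights)
    )


# Hypothetical frame: four camera streams plus one acoustic stream,
# each reduced to a 2-dimensional feature vector for illustration.
frame = [[0.1, -0.2], [0.0, 0.3], [0.2, 0.1], [-0.1, 0.0], [0.5, 0.4]]
state = [([0.0, 0.0], [1.0, 1.0])] * 5
weights = [0.1, 0.1, 0.1, 0.1, 0.6]  # illustrative values, not from the paper

score = multistream_log_likelihood(frame, state, weights)
```

In this formulation, raising the weight of a stream makes the recognizer trust that stream more, which is why the relative weighting of camera and acoustic streams matters when the acoustic channel is noisy.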

Data from the eighth microphone are not available in the database, so the effective number of microphones is seven.

Some files for some speakers are missing because of equipment failure during recording.

Title: Multiple cameras audio visual speech recognition using active appearance model visual features in car environment
Authors: Astik Biswas, P. K. Sahu, Mahesh Chandra
Publication date: 23 January 2016
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 1/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-016-9332-x
