Exploiting Visual Lip Movement Information for Automatic Speaker Recognition

  • 21-04-2025

Abstract

The article addresses the limitations of conventional speaker recognition systems that rely solely on voice biometrics, such as susceptibility to synthetic voices, pre-recorded samples, and environmental noise. It introduces the concept of exploiting visual lip movement information as a complementary modality to enhance the accuracy and reliability of automatic speaker recognition (ASR) systems. The research utilizes the VidTIMIT and MPSTME databases to validate the effectiveness of integrating lip movement features with audio data. Key topics include the extraction and classification of voice features, the use of Histogram of Oriented Gradients (HOG) for lip region detection, and the application of Convolutional Neural Networks (CNNs) for feature extraction and classification. The article also discusses the challenges and complexities involved in implementing multimodal fusion techniques and suggests future directions for research. By combining audio and visual cues, the proposed system offers a more robust identification process, particularly in challenging conditions, and paves the way for advancements in speaker recognition technology.

Title
Exploiting Visual Lip Movement Information for Automatic Speaker Recognition
Authors
Sumita Nainan
Sonal Parmar
Kanchan Bakade
Publication date
21-04-2025
Publisher
Springer US
Published in
Circuits, Systems, and Signal Processing / Issue 9/2025
Print ISSN: 0278-081X
Electronic ISSN: 1531-5878
DOI
https://doi.org/10.1007/s00034-025-03113-w