2023 | OriginalPaper | Chapter

A Comparative Study on Effect of Temporal Phase for Speaker Verification

Authors : Doreen Nongrum, Fidalizia Pyrtuh

Published in: Proceedings of International Conference on Frontiers in Computing and Systems

Publisher: Springer Nature Singapore

Abstract

This paper demonstrates the influence of temporal phase in the speech signal through several experimental models, with a focus on speaker verification. Feature extraction is a fundamental block of a speaker recognition system, responsible for obtaining speaker characteristics from the speech signal. The commonly used short-term spectral features emphasize the magnitude spectrum and discard the phase spectrum entirely. Here, phase-spectrum information is extracted and studied alongside the magnitude information for speaker verification. Linear Prediction Cepstral Coefficients (LPCC) are extracted from the temporal phase of the speech signal, and their scores are fused with Mel-Frequency Cepstral Coefficient (MFCC) scores. For both LPCC and MFCC features, the training data are modeled using the state-of-the-art speaker-specific Gaussian mixture model (GMM) and the GMM-Universal Background Model (GMM-UBM). The scores are matched using dynamic time warping (DTW). The proposed method is tested on a fixed pass-phrase with a speech duration of less than 5 s. Score-level fusion reduces the equal error rate (EER) and improves the recognition rate.
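The score-level fusion and EER evaluation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the min-max normalization, the weighted sum, the default weight of 0.5, and the function names below are all assumptions made for illustration, not the authors' method. The sketch also assumes that a higher score means a more likely target speaker; distance-like scores such as DTW costs would need to be negated first.

```python
from typing import List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1] so two feature streams are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: no ranking information, map everything to 0.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(mfcc_scores: Sequence[float],
                lpcc_scores: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Weighted-sum fusion (assumed rule): weight on MFCC, 1 - weight on LPCC."""
    if len(mfcc_scores) != len(lpcc_scores):
        raise ValueError("both systems must score the same trials")
    m = min_max_normalize(mfcc_scores)
    p = min_max_normalize(lpcc_scores)
    return [weight * a + (1.0 - weight) * b for a, b in zip(m, p)]


def equal_error_rate(genuine: Sequence[float],
                     impostor: Sequence[float]) -> float:
    """Approximate EER: sweep thresholds over the observed scores and take
    the point where false acceptance and false rejection rates are closest."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # targets rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


if __name__ == "__main__":
    # Toy scores, purely illustrative; they are not results from the paper.
    mfcc = [2.1, 1.7, 0.3, 0.9]   # first two trials are genuine
    lpcc = [0.8, 0.4, 0.5, 0.1]
    fused = fuse_scores(mfcc, lpcc, weight=0.6)
    print("fused EER:", equal_error_rate(fused[:2], fused[2:]))
```

In practice the fusion weight would be tuned on held-out trials, and the EER would be computed separately for the MFCC system, the LPCC system, and the fused scores to see whether fusion helps.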

Title: A Comparative Study on Effect of Temporal Phase for Speaker Verification
Authors: Doreen Nongrum, Fidalizia Pyrtuh
Publisher: Springer Nature Singapore
Book: Proceedings of International Conference on Frontiers in Computing and Systems
Print ISBN: 978-981-19-0104-1

Electronic ISBN: 978-981-19-0105-8

Copyright Year: 2023
DOI: https://doi.org/10.1007/978-981-19-0105-8_56
