2023 | OriginalPaper | Chapter

A Comparative Study on Effect of Temporal Phase for Speaker Verification

Authors: Doreen Nongrum, Fidalizia Pyrtuh

Published in: Proceedings of International Conference on Frontiers in Computing and Systems

Publisher: Springer Nature Singapore

Abstract

This paper demonstrates the influence of temporal phase in the speech signal through several experimental models, with a focus on speaker verification. Feature extraction is a fundamental block of a speaker recognition system, responsible for obtaining speaker characteristics from the speech signal. Commonly used short-term spectral features emphasize the magnitude spectrum and discard the phase spectrum entirely. Here, phase-spectrum information is extracted and studied alongside magnitude information for speaker verification. Linear Prediction Cepstral Coefficients (LPCC) are extracted from the temporal phase of the speech signal, and their scores are fused with Mel-Frequency Cepstral Coefficient (MFCC) scores. The training data are modeled with the state-of-the-art speaker-specific Gaussian mixture model (GMM) and GMM-Universal Background Model (GMM-UBM) for both LPCC and MFCC features. Scores are matched using dynamic time warping (DTW). The proposed method is tested on a fixed pass-phrase with a speech duration of less than 5 s. Score-level fusion reduces the equal error rate (EER) and improves the recognition rate.
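The abstract names score-level fusion and EER but does not say how scores are normalized or combined. The sketch below is one common way to do it, under stated assumptions: min-max normalization followed by a weighted sum, with the EER approximated by sweeping thresholds over the observed scores. The function names, the `weight` parameter, and the label convention (1 = genuine, 0 = impostor, higher score = more likely genuine) are illustrative and are not taken from the paper.

```python
from typing import List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1] so two systems are comparable before fusion."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: all scores are identical, so no ranking information.
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(a: Sequence[float], b: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Weighted-sum fusion, trial by trial (assumed rule, not the paper's)."""
    if len(a) != len(b):
        raise ValueError("both systems must score the same trials")
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate EER: the point where false acceptance and false rejection
    rates are closest, searched over the observed scores as thresholds."""
    genuine = [s for s, y in zip(scores, labels) if y == 1]
    impostor = [s for s, y in zip(scores, labels) if y == 0]
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor trial")
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


if __name__ == "__main__":
    # Made-up toy scores for illustration only; not results from the paper.
    labels = [1, 1, 1, 0, 0, 0]
    mfcc_scores = [0.9, 0.4, 0.8, 0.5, 0.2, 0.1]
    lpcc_scores = [0.3, 0.9, 0.7, 0.2, 0.6, 0.1]
    fused = fuse_scores(min_max_normalize(mfcc_scores),
                        min_max_normalize(lpcc_scores))
    print("MFCC EER:", equal_error_rate(mfcc_scores, labels))
    print("LPCC EER:", equal_error_rate(lpcc_scores, labels))
    print("Fused EER:", equal_error_rate(fused, labels))
```

Fusion helps only when the two systems make different errors. In the toy data each system misranks a different genuine trial, so the combined score separates the classes better than either system alone.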


Metadata

Title: A Comparative Study on Effect of Temporal Phase for Speaker Verification
Authors: Doreen Nongrum, Fidalizia Pyrtuh
Copyright Year: 2023
Publisher: Springer Nature Singapore
DOI: https://doi.org/10.1007/978-981-19-0105-8_56