Published in: International Journal of Speech Technology 1/2013

01-03-2013

Speaker recognition utilizing distributed DCT-II based Mel frequency cepstral coefficients and fuzzy vector quantization

Authors: M. Afzal Hossan, Mark A. Gregory


Abstract

In this paper, a novel Automatic Speaker Recognition (ASR) system is presented. The new ASR system introduces novel feature extraction and vector classification steps that use Mel Frequency Cepstral Coefficients (MFCC) computed with a distributed Discrete Cosine Transform (DCT-II), together with Fuzzy Vector Quantization (FVQ). The ASR algorithm uses an MFCC-based approach to identify the dynamic features used for Speaker Recognition (SR). A series of experiments was performed using three feature extraction methods: (1) conventional MFCC; (2) Delta-Delta MFCC (DDMFCC); and (3) DCT-II based DDMFCC. The experiments were then expanded to include four classifiers: (1) FVQ; (2) K-means Vector Quantization (VQ); (3) Linde, Buzo and Gray (LBG) VQ; and (4) Gaussian Mixture Model (GMM). Among the VQ-based classifiers, the combination of DCT-II based MFCC, delta MFCC (DMFCC) and DDMFCC features with FVQ achieved the lowest Equal Error Rate (EER). These results improve on previously reported non-GMM methods and approach those achieved by the computationally expensive GMM-based method. Speaker verification tests highlighted the overall performance improvement of the new ASR system. Speaker source data for the experiments were taken from the National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation corpora.
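The feature side of the pipeline combines static cepstra obtained with a DCT-II and their delta (DMFCC) and delta-delta (DDMFCC) derivatives. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes Mel filterbank energies per frame have already been computed (framing, windowing, FFT and the filterbank itself are omitted), uses a plain unnormalised DCT-II on the whole log-energy vector rather than the paper's distributed variant, and picks 13 coefficients and a regression window of 2 frames as illustrative defaults. All function names are hypothetical.

```python
import math
from typing import List

Frame = List[float]


def dct_ii(x: List[float], num_coeffs: int) -> List[float]:
    """Unnormalised DCT-II: c[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n_len = len(x)
    return [
        sum(x[n] * math.cos(math.pi / n_len * (n + 0.5) * k) for n in range(n_len))
        for k in range(num_coeffs)
    ]


def cepstra(filterbank_energies: Frame, num_coeffs: int = 13, floor: float = 1e-10) -> Frame:
    """Log-compress one frame of Mel filterbank energies, then decorrelate with DCT-II."""
    log_e = [math.log(max(e, floor)) for e in filterbank_energies]  # floor avoids log(0)
    return dct_ii(log_e, num_coeffs)


def deltas(frames: List[Frame], width: int = 2) -> List[Frame]:
    """Regression deltas: d_t = sum_{n=1..W} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..W} n^2).

    Edge frames are handled by repeating the first/last frame.
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(num_frames):
        acc = [0.0] * len(frames[t])
        for n in range(1, width + 1):
            nxt = frames[min(t + n, num_frames - 1)]
            prv = frames[max(t - n, 0)]
            for i in range(len(acc)):
                acc[i] += n * (nxt[i] - prv[i])
        out.append([a / denom for a in acc])
    return out


def mfcc_with_deltas(filterbank_frames: List[Frame], num_coeffs: int = 13) -> List[Frame]:
    """Stack static MFCC, delta (DMFCC) and delta-delta (DDMFCC) for every frame."""
    static = [cepstra(f, num_coeffs) for f in filterbank_frames]
    d = deltas(static)
    dd = deltas(d)  # delta-delta = delta applied to the deltas
    return [s + a + b for s, a, b in zip(static, d, dd)]
```

With 13 coefficients per frame, each stacked vector has 39 dimensions (13 static, 13 delta, 13 delta-delta). An orthonormal DCT-II scaling is also common in practice; it changes the coefficient magnitudes but not their decorrelating role.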
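On the classifier side, FVQ is commonly realised by training each speaker's codebook with fuzzy c-means, so that every training vector contributes to every codeword in proportion to its membership. The sketch below assumes that formulation, with fuzzifier m = 2, codebooks initialised from randomly sampled training vectors, and scoring by mean nearest-codeword distortion; none of these settings is stated in the abstract, and the function names are hypothetical. It also shows one simple way to estimate an Equal Error Rate by sweeping a threshold until the false acceptance and false rejection rates meet.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]


def _sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_cmeans(data: List[Vector], c: int, m: float = 2.0,
                 max_iter: int = 100, tol: float = 1e-9, seed: int = 0) -> List[Vector]:
    """Train a c-codeword codebook with fuzzy c-means (assumed FVQ training rule)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, c)]
    exp = 1.0 / (m - 1.0)  # on squared distances: u_ij = 1 / sum_k (d2_ij / d2_ik)^(1/(m-1))
    for _ in range(max_iter):
        memberships = []
        for x in data:
            d2 = [_sqdist(x, v) for v in centroids]
            zero = [j for j, d in enumerate(d2) if d == 0.0]
            if zero:  # vector sits exactly on codeword(s): share membership among them
                u = [1.0 / len(zero) if j in zero else 0.0 for j in range(c)]
            else:
                u = [1.0 / sum((d2[j] / d2[k]) ** exp for k in range(c)) for j in range(c)]
            memberships.append(u)
        new_centroids = []
        for j in range(c):
            w = [u[j] ** m for u in memberships]
            total = sum(w)
            if total == 0.0:  # degenerate cluster: keep the previous codeword
                new_centroids.append(list(centroids[j]))
                continue
            new_centroids.append([sum(wi * x[dim] for wi, x in zip(w, data)) / total
                                  for dim in range(len(data[0]))])
        shift = max(_sqdist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids


def vq_distortion(test: List[Vector], codebook: List[Vector]) -> float:
    """Mean squared distance from each test vector to its nearest codeword."""
    return sum(min(_sqdist(x, v) for v in codebook) for x in test) / len(test)


def identify(test: List[Vector], codebooks: Dict[str, List[Vector]]) -> str:
    """Closed-set identification: the speaker whose codebook gives the lowest distortion."""
    return min(codebooks, key=lambda spk: vq_distortion(test, codebooks[spk]))


def equal_error_rate(genuine: List[float], impostor: List[float]) -> float:
    """Approximate EER; scores are similarities (higher = more likely the claimed speaker)."""
    best_gap, eer = math.inf, 1.0
    for thr in sorted(set(genuine) | set(impostor)):
        far = sum(s >= thr for s in impostor) / len(impostor)  # false acceptances
        frr = sum(s < thr for s in genuine) / len(genuine)     # false rejections
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

Because `vq_distortion` is a dissimilarity, a verification score for `equal_error_rate` can be taken as its negation. Replacing the fuzzy membership update with hard nearest-codeword assignments turns the same loop into ordinary K-means VQ, which is one of the baselines the abstract compares against.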


Metadata
Publisher
Springer US
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-012-9166-0
