International Journal of Speech Technology

Publication date: 01-03-2013

Speaker recognition utilizing distributed DCT-II based Mel frequency cepstral coefficients and fuzzy vector quantization

Authors: M. Afzal Hossan, Mark A. Gregory

Published in: International Journal of Speech Technology | Issue 1/2013


Abstract

This paper presents a novel Automatic Speaker Recognition (ASR) system. The system introduces new feature extraction and vector classification steps that use distributed Discrete Cosine Transform (DCT-II) based Mel Frequency Cepstral Coefficients (MFCC) and Fuzzy Vector Quantization (FVQ). The ASR algorithm uses an MFCC-based approach to identify dynamic features for Speaker Recognition (SR). A series of experiments was performed with three feature extraction methods: (1) conventional MFCC; (2) Delta-Delta MFCC (DDMFCC); and (3) DCT-II based DDMFCC. The experiments were then expanded to include four classifiers: (1) FVQ; (2) K-means Vector Quantization (VQ); (3) Linde, Buzo and Gray VQ; and (4) Gaussian Mixture Model (GMM). The combination of DCT-II based MFCC, Delta MFCC (DMFCC) and DDMFCC with FVQ had the lowest Equal Error Rate among the VQ based classifiers. The results improved on previously reported non-GMM methods and approached those achieved by the computationally expensive GMM based method. Speaker verification tests highlighted the overall performance improvement of the new ASR system. Speaker source data for the experiments came from the National Institute of Standards and Technology Speaker Recognition Evaluation corpora.
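The abstract names two building blocks without giving formulas: dynamic (delta and delta-delta) cepstral features, and fuzzy vector quantization. The following is a minimal Python sketch of the textbook versions of both, not the authors' implementation. It assumes the common regression formula for delta coefficients and the standard fuzzy c-means membership rule; the function names `delta` and `fuzzy_memberships`, the window parameter `width`, and the fuzzifier `m` are illustrative choices that do not come from the paper.

```python
import math


def delta(frames, width=2):
    """Regression-based delta coefficients (assumed textbook formula).

    frames: list of equal-length lists, one cepstral vector per frame.
    Edge frames are handled by repeating the first or last frame.
    """
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = []
    for t in range(n):
        vec = []
        for d in range(len(frames[t])):
            num = 0.0
            for k in range(1, width + 1):
                nxt = frames[min(t + k, n - 1)][d]  # clamp at last frame
                prv = frames[max(t - k, 0)][d]      # clamp at first frame
                num += k * (nxt - prv)
            vec.append(num / denom)
        out.append(vec)
    return out


def fuzzy_memberships(x, codebook, m=2.0):
    """Fuzzy c-means membership of vector x in each codeword (m > 1).

    Unlike hard VQ, every codeword receives a degree of membership in
    [0, 1], and the degrees sum to 1.
    """
    dists = [math.dist(x, c) for c in codebook]
    # If x coincides with one or more codewords, split membership among them.
    if any(dist == 0.0 for dist in dists):
        hits = [1.0 if dist == 0.0 else 0.0 for dist in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    inv = [dist ** -exponent for dist in dists]
    s = sum(inv)
    return [v / s for v in inv]


if __name__ == "__main__":
    # Hypothetical one-dimensional "cepstral" track, for illustration only.
    mfcc = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    dmfcc = delta(mfcc)     # delta features
    ddmfcc = delta(dmfcc)   # delta-delta features
    print(dmfcc, ddmfcc)
    print(fuzzy_memberships([0.0, 0.0], [[1.0, 0.0], [0.0, 2.0]]))
```

Applying `delta` twice gives delta-delta features under this assumed formula. The soft memberships from `fuzzy_memberships` are what distinguish a fuzzy codebook from the hard nearest-codeword assignment used by K-means VQ.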




Title: Speaker recognition utilizing distributed DCT-II based Mel frequency cepstral coefficients and fuzzy vector quantization
Authors: M. Afzal Hossan, Mark A. Gregory
Publication date: 01-03-2013
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 1/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-012-9166-0
