
Wireless Personal Communications

Published: 08-08-2016

Noise Robust Speaker Identification Using RASTA–MFCC Feature with Quadrilateral Filter Bank Structure

Authors: S. Selva Nidhyananthan, R. Shantha Selva Kumari, T. Senthur Selvi

Published in: Wireless Personal Communications | Issue 3/2016


Abstract

This paper motivates the use of the Relative Spectra–Mel Frequency Cepstral Coefficients (RASTA–MFCC) feature, extracted with a newly designed Quadrilateral filter bank structure, together with a Gaussian Mixture Model–Universal Background Model (GMM–UBM) for improved text-independent speaker identification in noisy environments. Unlike a neural network model, which requires retraining on the entire database when a new sample is added, the GMM–UBM model does not require such retraining, which makes processing easier and faster. RASTA–MFCC is found to be more robust to noisy environments than the traditional MFCC method. MFCC is an efficient feature for identifying the speaker because it captures speaker-specific information. RASTA processing of speech improves recognizer performance in the presence of convolutional and additive noise. This work combines the strengths of these two processes to yield the RASTA–MFCC feature, which is robust to noise, and also proposes a new Quadrilateral filter bank structure that approximates the response of the cochlear membrane of the human ear to capture the feature vectors effectively. The proposed Quadrilateral filter bank structure with the RASTA–MFCC feature and GMM–UBM modeling outperforms triangular and Gaussian filter banks and achieves a speaker identification accuracy of 97.67% on the MEPCO noisy speech database with 50 speakers.
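
To illustrate why RASTA processing helps against convolutional distortion, the following is a minimal sketch of RASTA filtering applied to a single log filter bank energy trajectory. It assumes the band-pass filter commonly cited in the RASTA literature (numerator 0.1 × [2, 1, 0, -1, -2], single pole at 0.98), applied causally; whether the paper uses exactly these values, and how the filter is combined with the Quadrilateral filter bank, is not stated in the abstract. The function name `rasta_filter` and the test signal are hypothetical.

```python
import math


def rasta_filter(trajectory, pole=0.98):
    """Band-pass filter one log-energy trajectory (one channel across frames).

    Assumed transfer function (causal form):
        H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1)
    """
    numer = [0.2, 0.1, 0.0, -0.1, -0.2]  # coefficients sum to zero, so DC is rejected
    output = []
    prev = 0.0  # previous output sample, used by the recursive (pole) part
    for n in range(len(trajectory)):
        # FIR part: weighted sum over the current and four previous frames
        fir = sum(numer[k] * trajectory[n - k]
                  for k in range(len(numer)) if n - k >= 0)
        prev = fir + pole * prev
        output.append(prev)
    return output


if __name__ == "__main__":
    # Illustrative trajectory: a 4 Hz modulation at an assumed 100 frames per second
    speech = [math.sin(2 * math.pi * 4.0 * n / 100.0) for n in range(600)]
    channel_offset = 3.0  # a fixed channel adds a constant in the log domain
    clean = rasta_filter(speech)
    shifted = rasta_filter([x + channel_offset for x in speech])
    # After the initial transient, the constant offset is almost fully removed
    print(abs(shifted[-1] - clean[-1]))
```

A fixed channel response multiplies the spectrum, so it appears as a constant additive offset in each log filter bank channel. Because the numerator coefficients sum to zero, that offset is suppressed once the start-up transient has decayed, while the modulation that carries the speech information passes through.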




Title: Noise Robust Speaker Identification Using RASTA–MFCC Feature with Quadrilateral Filter Bank Structure
Authors: S. Selva Nidhyananthan, R. Shantha Selva Kumari, T. Senthur Selvi
Publication date: 08-08-2016
Publisher: Springer US
Published in: Wireless Personal Communications / Issue 3/2016
Print ISSN: 0929-6212
Electronic ISSN: 1572-834X
DOI: https://doi.org/10.1007/s11277-016-3530-3

