Published in: Wireless Personal Communications 3/2016

08-08-2016

Noise Robust Speaker Identification Using RASTA–MFCC Feature with Quadrilateral Filter Bank Structure

Authors: S. Selva Nidhyananthan, R. Shantha Selva Kumari, T. Senthur Selvi


Abstract

This paper motivates the use of Relative Spectra–Mel Frequency Cepstral Coefficient (RASTA–MFCC) features, extracted with a newly designed Quadrilateral filter bank structure, together with Gaussian Mixture Model–Universal Background Model (GMM–UBM) modeling for improved text-independent speaker identification in noisy environments. Unlike a neural network model, which must be retrained on the entire database whenever a new sample is added, the GMM–UBM model requires no such retraining, which makes processing easier and faster. RASTA–MFCC is found to be more robust to noisy environments than the traditional MFCC method. MFCC is an efficient feature for speaker identification because it captures speaker-specific information, while RASTA processing of speech improves recognizer performance in the presence of convolutional and additive noise. This work combines the strengths of the two processes to yield a noise-robust RASTA–MFCC feature, and it proposes a new Quadrilateral filter bank structure that approximates the response of the cochlear membrane of the human ear in order to capture the feature vectors effectively. The proposed Quadrilateral filter bank structure with the RASTA–MFCC feature and GMM–UBM modeling outperforms triangular and Gaussian filter banks and achieves a speaker identification accuracy of 97.67% on the MEPCO noisy speech database with 50 speakers.
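
The RASTA step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the commonly cited RASTA band-pass filter of Hermansky and Morgan, applied to the log-energy trajectory of each filter-bank channel, and it assumes a frame rate of 100 frames per second for the toy data. The function names `rasta_filter` and `rasta_filter_bands`, the pole value 0.98, and the toy signal are illustrative assumptions. The Quadrilateral filter bank itself is not reproduced here, because the abstract does not specify its shape.

```python
import math

# Commonly cited RASTA band-pass filter (an assumption, not taken from this paper):
#   H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1)
# The numerator coefficients sum to zero, so constant (DC) components are removed.
RASTA_NUMERATOR = (0.2, 0.1, 0.0, -0.1, -0.2)


def rasta_filter(trajectory, pole=0.98):
    """Filter one channel's log-energy trajectory (one value per frame)."""
    out = []
    prev = 0.0  # previous output sample, zero initial state
    for n in range(len(trajectory)):
        # FIR part: weighted sum of the current and the four previous inputs.
        fir = 0.0
        for k, b in enumerate(RASTA_NUMERATOR):
            if n - k >= 0:
                fir += b * trajectory[n - k]
        # IIR part: single real pole.
        prev = fir + pole * prev
        out.append(prev)
    return out


def rasta_filter_bands(log_energies, pole=0.98):
    """Apply the filter per channel; log_energies is indexed [frame][band]."""
    n_bands = len(log_energies[0])
    per_band = [
        rasta_filter([frame[b] for frame in log_energies], pole)
        for b in range(n_bands)
    ]
    # Transpose back to [frame][band].
    return [list(frame) for frame in zip(*per_band)]


# Toy example: a 4 Hz modulation (typical of speech) at an assumed 100 frames/s.
frames = 1000
clean = [math.sin(2 * math.pi * 4 * n / 100) for n in range(frames)]

# A fixed channel (convolutional distortion) multiplies the spectrum, so in the
# log domain it appears as a constant offset on every frame.
channel_offset = 3.0
distorted = [v + channel_offset for v in clean]

clean_out = rasta_filter(clean)
distorted_out = rasta_filter(distorted)

# After the start-up transient, the two outputs should nearly coincide.
max_late_gap = max(
    abs(a - b) for a, b in zip(clean_out[-50:], distorted_out[-50:])
)
late_peak = max(abs(v) for v in clean_out[-50:])
print(f"late gap: {max_late_gap:.2e}, late peak: {late_peak:.3f}")
```

In this sketch the constant offset is suppressed while the 4 Hz modulation passes largely unchanged, which is the mechanism behind the robustness to convolutional noise described above. In a full RASTA–MFCC pipeline the filtered log energies would then be passed to the discrete cosine transform to obtain cepstral coefficients.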


Metadata
Title
Noise Robust Speaker Identification Using RASTA–MFCC Feature with Quadrilateral Filter Bank Structure
Authors
S. Selva Nidhyananthan
R. Shantha Selva Kumari
T. Senthur Selvi
Publication date
08-08-2016
Publisher
Springer US
Published in
Wireless Personal Communications / Issue 3/2016
Print ISSN: 0929-6212
Electronic ISSN: 1572-834X
DOI
https://doi.org/10.1007/s11277-016-3530-3
