Published in: International Journal of Speech Technology 4/2017

24-08-2017

Processing degraded speech for text dependent speaker verification

Authors: Banriskhem K. Khonglah, Ramesh K. Bhukya, S. R. Mahadeva Prasanna



Abstract

This work explores speech enhancement of degraded speech as a front end for a text-dependent speaker verification system. The degradation may be due to noise or background speech. Verification is based on the dynamic time warping (DTW) method, which requires end point detection. End points are easy to detect when the speech is clean, but degradation introduces errors in their estimation, and these errors propagate into the overall accuracy of the speaker verification system. Temporal and spectral enhancement is therefore performed on the degraded speech so that, ideally, the enhanced speech resembles clean speech. Results show that the temporal and spectral processing methods contribute to the task by removing the degradation, and that improved accuracy is obtained for the DTW-based text-dependent speaker verification system.
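
The dependence of DTW on end point detection can be illustrated with a small sketch. It is not the authors' system: the energy-based detector, the function names (`detect_endpoints`, `dtw_distance`), the threshold value, and the toy one-dimensional sequences are all assumptions made for illustration, whereas a real system would compare frame-level feature vectors.

```python
import math


def detect_endpoints(signal, frame_len=1, rel_threshold=0.05):
    """Hypothetical energy-based end point detector (illustrative only).

    Returns (start, end) sample indices spanning the frames whose
    short-time energy exceeds rel_threshold times the maximum energy.
    """
    energies = [
        sum(x * x for x in signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]
    if not energies or max(energies) == 0:
        return 0, len(signal)
    limit = rel_threshold * max(energies)
    active = [i for i, e in enumerate(energies) if e > limit]
    return active[0] * frame_len, (active[-1] + 1) * frame_len


def dtw_distance(a, b):
    """Classic DTW accumulated cost between two 1-D sequences.

    Uses the absolute difference as local cost and the three standard
    moves (insertion, deletion, match). No path constraints are applied.
    """
    n, m = len(a), len(b)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


# Toy data (assumed): a reference template and the same pattern with
# leading and trailing silence, as if the end points were not detected.
template = [1.0, 2.0, 3.0, 2.0, 1.0]
padded = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 0.0, 0.0]

start, end = detect_endpoints(padded)
trimmed = padded[start:end]

untrimmed_cost = dtw_distance(template, padded)
trimmed_cost = dtw_distance(template, trimmed)
print(untrimmed_cost, trimmed_cost)
```

In this toy case the untrimmed comparison accumulates extra cost from the silence at both ends, while the trimmed comparison matches the template exactly. With degraded speech, a fixed energy threshold of this kind would be expected to misplace the end points, which is the failure mode the abstract attributes to noise and background speech.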


Metadata
Title
Processing degraded speech for text dependent speaker verification
Authors
Banriskhem K. Khonglah
Ramesh K. Bhukya
S. R. Mahadeva Prasanna
Publication date
24-08-2017
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2017
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-017-9451-z
