Published in: International Journal of Speech Technology 2/2018

28-03-2018

Emirati-accented speaker identification in each of neutral and shouted talking environments

Authors: Ismail Shahin, Ali Bou Nassif, Mohammed Bahutair


Abstract

This work is devoted to collecting an Emirati-accented speech database (the Arabic United Arab Emirates database) in both neutral and shouted talking environments in order to study and enhance text-independent Emirati-accented speaker identification performance in the shouted environment. Three classifiers are used: first-order circular suprasegmental hidden Markov models (CSPHMM1s), second-order circular suprasegmental hidden Markov models (CSPHMM2s), and third-order circular suprasegmental hidden Markov models (CSPHMM3s). The database was collected from 50 native Emirati speakers (25 per gender), each uttering eight common Emirati sentences in both the neutral and shouted talking environments. The features extracted from the collected database are Mel-frequency cepstral coefficients (MFCCs). Our results show that the average Emirati-accented speaker identification performance in the neutral environment is 94.0%, 95.2%, and 95.9% based on CSPHMM1s, CSPHMM2s, and CSPHMM3s, respectively. In the shouted environment, the average performance is 51.3%, 55.5%, and 59.3% based on CSPHMM1s, CSPHMM2s, and CSPHMM3s, respectively. The average speaker identification performance achieved in the shouted environment based on CSPHMM3s is very similar to that obtained in a subjective assessment by human listeners.
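
To put the reported figures side by side, the short sketch below computes how many percentage points each classifier loses between the neutral and shouted environments, and the relative gain of CSPHMM3s over CSPHMM1s in the shouted environment. The rates are taken from the abstract; the function names `absolute_drop` and `relative_gain` are illustrative assumptions and are not part of the authors' work.

```python
# Average identification rates (%) as reported in the abstract.
reported = {
    "CSPHMM1s": {"neutral": 94.0, "shouted": 51.3},
    "CSPHMM2s": {"neutral": 95.2, "shouted": 55.5},
    "CSPHMM3s": {"neutral": 95.9, "shouted": 59.3},
}


def absolute_drop(rates):
    """Percentage points lost when moving from neutral to shouted (hypothetical helper)."""
    return round(rates["neutral"] - rates["shouted"], 1)


def relative_gain(baseline, improved):
    """Relative improvement (%) of `improved` over `baseline` (hypothetical helper)."""
    return round(100.0 * (improved - baseline) / baseline, 1)


# Neutral-to-shouted degradation for each classifier.
drops = {name: absolute_drop(rates) for name, rates in reported.items()}

# Gain of third-order over first-order models in the shouted environment.
shouted_gain = relative_gain(
    reported["CSPHMM1s"]["shouted"], reported["CSPHMM3s"]["shouted"]
)

for name, drop in drops.items():
    print(f"{name}: {drop} percentage points lower when shouted")
print(f"CSPHMM3s vs CSPHMM1s (shouted): +{shouted_gain}% relative")
```

On these figures the neutral-to-shouted gap narrows from 42.7 percentage points for CSPHMM1s to 36.6 for CSPHMM3s, and the third-order models improve on the first-order models in the shouted environment by about 15.6% in relative terms.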


Metadata
Title
Emirati-accented speaker identification in each of neutral and shouted talking environments
Authors
Ismail Shahin
Ali Bou Nassif
Mohammed Bahutair
Publication date
28-03-2018
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2018
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-9502-0
