2018 | OriginalPaper | Chapter

Acoustic Feature Comparison for Different Speaking Rates

Authors: Abdolreza Sabzi Shahrebabaki, Ali Shariq Imran, Negar Olfati, Torbjørn Svendsen

Published in: Human-Computer Interaction. Interaction Technologies

Publisher: Springer International Publishing

Abstract

This paper investigates how speaking rate variation affects frame classification. Frame classification accuracy is indicative of performance on phoneme and word recognition, and is a first step towards designing voice-controlled interfaces. Different speaking rates produce different speech dynamics; for example, rate variations change both the formant frequencies and their transition tracks. As a result, a word spoken at a normal rate is typically recognized more reliably than the same word spoken by the same speaker at a much faster or slower rate. It is therefore important to design interfaces that account for such speaking variability. To better incorporate this variability into digital devices, we study the effect of (a) feature selection and (b) the choice of network architecture across speaking rates. Four different acoustic features are evaluated on multiple configurations of Deep Neural Network (DNN) architectures. The findings show that log Filter-Bank Energies (FBE) outperform the other acoustic features not only at the normal speaking rate but also at slow and fast speaking rates.
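
As an illustration of the log Filter-Bank Energies mentioned in the abstract, the following is a minimal sketch of how such a feature vector might be computed for a single speech frame. It is not the authors' implementation: the function names, the Hamming window, the mel-scale formula, the number of filters, and the energy floor are all assumptions chosen for the example, since the abstract does not specify them. A naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def hz_to_mel(hz):
    # Common mel-scale approximation (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum of one Hamming-windowed frame (naive DFT, bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    num_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 edge frequencies, expressed as fractional FFT bin positions.
    edges = [mel_to_hz(mel_max * m / (num_filters + 1)) * n_fft / sample_rate
             for m in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(num_bins):
            if left < k <= center:
                row.append((k - left) / (center - left))    # rising slope
            elif center < k < right:
                row.append((right - k) / (right - center))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_fbe(frame, sample_rate, num_filters=8, floor=1e-10):
    """Log filter-bank energies for one frame (hypothetical helper)."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Floor each band energy before the log to avoid log(0).
    return [math.log(max(sum(w * p for w, p in zip(row, spec)), floor))
            for row in bank]


# Example: a 128-sample frame of a 1 kHz tone sampled at 8 kHz.
frame = [math.sin(2.0 * math.pi * 1000.0 * i / 8000.0) for i in range(128)]
features = log_fbe(frame, 8000)
```

In a frame classification setup of the kind the abstract describes, one such vector per frame would typically serve as network input; a practical pipeline would use an FFT routine and more filters than this sketch does.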


Title: Acoustic Feature Comparison for Different Speaking Rates
Authors: Abdolreza Sabzi Shahrebabaki, Ali Shariq Imran, Negar Olfati, Torbjørn Svendsen
Publisher: Springer International Publishing
Book: Human-Computer Interaction. Interaction Technologies
Print ISBN: 978-3-319-91249-3

Electronic ISBN: 978-3-319-91250-9

Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-91250-9_14
