2018 | OriginalPaper | Chapter

Acoustic Feature Comparison for Different Speaking Rates

Authors: Abdolreza Sabzi Shahrebabaki, Ali Shariq Imran, Negar Olfati, Torbjørn Svendsen

Published in: Human-Computer Interaction. Interaction Technologies

Publisher: Springer International Publishing

Abstract

This paper investigates the effect of speaking rate variation on the task of frame classification. Performance on this task is indicative of performance on phoneme and word recognition, making it a first step towards designing voice-controlled interfaces. Speaking rate changes the dynamics of speech: for example, variations in rate alter both the formant frequencies and their transition tracks. As a result, a word spoken at normal speed tends to be recognized more often than the same word spoken by the same speaker at a much faster or much slower pace. It is thus imperative to design interfaces that take such variability in speaking rate into account. To better incorporate this variability into digital devices, we study the effect of (a) feature selection and (b) the choice of network architecture on frame classification at different speaking rates. Four different features are evaluated on multiple configurations of Deep Neural Network (DNN) architectures. The findings show that log Filter-Bank Energies (FBE) outperform the other acoustic features not only at normal speaking rate but at slow and fast speaking rates as well.
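
For readers unfamiliar with the feature that performs best here, the following is a minimal sketch of how log filter-bank energies are commonly computed per frame: window the signal, take the power spectrum, apply triangular mel-spaced filters, and take the logarithm. The abstract does not state the authors' settings, so the 16 kHz sample rate, 25 ms frames, 10 ms hop, 512-point FFT, 40 filters, and the function names are all assumptions chosen for illustration, not the configuration used in the chapter.

```python
import numpy as np

def hz_to_mel(hz):
    # Common mel-scale approximation (O'Shaughnessy formula).
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                            n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, fft_freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        # Triangle = min of the two slopes, clipped at zero outside [lo, hi].
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def log_fbe(signal, sample_rate=16000, frame_ms=25.0, hop_ms=10.0,
            n_fft=512, n_filters=40):
    """Log filter-bank energies, one row per frame (all defaults are assumed)."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (signal.size - frame_len) // hop_len
    # Index matrix that slices the signal into overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    # Small floor avoids log(0) on silent frames.
    return np.log(energies + 1e-10)

# Usage: one second of a 440 Hz tone gives one 40-dimensional vector per frame.
t = np.arange(16000) / 16000.0
features = log_fbe(np.sin(2.0 * np.pi * 440.0 * t))
print(features.shape)
```

Each row of `features` is the kind of per-frame input vector that a frame classifier would map to a phonetic class. With a fixed hop, faster speech yields fewer frames per phoneme, which is one way rate variation reaches the classifier.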

Metadata
Title
Acoustic Feature Comparison for Different Speaking Rates
Authors
Abdolreza Sabzi Shahrebabaki
Ali Shariq Imran
Negar Olfati
Torbjørn Svendsen
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-91250-9_14