
2019 | Original Paper | Book Chapter

Evaluating the Performance of ASR Systems for TV Interactions in Several Domestic Noise Scenarios

Written by: Pedro Beça, Jorge Abreu, Rita Santos, Ana Rodrigues

Published in: Applications and Usability of Interactive TV

Publisher: Springer International Publishing


Abstract

Voice interaction with the television is becoming a reality in domestic environments. However, the correct operation of these systems depends in part on background noise, which degrades the performance of the automatic speech recognition (ASR) component. To better understand this issue, the paper analyzes the performance of three ASR systems (Bing Speech API, Google API, and Nuance ASR) in several noise scenarios that resemble interaction with the TV in a domestic context. A group of 36 users was asked to utter sentences based on TV requests, drawn from a corpus of typical phrases used when interacting with the TV. To characterize the behavior, performance, and noise robustness of each ASR system, the tests were carried out with three recording devices placed at different distances from the user. Google ASR proved the most robust to noise, with the highest recognition precision, followed by Bing Speech and Nuance. The results showed that ASR performance is quite robust overall but tends to deteriorate with domestic background noise. Future replications of the evaluation setup will allow ASR solutions to be evaluated in other scenarios.
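
The abstract does not state how recognition precision was scored. As an illustration only, the sketch below computes word error rate (WER), a common way to compare an ASR transcript against the sentence the user was asked to utter. The function name and the example sentences are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: case-insensitive matching on whitespace-separated words,
    with no further text normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical TV request and a transcript with one misrecognized word.
print(word_error_rate("switch to channel five", "switch to channel nine"))
```

A lower value means a closer match. Averaging this score per ASR system, noise scenario, and recording device would give one way to tabulate results of the kind the abstract summarizes.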


Metadata
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-23862-9_12