2020 | OriginalPaper | Book chapter

A Benchmarking of IBM, Google and Wit Automatic Speech Recognition Systems

Written by: Foteini Filippidou, Lefteris Moussiades

Published in: Artificial Intelligence Applications and Innovations

Publisher: Springer International Publishing

Abstract

As the requirements for automatic speech recognition continue to grow, so does the demand for accuracy and efficiency. In this paper, we present most of the well-known Automatic Speech Recognition (ASR) systems and benchmark three of them, namely IBM Watson, Google, and Wit, using the WER, Hper, and Rper error metrics. The experimental results show that Google's automatic speech recognition performs best among the three systems. We intend to extend the benchmarking both to include most of the available ASR systems and to increase our test data.
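For readers unfamiliar with the first of these metrics, the following is a minimal sketch of word error rate (WER) as it is commonly defined: the word-level edit distance between a reference transcript and the recognizer's hypothesis, divided by the number of reference words, i.e. WER = (S + D + I) / N for S substitutions, D deletions, and I insertions. The function name `word_error_rate`, the lowercasing, and the whitespace tokenization are assumptions made for illustration; the abstract does not describe the chapter's own normalization or tooling, and Hper and Rper are not covered here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: both transcripts are lowercased and split on whitespace.
    A real benchmark would likely also normalize punctuation and numerals.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.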

Metadata
Title
A Benchmarking of IBM, Google and Wit Automatic Speech Recognition Systems
Written by
Foteini Filippidou
Lefteris Moussiades
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-49161-1_7
