Published in: Soft Computing 9/2016

09.08.2015 | Focus

Simplified scoring methods for HMM-based speech recognition

Authors: Pavel Paramonov, Nadezhda Sutula

Abstract

Most contemporary speech recognition systems rely on complex algorithms based on Hidden Markov Models (HMMs) to achieve high accuracy. However, in some cases rich computational resources are not available, and even isolated word recognition becomes a challenging task. In this paper, we present two ways to simplify scoring in HMM-based speech recognition in order to reduce its computational complexity. We focus on the core HMM procedure, the forward algorithm, which uses dynamic programming to compute the probability that a given HMM generates an observation sequence. All proposed approaches were tested on Russian word recognition, and the results were compared with those of the conventional forward algorithm.
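
For orientation, the conventional forward algorithm that the abstract uses as its baseline can be sketched as below. This is a minimal illustration for a discrete-observation HMM, not the simplified scoring methods proposed in the paper, which the abstract does not describe. The function name, the parameter layout, and the toy model values are assumptions made for this example only.

```python
from typing import Sequence


def forward_probability(
    initial: Sequence[float],
    transition: Sequence[Sequence[float]],
    emission: Sequence[Sequence[float]],
    observations: Sequence[int],
) -> float:
    """Return P(observations | HMM) with the conventional forward algorithm.

    initial[i]        probability of starting in state i
    transition[i][j]  probability of moving from state i to state j
    emission[j][k]    probability of emitting symbol k in state j
    observations      sequence of symbol indices
    """
    n_states = len(initial)
    if not observations:
        # The empty sequence is generated with probability 1.
        return 1.0

    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [initial[i] * emission[i][observations[0]] for i in range(n_states)]

    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n_states))
            * emission[j][obs]
            for j in range(n_states)
        ]

    # Termination: sum the forward variables over all final states.
    return sum(alpha)


# Hypothetical two-state, two-symbol toy model (values chosen for illustration).
TOY_INITIAL = [0.6, 0.4]
TOY_TRANSITION = [[0.7, 0.3], [0.4, 0.6]]
TOY_EMISSION = [[0.5, 0.5], [0.1, 0.9]]

toy_score = forward_probability(TOY_INITIAL, TOY_TRANSITION, TOY_EMISSION, [0, 1])
```

Each time step costs on the order of N² multiplications for N states, which is the kind of per-frame cost that becomes significant when computational resources are limited. In isolated word recognition, one such score would typically be computed per word model and the highest-scoring model selected. Practical implementations usually work with scaled or log-domain values to avoid numerical underflow on long sequences; this sketch omits that for brevity.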

Metadata
Title
Simplified scoring methods for HMM-based speech recognition
Authors
Pavel Paramonov
Nadezhda Sutula
Publication date
09.08.2015
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 9/2016
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-015-1831-1
