1992 | Original Paper | Book Chapter
Hidden Markov Models for Speech Recognition — Strengths and Limitations
Authors: L. R. Rabiner, B. H. Juang
Published in: Speech Recognition and Understanding
Publisher: Springer Berlin Heidelberg
Included in: Professional Book Archive
The use of hidden Markov models for speech recognition has become predominant over the last several years, as evidenced by the number of published papers and talks at major speech conferences. The reasons for this popularity are the inherent statistical (mathematically precise) framework; the ease and availability of training algorithms for estimating the parameters of the models from finite training sets of speech data; the flexibility of the resulting recognition system, in which one can easily change the size, type, or architecture of the models to suit particular words, sounds, etc.; and the ease of implementation of the overall recognition system. However, although hidden Markov model technology has brought speech recognition system performance to new high levels for a variety of applications, there remain some fundamental areas where aspects of the theory are either inadequate for speech or where the assumptions that are made do not apply. Examples of such areas range from the fundamental modeling assumption, i.e., that a maximum likelihood estimate of the model parameters provides the best system performance, to issues involved with inadequate training data, which lead to the concepts of parameter tying across states, deleted interpolation, and other smoothing methods. Other aspects of the basic hidden Markov modeling methodology that are still not well understood include: ways of integrating new features (e.g., prosodic versus spectral features) into the framework in a consistent and meaningful way; how to properly model sound durations (both within a state and across states of a model); how to properly use the information in state transitions; and finally how models can be split or clustered as warranted by the training data. It is the purpose of this paper to examine each of these strengths and limitations and discuss how they affect the overall performance of a typical speech recognition system.
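To make the maximum-likelihood assumption discussed above concrete, the quantity being maximized in standard HMM training is the likelihood P(O | λ) of the observation sequence given the model, which is computed with the forward algorithm. The sketch below is illustrative only and is not taken from the paper; the two-state, three-symbol discrete HMM and all its probability values are invented for the example.

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """Forward algorithm for a discrete HMM.

    pi:  (N,)  initial state probabilities
    A:   (N,N) state transition probabilities, A[i, j] = P(j | i)
    B:   (N,M) emission probabilities, B[i, k] = P(symbol k | state i)
    obs: sequence of observed symbol indices
    Returns P(obs | model), summed over all state paths.
    """
    alpha = pi * B[:, obs[0]]          # initialization with first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # induction: propagate, then emit
    return alpha.sum()                 # termination: sum over final states

# Hypothetical toy model (values chosen arbitrarily for illustration).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])
print(forward_likelihood(pi, A, B, [0, 1, 2]))  # prints ~0.03628
```

Maximum-likelihood (Baum-Welch) training adjusts pi, A, and B to maximize this quantity over the training data; the paper's point is that a higher likelihood under these parameters need not translate into better recognition performance.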