2003 | OriginalPaper | Book chapter
Combating the Sparse Data Problem of Language Modelling
Author: Frederick Jelinek
Published in: Text, Speech and Dialogue
Publisher: Springer Berlin Heidelberg
Included in: Professional Book Archive
This talk concerns several ideas for combating the sparse data problem of language modeling. All of them alleviate the problem; none solves it. The ideas are: equivalence classification of histories, positional clustering (different cluster systems for different n-gram positions), use of linguistic classes (e.g., WordNet), class constraints in maximum entropy estimation, random forests, and neural network classification. An interesting problem that must be faced is the following: words that are sparse, and therefore need to be classified, do not have sufficient statistics to indicate their appropriate class membership.
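To make the idea of equivalence classification concrete, here is a minimal sketch of a class-based bigram model, one common way of pooling sparse histories. It is an illustration only, not the method presented in the talk. The function name `train_class_bigram`, the toy corpus, and the hand-assigned `word_to_class` mapping are all assumptions introduced for the example. The model approximates P(w | prev) by P(class(w) | class(prev)) * P(w | class(w)), using plain relative frequencies without smoothing.

```python
from collections import Counter


def train_class_bigram(tokens, word_to_class):
    """Return prob(word, prev) for a class-based bigram model.

    Hypothetical helper: approximates P(word | prev) as
    P(class(word) | class(prev)) * P(word | class(word)),
    estimated by relative frequency with no smoothing.
    """
    classes = [word_to_class[w] for w in tokens]
    class_bigrams = Counter(zip(classes, classes[1:]))  # class-pair counts
    class_history = Counter(classes[:-1])               # counts of classes used as history
    class_counts = Counter(classes)                     # counts of classes overall
    word_counts = Counter(tokens)                       # counts of individual words

    def prob(word, prev):
        c_prev, c_word = word_to_class[prev], word_to_class[word]
        if class_history[c_prev] == 0 or class_counts[c_word] == 0:
            return 0.0
        transition = class_bigrams[(c_prev, c_word)] / class_history[c_prev]
        emission = word_counts[word] / class_counts[c_word]
        return transition * emission

    return prob


# Toy corpus and hand-assigned classes (both are assumptions for illustration).
tokens = "the cat sleeps the dog sleeps the cat eats".split()
word_to_class = {
    "the": "DET", "cat": "NOUN", "dog": "NOUN",
    "sleeps": "VERB", "eats": "VERB",
}
prob = train_class_bigram(tokens, word_to_class)

# "dog eats" never occurs in the corpus, so a word-level bigram estimate
# would be zero; the class-based estimate is not.
print(prob("eats", "dog"))
```

Note that the sketch sidesteps the problem raised in the abstract: the classes here are assigned by hand, whereas a rare word observed only once or twice offers little evidence for inferring its class from data.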