An Empirical Study of Smoothing Techniques for Language Modeling

ABSTRACT
We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. In addition, we introduce two novel smoothing techniques, one a variation of Jelinek-Mercer smoothing and one a very simple linear interpolation technique, both of which outperform existing methods.
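The core technique under comparison can be illustrated with a small sketch: Jelinek-Mercer smoothing linearly interpolates the maximum-likelihood bigram estimate with a unigram fallback, and model quality is measured as the cross-entropy (bits per token) of test data. The function names, the fixed weight `lam`, and the toy corpus below are illustrative assumptions, not the paper's implementation; in the paper the interpolation weights are estimated from held-out data rather than fixed by hand.

```python
from collections import Counter
import math

def train_jm_bigram(tokens, lam=0.7):
    """Return a Jelinek-Mercer-smoothed bigram probability function.

    lam weights the bigram MLE; (1 - lam) weights the unigram fallback.
    (A fixed lam is a simplification: the methods studied in the paper
    estimate such weights from held-out data.)
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)

    def prob(w_prev, w):
        p_uni = unigrams[w] / total
        p_bi = bigrams[(w_prev, w)] / unigrams[w_prev] if unigrams[w_prev] else 0.0
        return lam * p_bi + (1 - lam) * p_uni

    return prob

def cross_entropy(prob, test_tokens):
    """Average negative log2 probability per predicted test token."""
    bits = 0.0
    n = 0
    for w_prev, w in zip(test_tokens, test_tokens[1:]):
        bits += -math.log2(prob(w_prev, w))
        n += 1
    return bits / n

# Toy example: lower cross-entropy means the model predicts the test data better.
train = "the cat sat on the mat the cat ate".split()
test = "the cat sat on the mat".split()
model = train_jm_bigram(train)
print(round(cross_entropy(model, test), 3))
```

Because the unigram term is nonzero for every in-vocabulary word, the interpolated probability never vanishes, which is exactly what keeps the cross-entropy finite on unseen bigrams.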
REFERENCES

- Bahl, Lalit R., Frederick Jelinek, and Robert L. Mercer. 1983. A maximum likelihood approach to continuous speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-5(2):179--190, March.
- Brown, Peter F., John Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, Frederick Jelinek, John D. Lafferty, Robert L. Mercer, and Paul S. Roossin. 1990. A statistical approach to machine translation. Computational Linguistics, 16(2):79--85, June.
- Brown, Peter F., Stephen A. Della Pietra, Vincent J. Della Pietra, Jennifer C. Lai, and Robert L. Mercer. 1992. An estimate of an upper bound for the entropy of English. Computational Linguistics, 18(1):31--40, March.
- Chen, Stanley F. 1996. Building Probabilistic Models for Natural Language. Ph.D. thesis, Harvard University. In preparation.
- Church, Kenneth. 1988. A stochastic parts program and noun phrase parser for unrestricted text. In Proceedings of the Second Conference on Applied Natural Language Processing, pages 136--143.
- Church, Kenneth W. and William A. Gale. 1991. A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams. Computer Speech and Language, 5:19--54.
- Collins, Michael and James Brooks. 1995. Prepositional phrase attachment through a backed-off model. In David Yarowsky and Kenneth Church, editors, Proceedings of the Third Workshop on Very Large Corpora, pages 27--38, Cambridge, MA, June.
- Gale, William A. and Kenneth W. Church. 1990. Estimation procedures for language context: poor estimates are worse than none. In COMPSTAT, Proceedings in Computational Statistics, 9th Symposium, pages 69--74, Dubrovnik, Yugoslavia, September.
- Gale, William A. and Kenneth W. Church. 1994. What's wrong with adding one? In N. Oostdijk and P. de Haan, editors, Corpus-Based Research into Language. Rodopi, Amsterdam.
- Gale, William A. and Geoffrey Sampson. 1995. Good-Turing frequency estimation without tears. Journal of Quantitative Linguistics, 2(3). To appear.
- Good, I. J. 1953. The population frequencies of species and the estimation of population parameters. Biometrika, 40(3 and 4):237--264.
- Jeffreys, H. 1948. Theory of Probability. Clarendon Press, Oxford, second edition.
- Jelinek, Frederick and Robert L. Mercer. 1980. Interpolated estimation of Markov source parameters from sparse data. In Proceedings of the Workshop on Pattern Recognition in Practice, Amsterdam, The Netherlands, May. North-Holland.
- Johnson, W. E. 1932. Probability: deductive and inductive problems. Mind, 41:421--423.
- Katz, Slava M. 1987. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-35(3):400--401, March.
- Kernighan, M. D., K. W. Church, and W. A. Gale. 1990. A spelling correction program based on a noisy channel model. In Proceedings of the Thirteenth International Conference on Computational Linguistics, pages 205--210.
- Lidstone, G. J. 1920. Note on the general case of the Bayes-Laplace formula for inductive or a posteriori probabilities. Transactions of the Faculty of Actuaries, 8:182--192.
- MacKay, David J. C. and Linda C. Peto. 1995. A hierarchical Dirichlet language model. Natural Language Engineering, 1(3):1--19.
- Magerman, David M. 1994. Natural Language Parsing as Statistical Pattern Recognition. Ph.D. thesis, Stanford University, February.
- Nádas, Arthur. 1984. Estimation of probabilities in the language model of the IBM speech recognition system. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-32(4):859--861, August.
- Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. 1988. Numerical Recipes in C. Cambridge University Press, Cambridge.