A Winnow-Based Approach to Context-Sensitive Spelling Correction

Golding, Andrew R.; Roth, Dan

doi:10.1023/A:1007545901558

A Winnow-Based Approach to Context-Sensitive Spelling Correction

Published: February 1999

Volume 34, pages 107–130, (1999)
Cite this article

Download PDF

Machine Learning Aims and scope Submit manuscript

A Winnow-Based Approach to Context-Sensitive Spelling Correction

Download PDF

Andrew R. Golding¹ &
Dan Roth²

3309 Accesses
120 Citations
3 Altmetric
Explore all metrics

Abstract

A large class of machine-learning problems in natural language require the characterization of linguistic context. Two characteristic properties of such problems are that their feature space is of very high dimensionality, and their target concepts depend on only a small subset of the features in the space. Under such conditions, multiplicative weight-update algorithms such as Winnow have been shown to have exceptionally good theoretical properties. In the work reported here, we present an algorithm combining variants of Winnow and weighted-majority voting, and apply it to a problem in the aforementioned class: context-sensitive spelling correction. This is the task of fixing spelling errors that happen to result in valid words, such as substituting to for too, casual for causal, and so on. We evaluate our algorithm, WinSpell, by comparing it against BaySpell, a statistics-based method representing the state of the art for this task. We find: (1) When run with a full (unpruned) set of features, WinSpell achieves accuracies significantly higher than BaySpell was able to achieve in either the pruned or unpruned condition; (2) When compared with other systems in the literature, WinSpell exhibits the highest performance; (3) While several aspects of WinSpell's architecture contribute to its superiority over BaySpell, the primary factor is that it is able to learn a better linear separator than BaySpell learns; (4) When run on a test set drawn from a different corpus than the training set was drawn from, WinSpell is better able than BaySpell to adapt, using a strategy we will present that combines supervised learning on the training set with unsupervised learning on the (noisy) test set.

References

Blum, A. (1992). Learning boolean functions in an infinite attribute space. Machine Learning, 9(4), 373–386.
Google Scholar
Breiman, L. (1994). Bagging predictors (Technical Report 421). University of California, Berkeley.
Google Scholar
Cesa-Bianchi, N., Freund, Y., Helmbold, D.P., & Warmuth, M.(1994). On-line prediction and conversion strategies. Computational Learning Theory: Eurocolt'93 (pp. 205–216). Oxford University Press.
Chen, S.F., & Goodman, J. (1996). An empirical study of smoothing techniques for language modeling. Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, Santa Cruz, CA.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297.
Google Scholar
Dagan, I., Karov, Y., & Roth, D. (1997). Mistake-driven learning in text categorization. EMNLP-97, The Second Conference on Empirical Methods in Natural Language Processing (pp. 55–63).
Dietterich, T.G. (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10(7), 1895–1924.
Google Scholar
Domingos, P., & Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29, 103–130.
Google Scholar
Duda, R.O., & Hart, P.E. (1973). Pattern classification and scene analysis. Wiley.
Fleiss, J.L. (1981). Statistical methods for rates and proportions. John Wiley and Sons.
Flexner, S.B. (Ed.). (1983). Random House unabridged dictionary (2nd ed.). New York: Random House.
Google Scholar
Freund, Y., & Schapire, R.E. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. Computational Learning Theory: Eurocolt'95 (pp. 23–37). Springer-Verlag.
Gale, W.A., Church, K.W., & Yarowsky, D. (1993). A method for disambiguating word senses in a large corpus. Computers and the Humanities, 26, 415–439.
Google Scholar
Golding, A.R. (1995). A Bayesian hybrid method for context-sensitive spelling correction. Proceedings of the 3rd Workshop on Very Large Corpora, Boston, MA.
Golding, A.R., & Schabes, Y. (1996). Combining trigram-based and feature-based methods for context-sensitive spelling correction. Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, Santa Cruz, CA.
Herbster, M., & Warmuth, M. (1995). Tracking the best expert. Proceedings of the 12th International Conference on Machine Learning (pp. 286–294). Morgan Kaufmann.
Holte, R.C., Acker, L.E., & Porter, B.W. (1989). Concept learning and the problem of small disjuncts. Proceedings of the International Joint Conference on Artificial Intelligence, Detroit.
Jones, M.P., & Martin, J.H. (1997). Contextual spelling correction using latent semantic analysis. Proceedings of the 5th Conference on Applied Natural Language Processing, Washington, DC.
Katz, S.M. (1987). Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Trans. on Acoustics, Speech, and Signal Processing, ASSP-35(3), 400–401.
Google Scholar
Kivinen, J., & Warmuth, M.K. (1995). Exponentiated gradient versus gradient descent for linear predictors. ACM Symp. on the Theory of Computing.
Kneser, R., & Ney, H. (1995). Improved backing-off for m-gram language modeling. Proceedings of the International Conf. on Acoustics, Speech, and Signal Processing (Vol. 1, pp. 181–184).
Google Scholar
Kohavi, R., Becker, B., & Sommerfield, D. (1997). Improving simple Bayes. Proceedings of the European Conference on Machine Learning.
KuLcera, H., & Francis, W.N. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.
Google Scholar
Kukich, K. (1992). Techniques for automatically correcting words in text. ACM Computing Surveys, 24(4), 377–439.
Google Scholar
Littlestone, N. (1988). Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2, 285–318.
Google Scholar
Littlestone, N. (1991). Redundant noisy attributes, attribute errors, and linear threshold learning using Winnow. Proceedings of the 4th AnnualWorkshop on Computational Learning Theory (pp. 147–156). Morgan Kaufmann.
Littlestone, N. (1995). Comparing several linear-threshold learning algorithms on tasks involving superfluous attributes. Proceedings of the 12th International Conference on Machine Learning (pp. 353–361). Morgan Kaufmann.
Littlestone, N., & Warmuth, M.K. (1994). The weighted majority algorithm. Information and Computation, 108(2), 212–261.
Google Scholar
Mangu, L., & Brill, E. (1997). Automatic rule acquisition for spelling correction. Proceedings of the 14th International Conference on Machine Learning. Morgan Kaufmann.
Marcus, M.P., Santorini, B., & Marcinkiewicz, M. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.
Google Scholar
Mays, E., Damerau, F.J., & Mercer, R.L. (1991). Context based spelling correction. Information Processing and Management, 27(5), 517–522.
Google Scholar
Ney, H., Essen, U., & Kneser, R. (1994). On structuring probabilistic dependences in stochastic language modelling. Computer Speech and Language, 8, 1–38.
Google Scholar
Ng, H.T., & Lee, H.B. (1996). Integrating multiple knowledge sources to disambiguate word sense: An examplarbased approach. Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, Santa Cruz, CA.
Powers, D. (1997). Learning and application of differential grammars. Proceedings of the ACL Special Interest Group in Natural Language Learning, Madrid.
Reddy, L., & Tadepalli, P. (1997). Active learning with committees for text categorization. Proceedings of the National Conference on Artificial Intelligence (pp. 602–608).
Roth, D. (1998). Learning to resolve natural language ambiguities: Aunified approach. Proceedings of the National Conference on Artificial Intelligence (pp. 806–813).
Roth, D., & Zelenko, D. (1998). Part of speech tagging using a network of linear separators. COLING-ACL 98, The 17th International Conference on Computational Linguistics (pp. 1136–1142).
Valiant, L.G. (1994). Circuits of the mind. Oxford University Press.
Valiant, L.G. (1995). Rationality. Workshop on Computational Learning Theory (pp. 3–14).
Yarowsky, D. (1994). Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French. Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, NM.

Download references

Author information

Authors and Affiliations

MERL—A Mitsubishi Electric Research Laboratory, 201 Broadway, Cambridge, MA, 02139
Andrew R. Golding
Department of Computer Science, University of Illinois—Urbana/Champaign, 1304 W. Springfield Avenue, Urbana, IL, 61801
Dan Roth

Authors

Andrew R. Golding
View author publications
You can also search for this author in PubMed Google Scholar
Dan Roth
View author publications
You can also search for this author in PubMed Google Scholar

Rights and permissions

Reprints and permissions

About this article

Cite this article

Golding, A.R., Roth, D. A Winnow-Based Approach to Context-Sensitive Spelling Correction. Machine Learning 34, 107–130 (1999). https://doi.org/10.1023/A:1007545901558

Download citation

Issue Date: February 1999
DOI: https://doi.org/10.1023/A:1007545901558

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

A Winnow-Based Approach to Context-Sensitive Spelling Correction

Abstract

Article PDF

Similar content being viewed by others

Real-word error correction with trigrams: correcting multiple errors in a sentence

Automatic Spelling Detection and Correction in the Medical Domain: A Systematic Literature Review

A Spelling Mistake Correction (SMC) Model for Resolving Real-Word Error

References

Author information

Authors and Affiliations

Rights and permissions

About this article

Cite this article

Navigation

A Winnow-Based Approach to Context-Sensitive Spelling Correction

Abstract

Article PDF

Similar content being viewed by others

Real-word error correction with trigrams: correcting multiple errors in a sentence

Automatic Spelling Detection and Correction in the Medical Domain: A Systematic Literature Review

A Spelling Mistake Correction (SMC) Model for Resolving Real-Word Error

References

Author information

Authors and Affiliations

Rights and permissions

About this article

Cite this article

Share this article

Search

Navigation