Abstract
A large class of machine-learning problems in natural language require the characterization of linguistic context. Two characteristic properties of such problems are that their feature space is of very high dimensionality, and their target concepts depend on only a small subset of the features in the space. Under such conditions, multiplicative weight-update algorithms such as Winnow have been shown to have exceptionally good theoretical properties. In the work reported here, we present an algorithm combining variants of Winnow and weighted-majority voting, and apply it to a problem in the aforementioned class: context-sensitive spelling correction. This is the task of fixing spelling errors that happen to result in valid words, such as substituting to for too, casual for causal, and so on. We evaluate our algorithm, WinSpell, by comparing it against BaySpell, a statistics-based method representing the state of the art for this task. We find: (1) When run with a full (unpruned) set of features, WinSpell achieves accuracies significantly higher than BaySpell was able to achieve in either the pruned or unpruned condition; (2) When compared with other systems in the literature, WinSpell exhibits the highest performance; (3) While several aspects of WinSpell's architecture contribute to its superiority over BaySpell, the primary factor is that it is able to learn a better linear separator than BaySpell learns; (4) When run on a test set drawn from a different corpus than the training set was drawn from, WinSpell is better able than BaySpell to adapt, using a strategy we will present that combines supervised learning on the training set with unsupervised learning on the (noisy) test set.
Article PDF
Similar content being viewed by others
References
Blum, A. (1992). Learning boolean functions in an infinite attribute space. Machine Learning, 9(4), 373–386.
Breiman, L. (1994). Bagging predictors (Technical Report 421). University of California, Berkeley.
Cesa-Bianchi, N., Freund, Y., Helmbold, D.P., & Warmuth, M.(1994). On-line prediction and conversion strategies. Computational Learning Theory: Eurocolt'93 (pp. 205–216). Oxford University Press.
Chen, S.F., & Goodman, J. (1996). An empirical study of smoothing techniques for language modeling. Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, Santa Cruz, CA.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297.
Dagan, I., Karov, Y., & Roth, D. (1997). Mistake-driven learning in text categorization. EMNLP-97, The Second Conference on Empirical Methods in Natural Language Processing (pp. 55–63).
Dietterich, T.G. (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10(7), 1895–1924.
Domingos, P., & Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29, 103–130.
Duda, R.O., & Hart, P.E. (1973). Pattern classification and scene analysis. Wiley.
Fleiss, J.L. (1981). Statistical methods for rates and proportions. John Wiley and Sons.
Flexner, S.B. (Ed.). (1983). Random House unabridged dictionary (2nd ed.). New York: Random House.
Freund, Y., & Schapire, R.E. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. Computational Learning Theory: Eurocolt'95 (pp. 23–37). Springer-Verlag.
Gale, W.A., Church, K.W., & Yarowsky, D. (1993). A method for disambiguating word senses in a large corpus. Computers and the Humanities, 26, 415–439.
Golding, A.R. (1995). A Bayesian hybrid method for context-sensitive spelling correction. Proceedings of the 3rd Workshop on Very Large Corpora, Boston, MA.
Golding, A.R., & Schabes, Y. (1996). Combining trigram-based and feature-based methods for context-sensitive spelling correction. Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, Santa Cruz, CA.
Herbster, M., & Warmuth, M. (1995). Tracking the best expert. Proceedings of the 12th International Conference on Machine Learning (pp. 286–294). Morgan Kaufmann.
Holte, R.C., Acker, L.E., & Porter, B.W. (1989). Concept learning and the problem of small disjuncts. Proceedings of the International Joint Conference on Artificial Intelligence, Detroit.
Jones, M.P., & Martin, J.H. (1997). Contextual spelling correction using latent semantic analysis. Proceedings of the 5th Conference on Applied Natural Language Processing, Washington, DC.
Katz, S.M. (1987). Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Trans. on Acoustics, Speech, and Signal Processing, ASSP-35(3), 400–401.
Kivinen, J., & Warmuth, M.K. (1995). Exponentiated gradient versus gradient descent for linear predictors. ACM Symp. on the Theory of Computing.
Kneser, R., & Ney, H. (1995). Improved backing-off for m-gram language modeling. Proceedings of the International Conf. on Acoustics, Speech, and Signal Processing (Vol. 1, pp. 181–184).
Kohavi, R., Becker, B., & Sommerfield, D. (1997). Improving simple Bayes. Proceedings of the European Conference on Machine Learning.
KuLcera, H., & Francis, W.N. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.
Kukich, K. (1992). Techniques for automatically correcting words in text. ACM Computing Surveys, 24(4), 377–439.
Littlestone, N. (1988). Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2, 285–318.
Littlestone, N. (1991). Redundant noisy attributes, attribute errors, and linear threshold learning using Winnow. Proceedings of the 4th AnnualWorkshop on Computational Learning Theory (pp. 147–156). Morgan Kaufmann.
Littlestone, N. (1995). Comparing several linear-threshold learning algorithms on tasks involving superfluous attributes. Proceedings of the 12th International Conference on Machine Learning (pp. 353–361). Morgan Kaufmann.
Littlestone, N., & Warmuth, M.K. (1994). The weighted majority algorithm. Information and Computation, 108(2), 212–261.
Mangu, L., & Brill, E. (1997). Automatic rule acquisition for spelling correction. Proceedings of the 14th International Conference on Machine Learning. Morgan Kaufmann.
Marcus, M.P., Santorini, B., & Marcinkiewicz, M. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.
Mays, E., Damerau, F.J., & Mercer, R.L. (1991). Context based spelling correction. Information Processing and Management, 27(5), 517–522.
Ney, H., Essen, U., & Kneser, R. (1994). On structuring probabilistic dependences in stochastic language modelling. Computer Speech and Language, 8, 1–38.
Ng, H.T., & Lee, H.B. (1996). Integrating multiple knowledge sources to disambiguate word sense: An examplarbased approach. Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, Santa Cruz, CA.
Powers, D. (1997). Learning and application of differential grammars. Proceedings of the ACL Special Interest Group in Natural Language Learning, Madrid.
Reddy, L., & Tadepalli, P. (1997). Active learning with committees for text categorization. Proceedings of the National Conference on Artificial Intelligence (pp. 602–608).
Roth, D. (1998). Learning to resolve natural language ambiguities: Aunified approach. Proceedings of the National Conference on Artificial Intelligence (pp. 806–813).
Roth, D., & Zelenko, D. (1998). Part of speech tagging using a network of linear separators. COLING-ACL 98, The 17th International Conference on Computational Linguistics (pp. 1136–1142).
Valiant, L.G. (1994). Circuits of the mind. Oxford University Press.
Valiant, L.G. (1995). Rationality. Workshop on Computational Learning Theory (pp. 3–14).
Yarowsky, D. (1994). Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French. Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, NM.
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Golding, A.R., Roth, D. A Winnow-Based Approach to Context-Sensitive Spelling Correction. Machine Learning 34, 107–130 (1999). https://doi.org/10.1023/A:1007545901558
Issue Date:
DOI: https://doi.org/10.1023/A:1007545901558