Abstract
We present an off-line variant of the mistake-bound model of learning. This is an intermediate model between the on-line learning model (Littlestone, 1988, Littlestone, 1989) and the self-directed learning model (Goldman, Rivest & Schapire, 1993, Goldman & Sloan, 1994). Just like in the other two models, a learner in the off-line model has to learn an unknown concept from a sequence of elements of the instance space on which it makes “guess and test” trials. In all models, the aim of the learner is to make as few mistakes as possible. The difference between the models is that, while in the on-line model only the set of possible elements is known, in the off-line model the sequence of elements (i.e., the identity of the elements as well as the order in which they are to be presented) is known to the learner in advance. On the other hand, the learner is weaker than the self-directed learner, which is allowed to choose adaptively the sequence of elements presented to him.
We study some of the fundamental properties of the off-line model. In particular, we compare the number of mistakes made by the off-line learner on certain concept classes to those made by the on-line and self-directed learners. We give bounds on the possible gaps between the various models and show examples that prove that our bounds are tight.
Another contribution of this paper is the extension of the combinatorial tool of labeled trees to a unified approach that captures the various mistake bound measures of all the models discussed. We believe that this tool will prove to be useful for further study of models of incremental learning.
Article PDF
Similar content being viewed by others
References
D. Angluin, (1989). “Equivalence Queries and Approximate Fingerprints”, Proc. of 2nd COLT pp. 134–145.
A. Blum, (1994). “Separating Distribution-Free and Mistake-Bound Learning Models over the Boolean Domain”, SIAM Journal on Computing, Vol. 23, No. 4, pp. 373–386. (Also, Proceedings of FOCS90, pp. 211-218, 1990.)
A. Blum, (1992a). “Learning Boolean Functions in an Infinite Attribute Space”, Machine Learning, Vol. 9. (Also, Proceedings of STOC90, pp. 64-72, 1990.)
A. Blum, (1992b). “Rank-rDecision Trees are a Subclass of r-Decision Lists”, Information Processing Letters, Vol. 42, pp. 183-185.
N. Cesa-Bianchi, Y. Freund, D. P. Helmbold, D. Haussler, R. E. Schapire, & M. K. Warmuth, (1993). “How to Use Expert Advice”, Proceedings of STOC93, pp. 382-391.
Z. Chen, & W. Maass, (1994). “On-line Learning of Rectangles”, Machine Learning, Vol. 17, pp. 23-50. (Also, Proceedings of COLT92, pp. 16-27, 1992.)
T. H. Cormen, C. E. Leiserson, & R. L. Rivest, (1990). Introduction to Algorithms. MIT Press.
A. Ehrenfeucht & D. Haussler, (1989). “Learning Decision Trees from Random Examples”, Information and Computation, Vol. 82, pp. 231–246.
M. Feder, N. Merhav, & M. Gutman, (1992). “Universal Prediction of Individual Sequences”, IEEE Trans. on Information Theory, IT-38, No. 4, pp. 1258–1270.
S. A. Goldman, R. L. Rivest, & R. E. Schapire, (1993). “Learning Binary Relations and Total Orders”, SIAM Journal on Computing, Vol. 22, No. 5, pp. 1006-1034.
S. A. Goldman, & R. H. Sloan, (1994). “The Power of Self-Directed Learning”, Machine Learning, Vol. 14, pp. 271-294.
D.P. Helmbold, N. Littlestone, & P.M. Long, (1992). “AppleTasting and Nearly One-Sided Learning”, Proceedings of FOCS92, pp. 493–502.
N. Littlestone, (1988). “Learning when Irrelevant Attributes Abound: A New Linear-Threshold Algorithm”, Machine Learning, Vol. 2, pp. 285-318.
N. Littlestone, (1989). “Mistake Bounds and Logarithmic Linear-Threshold Learning Algorithms”, PhD thesis, U.C. Santa Cruz.
N. Littlestone & M. K. Warmuth, (1994). “The Weighted Majority Algorithm”, Information and Computation, Vol. 108, pp. 212–261. (Also, Proceedings of FOCS89, pp. 256-261, 1989.)
W. Maass, (1991). “On-line Learning with an Oblivious Environment and the Power of Randomization”, Proceedings of COLT91, pp. 167-175.
N. Merhav & M. Feder, (1993). “Universal Sequential Decision Schemes from Individual Sequences”, IEEE Trans. on Information Theory, IT-39, pp. 1280–1292.
S. Porat & J. Feldman, (1991). “Learning Automata from Ordered Examples”, Machine Learning, Vol. 7, pp. 109–138. (Also, Proc. of 1st COLTpp. 386-396, 1988.)
N. Sauer, (1972). “On the Density of Families of Sets”, Journal of Combinatorial Theory (A), Vol. 13, pp. 145-147.
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Ben-David, S., Kushilevitz, E. & Mansour, Y. Online Learning versus Offline Learning. Machine Learning 29, 45–63 (1997). https://doi.org/10.1023/A:1007465907571
Issue Date:
DOI: https://doi.org/10.1023/A:1007465907571