
Learnability and the Vapnik-Chervonenkis dimension

Published: 1 October 1989

Abstract

Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space E^n. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and the necessary and sufficient conditions are provided for feasible learnability.
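
The "simple combinatorial parameter" is the largest number of domain points that concepts in the class can label in every possible way (i.e., shatter). As a rough illustration of the definition, and not code from the paper, the sketch below brute-force checks shattering by axis-parallel rectangles in the plane, a class whose VC dimension is 4; the function names and test points are my own.

```python
from itertools import combinations

def smallest_enclosing_rect(points):
    """Bounding box (xmin, xmax, ymin, ymax) of a nonempty point set."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), max(xs), min(ys), max(ys)

def in_rect(rect, p):
    xlo, xhi, ylo, yhi = rect
    x, y = p
    return xlo <= x <= xhi and ylo <= y <= yhi

def shattered_by_rectangles(points):
    """True if every subset of `points` is picked out by some axis-parallel
    rectangle.  A subset can be picked out iff its bounding box excludes
    every point outside the subset."""
    pts = list(points)
    for r in range(1, len(pts) + 1):      # the empty subset is always realizable
        for subset in combinations(pts, r):
            rect = smallest_enclosing_rect(subset)
            outside = [p for p in pts if p not in subset]
            if any(in_rect(rect, p) for p in outside):
                return False
    return True

# Four points in a "diamond" are shattered, so the VC dimension is at least 4 ...
print(shattered_by_rectangles([(0, 1), (1, 0), (2, 1), (1, 2)]))              # True
# ... but adding a fifth point in the middle already defeats shattering.
print(shattered_by_rectangles([(0, 1), (1, 0), (2, 1), (1, 2), (1, 1)]))      # False
```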

Reviews

Vitit Kantabutra

The type of learning considered in this paper, which extends Valiant's theory of "learning by sampling" [1] from Boolean domains to multidimensional Euclidean spaces, is exemplified by the following case. We want an algorithm that learns classes of men, where the men are classified by height and weight. We model each class (or concept) of men by an axis-parallel rectangle in the Euclidean XY plane, where the X axis represents the height and the Y axis represents the weight. Let C represent the set of all axis-parallel rectangles, that is, the set of all classes of men. (To be more realistic, we could exclude classes containing unrealistic heights or weights.) In each run, the algorithm must learn a particular class of men, called the target class, such as the class of medium-built men. The algorithm is not told which rectangle corresponds to the target class. Instead, it samples m random points in the plane and determines whether each point is in the target class. The algorithm then guesses the target class based on this information, as follows: Let l, r, b, and t denote the minimum and maximum X and Y coordinates of the sample points that are in the target class. Output the rectangle [l, r] × [b, t] as the guess for the target class. Intuitively, we can improve the guess by increasing the number m of sample points.

The main objective of this paper is to study the sample complexity of learning algorithms, that is, how many samples are needed to get a good approximation to the target class. The authors use a simple combinatorial parameter of the class C of concepts, called the Vapnik-Chervonenkis (VC) dimension, to characterize the existence of certain important types of learning algorithms; when such algorithms exist, the VC dimension of C also appears in the sample complexity. In particular, suppose P is a fixed probability distribution on the Euclidean space and the learning algorithm takes samples according to P. Let the error of a guess for the target class be the probability that it disagrees with the actual target class on a randomly drawn sample (drawn according to P). The distribution-free property of a learning algorithm is the property that if m is large enough, the algorithm computes a guess such that, with arbitrarily high probability, the guess achieves arbitrarily small error. Most important, the bound on the sample size must be independent of P. The authors show that a distribution-free learning algorithm exists if and only if the VC dimension of C is finite.
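
The tightest-fit strategy described above is easy to state as code. Below is a minimal sketch, not the authors' implementation; the function names, the target rectangle, and the uniform sampling distribution in the demonstration are illustrative choices of mine.

```python
import random

def in_rect(rect, p):
    l, r, b, t = rect
    x, y = p
    return l <= x <= r and b <= y <= t

def learn_rectangle(sample):
    """Tightest-fit hypothesis: the smallest axis-parallel rectangle
    (l, r, b, t) that encloses the positively labeled sample points."""
    positives = [p for p, label in sample if label]
    if not positives:
        return None  # with no positive examples, guess the empty concept
    xs = [x for x, _ in positives]
    ys = [y for _, y in positives]
    return min(xs), max(xs), min(ys), max(ys)

# Hypothetical demonstration: the target class is the rectangle
# [160, 180] x [60, 80] (say, height in cm by weight in kg), and the
# m = 500 sample points are drawn uniformly from a larger square.
random.seed(0)
target = (160.0, 180.0, 60.0, 80.0)
sample = [((x, y), in_rect(target, (x, y)))
          for x, y in ((random.uniform(140, 200), random.uniform(40, 100))
                       for _ in range(500))]

print(learn_rectangle(sample))  # a rectangle slightly inside the target
```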
Moreover, if C is of finite VC dimension d, there exists a learning algorithm for concepts in C that achieves error of no more than ε with probability at least 1 − δ for

m = max( (4/ε) log(2/δ), (8d/ε) log(13/ε) ),

independently of P. Most of the rest of the paper studies sample complexities that are within the feasible range. For example, the authors study polynomial learnability with respect to target complexity. In this case, the sample size is allowed to grow polynomially with respect to the complexity of the target concept. The domain is fixed (for example, the Euclidean plane), but various targets have different complexities (the targets could be the convex polygons, in which case the complexity of each polygon is the number of sides). This paper plays an important role in the development of the theory of learnability.
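
To get a feel for the bound quoted above, the sketch below plugs in concrete values, assuming base-2 logarithms and rounding up to an integer sample size; the particular ε, δ, and d are illustrative.

```python
from math import ceil, log2

def sufficient_sample_size(epsilon, delta, d):
    """Sample size from the bound quoted above:
    m = max((4/eps) * log(2/delta), (8d/eps) * log(13/eps)),
    taking logarithms base 2 and rounding up."""
    return ceil(max((4 / epsilon) * log2(2 / delta),
                    (8 * d / epsilon) * log2(13 / epsilon)))

# Axis-parallel rectangles have VC dimension d = 4; asking for error at most
# 5% with confidence 99% (epsilon = 0.05, delta = 0.01) gives:
print(sufficient_sample_size(0.05, 0.01, 4))   # 5135
```

Note that the bound depends on the concept class only through d, which is what makes it distribution-free in the sense described above.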

Published in

Journal of the ACM, Volume 36, Issue 4 (October 1989), 279 pages
ISSN: 0004-5411  EISSN: 1557-735X
DOI: 10.1145/76359
Copyright © 1989 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
