Abstract
Valiant's learnability model is extended to learning classes of concepts defined by regions in Euclidean space Eⁿ. The methods in this paper lead to a unified treatment of some of Valiant's results, along with previous results on distribution-free convergence of certain pattern recognition algorithms. It is shown that the essential condition for distribution-free learnability is finiteness of the Vapnik-Chervonenkis dimension, a simple combinatorial parameter of the class of concepts to be learned. Using this parameter, the complexity and closure properties of learnable classes are analyzed, and necessary and sufficient conditions are provided for feasible learnability.
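To make the combinatorial parameter concrete, here is a minimal illustrative sketch (not from the paper) in Python. It checks shattering for one of the simplest concept classes, closed intervals [a, b] on the real line: a point set is shattered if every one of its subsets can be picked out by some interval, and the VC dimension is the size of the largest shattered set. Intervals shatter any two points but no three, so their VC dimension is 2.

```python
# Illustrative sketch: VC dimension of the class of closed intervals [a, b]
# on the real line. A point set S is "shattered" if every subset of S can
# be realized as S ∩ [a, b] for some interval; the VC dimension is the
# size of the largest shattered set.

from itertools import combinations

def interval_shatters(points):
    """Return True if the class of intervals [a, b] shatters `points`."""
    n = len(points)
    for r in range(n + 1):
        for target in combinations(points, r):
            # An interval realizing `target` must contain everything
            # between min(target) and max(target), so the subset is
            # realizable iff no excluded point lies between two
            # included points.
            if target:
                lo, hi = min(target), max(target)
                realized = {p for p in points if lo <= p <= hi}
            else:
                realized = set()  # an empty interval picks out nothing
            if realized != set(target):
                return False
    return True

# Any two distinct points are shattered by intervals ...
print(interval_shatters([0.0, 1.0]))        # True
# ... but no three are: the middle point cannot be excluded while both
# outer points are included. Hence the VC dimension of intervals is 2.
print(interval_shatters([0.0, 1.0, 2.0]))   # False
```

The same brute-force check applies in principle to any concept class over Eⁿ, but its cost grows exponentially with the size of the candidate point set; the value of the paper's characterization is that learnability and sample-size bounds follow from this single parameter without such enumeration.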
References
- 1 AHO, A., HOPCROFT, J., AND ULLMAN, J. The Design and Analysis of Computer Algorithms. Addison-Wesley, London, 1974.
- 2 ANGLUIN, D. Learning regular sets from queries and counterexamples. Inf. Comput. 75 (1987), 87-106.
- 3 ANGLUIN, D. Queries and concept learning. Mach. Learning 2, 2 (1988), 319-342.
- 4 ANGLUIN, D., AND LAIRD, P. D. Learning from noisy examples. Mach. Learning 2, 2 (1988), 343-370.
- 5 ANGLUIN, D., AND VALIANT, L. Fast probabilistic algorithms for Hamiltonian circuits and matchings. J. Comput. Syst. Sci. 19 (1979), 155-193.
- 6 ASSOUAD, P. Densité et dimension. Ann. Inst. Fourier, Grenoble 33, 3 (1983), 233-282.
- 7 BAUM, E., AND HAUSSLER, D. What size net gives valid generalization? Neural Comput. 1, 1 (1989), 151-160.
- 7a BEN-DAVID, S., BENEDEK, G., AND MANSOUR, Y. A parameterization scheme for classifying models of learnability. In Proceedings of the 2nd Workshop on Computational Learning Theory (Santa Cruz, Calif., July 31-Aug. 2). Morgan Kaufmann, San Mateo, Calif., 1989, to appear.
- 8 BENEDEK, G., AND ITAI, A. Nonuniform learnability. In Proceedings of the 15th International Conference on Automata, Languages and Programming (July). 1988, pp. 82-92.
- 9 BLUM, A., AND RIVEST, R. Training a 3-node neural network is NP-complete. In Proceedings of the 1st Workshop on Computational Learning Theory (Cambridge, Mass., Aug. 3-5). Morgan Kaufmann, San Mateo, Calif., 1988, pp. 9-18.
- 10 BLUMER, A., EHRENFEUCHT, A., HAUSSLER, D., AND WARMUTH, M. K. Occam's Razor. Inf. Process. Lett. 24 (1987), 377-380.
- 11 BOLLOBÁS, B. Combinatorics. Cambridge Univ. Press, Cambridge, 1986.
- 12 BOUCHERON, S., AND SALLANTIN, J. Some remarks about space-complexity of learning and circuit complexity of recognizing. In Proceedings of the 1st Workshop on Computational Learning Theory (Cambridge, Mass., Aug. 3-5). Morgan Kaufmann, San Mateo, Calif., 1988, pp. 125-138.
- 13 COVER, T. Geometrical and statistical properties of systems of linear inequalities with applications to pattern recognition. IEEE Trans. Elect. Comp. 14 (1965), 326-334.
- 14 DEVROYE, L. P. Automatic pattern recognition: A study of the probability of error. IEEE Trans. Pattern Analysis and Mach. Intell. 10, 4 (1988), 530-543.
- 15 DEVROYE, L. P., AND WAGNER, T. J. A distribution-free performance bound in error estimation. IEEE Trans. Inf. Theory 22 (1976), 586-587.
- 16 DUDA, R., AND HART, P. Pattern Classification and Scene Analysis. Wiley, New York, 1973.
- 17 DUDLEY, R. M. A course on empirical processes. In Lecture Notes in Mathematics, vol. 1097. Springer-Verlag, New York, 1984.
- 18 DUDLEY, R. M. Universal Donsker classes and metric entropy. Ann. Prob. 15, 4 (1987), 1306-1326.
- 19 EDELSBRUNNER, H., AND PREPARATA, F. P. Minimal polygonal separation. Inf. Comput. 77 (1988), 218-232.
- 20 EHRENFEUCHT, A., HAUSSLER, D., KEARNS, M., AND VALIANT, L. G. A general lower bound on the number of examples needed for learning. Inf. Comput., to appear.
- 21 GAREY, M., AND JOHNSON, D. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, San Francisco, 1979.
- 22 GILL, J. Probabilistic Turing machines. SIAM J. Comput. 6, 4 (1977), 675-695.
- 23 GINÉ, E., AND ZINN, J. Lectures on the central limit theorem for empirical processes. In Lecture Notes in Mathematics, vol. 1221. Springer-Verlag, New York, 1986.
- 24 GOLDREICH, O., GOLDWASSER, S., AND MICALI, S. How to construct random functions. J. ACM 33, 4 (Oct. 1986), 792-807.
- 25 HAMPSON, S. E., AND VOLPER, D. Linear function neurons: Structure and training. Biol. Cybern. 53 (1986), 203-217.
- 26 HAUSSLER, D. Learning conjunctive concepts in structural domains. Mach. Learning 4, 1 (1989).
- 27 HAUSSLER, D. Applying Valiant's learning framework to AI concept learning problems. In Proceedings of the 4th International Workshop on Machine Learning. Morgan Kaufmann, San Mateo, Calif., 1987, pp. 324-336.
- 28 HAUSSLER, D. Quantifying inductive bias: AI learning algorithms and Valiant's learning framework. Artif. Intell. 36 (1988), 177-221.
- 28a HAUSSLER, D. Generalizing the PAC model: Sample size bounds from metric dimension-based uniform convergence results. In Proceedings of the 30th IEEE Symposium on Foundations of Computer Science (Research Triangle Park, N.C., Oct. 30-Nov. 1). IEEE, New York, 1989, to appear.
- 29 HAUSSLER, D., AND WELZL, E. Epsilon-nets and simplex range queries. Discrete Comput. Geometry 2 (1987), 127-151.
- 30 HAUSSLER, D., KEARNS, M., LITTLESTONE, N., AND WARMUTH, M. K. Equivalence of models for polynomial learnability. Tech. Rep. UCSC-CRL-88-06. Univ. California at Santa Cruz, Santa Cruz, Calif., 1988.
- 31 HAUSSLER, D., LITTLESTONE, N., AND WARMUTH, M. K. Predicting {0, 1}-functions on randomly drawn points. In Proceedings of the 29th IEEE Symposium on Foundations of Computer Science (White Plains, N.Y., Oct.). IEEE, New York, 1988, pp. 100-109.
- 32 JOHNSON, D. S. Approximation algorithms for combinatorial problems. J. Comput. Syst. Sci. 9 (1974), 256-278.
- 33 KARMARKAR, N. A new polynomial-time algorithm for linear programming. Combinatorica 4 (1984), 373-395.
- 34 KEARNS, M., AND LI, M. Learning in the presence of malicious errors. In Proceedings of the 20th ACM Symposium on Theory of Computing (Chicago, Ill., May 2-4). ACM, New York, 1988, pp. 267-280.
- 35 KEARNS, M., AND VALIANT, L. Cryptographic limitations on learning Boolean formulae and finite automata. In Proceedings of the 21st ACM Symposium on Theory of Computing (Seattle, Wash., May 15-17). ACM, New York, 1989, pp. 433-444.
- 36 KEARNS, M., LI, M., PITT, L., AND VALIANT, L. On the learnability of Boolean formulae. In Proceedings of the 19th ACM Symposium on Theory of Computing (New York, N.Y., May 25-27). ACM, New York, 1987, pp. 285-295.
- 37 KEARNS, M., LI, M., PITT, L., AND VALIANT, L. Recent results on Boolean concept learning. In Proceedings of the 4th International Workshop on Machine Learning. Morgan Kaufmann, San Mateo, Calif., 1987, pp. 337-352.
- 38 KNUTH, D. E. The Art of Computer Programming, 2nd ed., vol. 1. Addison-Wesley, Reading, Mass., 1973.
- 39 LAIRD, P. D. Learning from good data and bad. Tech. Rep. YALEU/DCS/TR-551. Yale Univ., New Haven, Conn., 1987.
- 40 LEE, D. T., AND PREPARATA, F. P. Computational geometry--A survey. IEEE Trans. Comput. 33, 12 (1984), 1072-1101.
- 41 LINIAL, N., MANSOUR, Y., AND RIVEST, R. Results on learnability and the Vapnik-Chervonenkis dimension. In Proceedings of the 29th IEEE Symposium on Foundations of Computer Science (White Plains, N.Y., Oct.). IEEE, New York, 1988, pp. 120-129.
- 42 LITTLESTONE, N. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Mach. Learning 2, 2 (1988), 285-318.
- 43 MASEK, W. J. Some NP-complete set cover problems. MIT Laboratory for Computer Science, unpublished manuscript.
- 44 MEGIDDO, N. Linear programming in linear time when the dimension is fixed. J. ACM 31, 1 (Jan. 1984), 114-127.
- 45 MEGIDDO, N. On the complexity of polyhedral separability. Discrete Comput. Geometry 3 (1988), 325-337.
- 46 MUROGA, S. Threshold Logic and Its Applications. Wiley, New York, 1971.
- 47 NATARAJAN, B. K. On learning Boolean functions. In Proceedings of the 19th Annual ACM Symposium on Theory of Computing (New York, N.Y., May 25-27). ACM, New York, 1987, pp. 296-304.
- 48 NATARAJAN, B. K. Learning functions from examples. Tech. Rep. CMU-RI-TR-87-19. Carnegie Mellon Univ., Pittsburgh, Pa., Aug. 1987.
- 49 NIGMATULLIN, R. G. The fastest descent method for covering problems (in Russian). In Proceedings of a Symposium on Questions of Precision and Efficiency of Computer Algorithms, Book 5. Kiev, 1969, pp. 116-126.
- 50 PEARL, J. On the connection between the complexity and credibility of inferred models. Int. J. Gen. Syst. 4 (1978), 255-264.
- 51 PEARL, J. Capacity and error estimates for Boolean classifiers with limited capacity. IEEE Trans. Pattern Analysis Mach. Intell. 1, 4 (1979), 350-355.
- 52 PITT, L., AND VALIANT, L. G. Computational limitations on learning from examples. J. ACM 35, 4 (Oct. 1988), 965-984.
- 53 PITT, L., AND WARMUTH, M. Reductions among prediction problems: On the difficulty of predicting automata. In Proceedings of the 3rd IEEE Structure in Complexity Theory Conference (Washington, D.C.). IEEE, New York, 1988, pp. 62-69.
- 54 POLLARD, D. Convergence of Stochastic Processes. Springer-Verlag, New York, 1984.
- 55 QUINLAN, R., AND RIVEST, R. Inferring decision trees using the minimum description length principle. Inf. Comput., to appear.
- 56 RIVEST, R. Learning decision lists. Mach. Learning 2, 3 (1987), 229-246.
- 57 ROYDEN, H. L. Real Analysis, 2nd ed. Macmillan, New York, 1968.
- 58 SLOAN, R. Types of noise in data for concept learning. In Proceedings of the 1st Workshop on Computational Learning Theory (Cambridge, Mass., Aug. 3-5). Morgan Kaufmann, San Mateo, Calif., 1988, pp. 91-96.
- 59 VALIANT, L. G. A theory of the learnable. Commun. ACM 27, 11 (Nov. 1984), 1134-1142.
- 60 VALIANT, L. G. Learning disjunctions of conjunctions. In Proceedings of the 9th International Joint Conference on Artificial Intelligence (Los Angeles, Calif., Aug.), vol. 1. Morgan Kaufmann, San Mateo, Calif., 1985, pp. 560-566.
- 61 VAPNIK, V. N. Estimation of Dependences Based on Empirical Data. Springer-Verlag, New York, 1982.
- 62 VAPNIK, V. N., AND CHERVONENKIS, A. YA. On the uniform convergence of relative frequencies of events to their probabilities. Theory Prob. and its Appl. 16, 2 (1971), 264-280.
- 63 VAPNIK, V. N., AND CHERVONENKIS, A. YA. Theory of Pattern Recognition (in Russian). Nauka, Moscow, 1974.
- 64 VITTER, J., AND LIN, J. H. Learning in parallel. In Proceedings of the 1st Workshop on Computational Learning Theory (Cambridge, Mass., Aug. 3-5). Morgan Kaufmann, San Mateo, Calif., 1988, pp. 106-124.
- 65 WATANABE, S. Pattern recognition as information compression. In Frontiers of Pattern Recognition, S. Watanabe, Ed. Academic Press, Orlando, Fla., 1972.
- 66 WENOCUR, R. S., AND DUDLEY, R. M. Some special Vapnik-Chervonenkis classes. Discrete Math. 33 (1981), 313-318.
- 67 WINSTON, P. Learning structural descriptions from examples. In The Psychology of Computer Vision. McGraw-Hill, New York, 1975.