
Efficient noise-tolerant learning from statistical queries

Published: 1 November 1998

Abstract

In this paper, we study the problem of learning in the presence of classification noise in the probabilistic learning model of Valiant and its variants. In order to identify the class of “robust” learning algorithms in the most general way, we formalize a new but related model of learning from statistical queries. Intuitively, in this model a learning algorithm is forbidden to examine individual examples of the unknown target function, but is instead given access to an oracle providing estimates of probabilities over the sample space of random examples.
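The oracle just described can be sketched in code (a minimal illustration of the idea, not the paper's formalism; all names below are invented). A statistical query supplies a predicate chi over labeled examples and a tolerance tau, and the oracle returns an estimate of Pr[chi(x, f(x)) = 1] within additive error tau — which can itself be simulated from O(1/tau^2) random examples:

```python
import random

def stat_oracle(chi, tau, draw_example):
    """Empirical stand-in for a statistics oracle STAT(f, D).

    chi:          query predicate chi(x, label) -> 0 or 1
    tau:          additive tolerance of the returned estimate
    draw_example: draws one labeled example (x, f(x)) from D
    """
    # By a Hoeffding bound, O(1/tau^2) samples yield an estimate
    # within tau of the true probability with high confidence.
    m = int(4 / tau ** 2)
    return sum(chi(*draw_example()) for _ in range(m)) / m

# Toy target: f(x) = first bit of x, uniform distribution on {0,1}^3.
def draw_example():
    x = tuple(random.randint(0, 1) for _ in range(3))
    return x, x[0]

# Query the probability that the label is 1 (true value: 1/2).
estimate = stat_oracle(lambda x, label: label, tau=0.1, draw_example=draw_example)
```

The point of the interface is that the learner only ever sees aggregate estimates such as `estimate`, never the individual draws.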

One of our main results shows that any class of functions learnable from statistical queries is in fact learnable with classification noise in Valiant's model, with a noise rate approaching the information-theoretic barrier of 1/2. We then demonstrate the generality of the statistical query model, showing that practically every class learnable in Valiant's model and its variants can also be learned in the new model (and thus can be learned in the presence of noise). A notable exception to this statement is the class of parity functions, which we prove is not learnable from statistical queries, and for which no noise-tolerant algorithm is known.
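The flavor of the noise-tolerance result can be illustrated in one special case (the paper's general construction handles arbitrary queries). Suppose we want the true error e of a hypothesis h, but each label is flipped independently with known rate eta < 1/2. The observed noisy error satisfies e_noisy = eta + (1 - 2*eta) * e, so a simple inversion recovers e. A hedged sketch, with all names invented for illustration:

```python
import random

def noisy_error(h, draw_noisy_example, m=20000):
    """Estimate Pr[h(x) != label] from m examples with noisy labels."""
    return sum(h(x) != y for x, y in (draw_noisy_example() for _ in range(m))) / m

def corrected_error(e_noisy, eta):
    """Invert e_noisy = eta + (1 - 2*eta) * e; requires eta < 1/2."""
    return (e_noisy - eta) / (1 - 2 * eta)

# Toy setup: target f(x) = first bit, hypothesis h = f, noise rate 0.3.
ETA = 0.3
def draw_noisy_example():
    x = tuple(random.randint(0, 1) for _ in range(3))
    label = x[0] if random.random() > ETA else 1 - x[0]
    return x, label

e_hat = corrected_error(noisy_error(lambda x: x[0], draw_noisy_example), ETA)
# e_hat is close to 0: h agrees with f exactly, and the eta-inflation
# of the measured error is divided out by the correction.
```

Note the factor 1/(1 - 2*eta) blowing up as eta approaches 1/2 — the same information-theoretic barrier mentioned above, since at eta = 1/2 the labels carry no signal.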

References

  1. ANGLUIN, D., AND LAIRD, P. 1988. Learning from noisy examples. Mach. Learn. 2, 4, 343-370.
  2. ASLAM, J.A., AND DECATUR, S.E. 1993. General bounds on statistical query learning and PAC learning with noise via hypothesis boosting. In Proceedings of the 34th Annual Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., pp. 282-291.
  3. ASLAM, J.A., AND DECATUR, S.E. 1995. Specification and simulation of statistical query algorithms for efficiency and noise tolerance. In Proceedings of the 8th Annual Conference on Computational Learning Theory (Santa Cruz, Calif., July 5-8). ACM, New York, pp. 437-446.
  4. BAUM, E.B., AND LYUU, Y.-D. 1991. The transition to perfect generalization in perceptrons. Neural Comput. 3, 386-401.
  5. BLUM, A., FRIEZE, A., KANNAN, R., AND VEMPALA, S. 1996. A polynomial-time algorithm for learning noisy linear threshold functions. In Proceedings of the 37th Annual Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., pp. 330-338.
  6. BLUM, A., FURST, M., JACKSON, J., KEARNS, M., MANSOUR, Y., AND RUDICH, S. 1994. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In Proceedings of the 26th Annual ACM Symposium on the Theory of Computing (Montreal, Que., Canada, May 23-25). ACM, New York, pp. 253-262.
  7. BLUMER, A., EHRENFEUCHT, A., HAUSSLER, D., AND WARMUTH, M.K. 1989. Learnability and the Vapnik-Chervonenkis dimension. J. ACM 36, 4 (Oct.), 929-965.
  8. COHEN, E. 1997. Learning noisy perceptrons by a perceptron in polynomial time. In Proceedings of the 38th Annual Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif.
  9. EHRENFEUCHT, A., HAUSSLER, D., KEARNS, M., AND VALIANT, L. 1988. A general lower bound on the number of examples needed for learning. In Proceedings of the 1st Workshop on Computational Learning Theory (Cambridge, Mass., Aug.). Morgan-Kaufmann, San Mateo, Calif., pp. 139-154.
  10. FISCHER, P., AND SIMON, H.U. 1992. On learning ring-sum expansions. SIAM J. Comput. 21, 1, 181-192.
  11. FURST, M.L., JACKSON, J.C., AND SMITH, S.W. 1991. Improved learning of AC0 functions. In Proceedings of the 4th Annual Workshop on Computational Learning Theory (Aug.). Morgan-Kaufmann, San Mateo, Calif., pp. 317-325.
  12. GARDNER, E., AND DERRIDA, B. 1989. Three unfinished works on the optimal storage capacity of networks. J. Phys. A: Math. Gen. 22, 1983-1994.
  13. HANCOCK, T., AND MANSOUR, Y. 1991. Learning monotone kμ DNF formulas on product distributions. In Proceedings of the 4th Annual Workshop on Computational Learning Theory (Aug.). Morgan-Kaufmann, San Mateo, Calif., pp. 179-183.
  14. HAUSSLER, D. 1988. Quantifying inductive bias: AI learning algorithms and Valiant's learning framework. Artif. Int. 36, 177-221.
  15. HELMBOLD, D., SLOAN, R., AND WARMUTH, M.K. 1992. Learning integer lattices. SIAM J. Comput. 21, 2, 240-266.
  16. KEARNS, M. 1990. The Computational Complexity of Machine Learning. The MIT Press, Cambridge, Mass.
  17. KEARNS, M., AND LI, M. 1993. Learning in the presence of malicious errors. SIAM J. Comput. 22, 4, 807-837.
  18. KEARNS, M., LI, M., PITT, L., AND VALIANT, L. 1987. On the learnability of Boolean formulae. In Proceedings of the 19th Annual ACM Symposium on Theory of Computing (New York, N.Y., May 25-27). ACM, New York, pp. 285-295.
  19. KEARNS, M., AND PITT, L. 1989. A polynomial-time algorithm for learning k-variable pattern languages from examples. In Proceedings of the 2nd Annual Workshop on Computational Learning Theory (July). ACM, New York, pp. 57-71.
  20. KEARNS, M.J., AND SCHAPIRE, R.E. 1994. Efficient distribution-free learning of probabilistic concepts. J. Comput. Syst. Sci. 48, 3, 464-497.
  21. KEARNS, M.J., AND VAZIRANI, U.V. 1994. An Introduction to Computational Learning Theory. The MIT Press, Cambridge, Mass.
  22. LAIRD, P.D. 1988. Learning from Good and Bad Data. Kluwer International Series in Engineering and Computer Science. Kluwer Academic Publishers, Boston, Mass.
  23. LINIAL, N., MANSOUR, Y., AND NISAN, N. 1989. Constant depth circuits, Fourier transform, and learnability. In Proceedings of the 30th Annual Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., pp. 574-579.
  24. MINSKY, M., AND PAPERT, S. 1988. Perceptrons: An Introduction to Computational Geometry (Expanded Edition). The MIT Press, Cambridge, Mass.
  25. PITT, L., AND VALIANT, L.G. 1988. Computational limitations on learning from examples. J. ACM 35, 4 (Oct.), 965-984.
  26. RIVEST, R.L. 1987. Learning decision lists. Mach. Learn. 2, 3, 229-246.
  27. SAKAKIBARA, Y. 1991. Algorithmic learning of formal languages and decision trees. Ph.D. dissertation. Tokyo Institute of Technology. Res. Rep. IIAS-RR-91-22E. International Institute for Advanced Study of Social Information Science, Fujitsu Laboratories, Ltd., Tokyo, Japan.
  28. SCHAPIRE, R.E. 1990. Learning probabilistic read-once formulas on product distributions. Mach. Learn. 5, 2, 197-227.
  29. SCHAPIRE, R.E. 1992. The Design and Analysis of Efficient Learning Algorithms. The MIT Press, Cambridge, Mass.
  30. SEUNG, H.S., SOMPOLINSKY, H., AND TISHBY, N. 1992. Statistical mechanics of learning from examples. Phys. Rev. A 45, 8 (Apr.), 6056-6091.
  31. SLOAN, R.H. 1988. Types of noise in data for concept learning. In Proceedings of the 1988 Workshop on Computational Learning Theory (Aug.). Morgan-Kaufmann, San Mateo, Calif., pp. 91-96.
  32. VALIANT, L.G. 1984. A theory of the learnable. Commun. ACM 27, 11 (Nov.), 1134-1142.
  33. VALIANT, L.G. 1985. Learning disjunctions of conjunctions. In Proceedings of the 9th International Joint Conference on Artificial Intelligence (Aug.). Morgan-Kaufmann, Los Altos, Calif., pp. 560-566.
  34. VAPNIK, V.N., AND CHERVONENKIS, A.YA. 1971. On the uniform convergence of relative frequencies of events to their probabilities. Theory Prob. Appl. XVI, 2, 264-280.


Published in

  Journal of the ACM, Volume 45, Issue 6
  Nov. 1998
  185 pages
  ISSN: 0004-5411
  EISSN: 1557-735X
  DOI: 10.1145/293347

          Copyright © 1998 ACM


          Publisher

          Association for Computing Machinery

          New York, NY, United States

