Abstract
In this paper, we study the problem of learning in the presence of classification noise in the probabilistic learning model of Valiant and its variants. In order to identify the class of "robust" learning algorithms in the most general way, we formalize a new but related model of learning from statistical queries. Intuitively, in this model a learning algorithm is forbidden to examine individual examples of the unknown target function, but is given access to an oracle providing estimates of probabilities over the sample space of random examples.
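As an informal illustration of the statistical query oracle (a sampling-based sketch in our own hypothetical notation, not the paper's construction): the learner supplies a predicate chi over labeled examples and a tolerance tau, and receives back only an estimate of Pr[chi(x, f(x)) = 1] accurate to within tau — never the individual examples themselves.

```python
import math


def stat_oracle(chi, target, draw_example, tau, delta=0.05):
    """Sampling sketch of a STAT oracle: estimate P[chi(x, f(x)) = 1]
    to within additive tolerance tau, failing with probability at most delta.

    chi          -- predicate over (example, label) pairs, returning 0 or 1
    target       -- the unknown target function f (hidden inside the oracle)
    draw_example -- draws one example x from the distribution D
    """
    # Hoeffding bound: m >= ln(2/delta) / (2 * tau^2) samples suffice
    # for the empirical frequency to be tau-close to the true probability.
    m = math.ceil(math.log(2 / delta) / (2 * tau ** 2))
    hits = sum(chi(x, target(x)) for x in (draw_example() for _ in range(m)))
    return hits / m
```

The key point of the model is visible in the interface: the labeled examples are consumed inside the oracle, and the learning algorithm sees only the returned probability estimate.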
One of our main results shows that any class of functions learnable from statistical queries is in fact learnable with classification noise in Valiant's model, with a noise rate approaching the information-theoretic barrier of 1/2. We then demonstrate the generality of the statistical query model, showing that practically every class learnable in Valiant's model and its variants can also be learned in the new model (and thus can be learned in the presence of noise). A notable exception to this statement is the class of parity functions, which we prove is not learnable from statistical queries, and for which no noise-tolerant algorithm is known.
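To give a flavor of why statistical estimates tolerate classification noise (an illustrative sketch under our own hypothetical names, not the paper's general simulation): if each label is flipped independently with rate eta < 1/2, the expected disagreement rate between a hypothesis h and the noisy labels is eta + (1 - 2*eta)*p, where p is the true error of h. Inverting this affine map recovers p, and the 1/(1 - 2*eta) factor is exactly what degrades as eta approaches the information-theoretic barrier of 1/2.

```python
import random


def noisy_label(true_label, eta, rng=random):
    """Classification noise: flip a {0,1} label independently with rate eta."""
    return 1 - true_label if rng.random() < eta else true_label


def denoised_error(h, target, draw_example, eta, m):
    """Estimate the true error P[h(x) != f(x)] from m noise-corrupted examples.

    Since E[observed disagreement] = eta + (1 - 2*eta) * p, solve for p.
    The 1/(1 - 2*eta) correction factor is why the required sample size
    grows as the noise rate eta approaches 1/2.
    """
    disagree = sum(h(x) != noisy_label(target(x), eta)
                   for x in (draw_example() for _ in range(m)))
    return (disagree / m - eta) / (1 - 2 * eta)
```

In the noise-free case (eta = 0) the correction is the identity, and the estimate reduces to the ordinary empirical error.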
References
- ANGLUIN, D., AND LAIRD, P. 1988. Learning from noisy examples. Mach. Learn. 2, 4, 343-370.
- ASLAM, J. A., AND DECATUR, S. E. 1993. General bounds on statistical query learning and PAC learning with noise via hypothesis boosting. In Proceedings of the 34th Annual Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., pp. 282-291.
- ASLAM, J. A., AND DECATUR, S. E. 1995. Specification and simulation of statistical query algorithms for efficiency and noise tolerance. In Proceedings of the 8th Annual Conference on Computational Learning Theory (Santa Cruz, Calif., July 5-8). ACM, New York, pp. 437-446.
- BAUM, E. B., AND LYUU, Y.-D. 1991. The transition to perfect generalization in perceptrons. Neural Comput. 3, 386-401.
- BLUM, A., FRIEZE, A., KANNAN, R., AND VEMPALA, S. 1996. A polynomial-time algorithm for learning noisy linear threshold functions. In Proceedings of the 37th Annual Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., pp. 330-338.
- BLUM, A., FURST, M., JACKSON, J., KEARNS, M., MANSOUR, Y., AND RUDICH, S. 1994. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In Proceedings of the 26th Annual ACM Symposium on the Theory of Computing (Montreal, Que., Canada, May 23-25). ACM, New York, pp. 253-262.
- BLUMER, A., EHRENFEUCHT, A., HAUSSLER, D., AND WARMUTH, M. K. 1989. Learnability and the Vapnik-Chervonenkis dimension. J. ACM 36, 4 (Oct.), 929-965.
- COHEN, E. 1997. Learning noisy perceptrons by a perceptron in polynomial time. In Proceedings of the 38th Annual Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif.
- EHRENFEUCHT, A., HAUSSLER, D., KEARNS, M., AND VALIANT, L. 1988. A general lower bound on the number of examples needed for learning. In Proceedings of the 1st Workshop on Computational Learning Theory (Cambridge, Mass., Aug.). Morgan-Kaufmann, San Mateo, Calif., pp. 139-154.
- FISCHER, P., AND SIMON, H. U. 1992. On learning ring-sum expansions. SIAM J. Comput. 21, 1, 181-192.
- FURST, M. L., JACKSON, J. C., AND SMITH, S. W. 1991. Improved learning of AC^0 functions. In Proceedings of the 4th Annual Workshop on Computational Learning Theory (Aug.). Morgan-Kaufmann, San Mateo, Calif., pp. 317-325.
- GARDNER, E., AND DERRIDA, B. 1989. Three unfinished works on the optimal storage capacity of networks. J. Phys. A: Math. Gen. 22, 1983-1994.
- HANCOCK, T., AND MANSOUR, Y. 1991. Learning monotone k DNF formulas on product distributions. In Proceedings of the 4th Annual Workshop on Computational Learning Theory (Aug.). Morgan-Kaufmann, San Mateo, Calif., pp. 179-183.
- HAUSSLER, D. 1988. Quantifying inductive bias: AI learning algorithms and Valiant's learning framework. Artif. Int. 36, 177-221.
- HELMBOLD, D., SLOAN, R., AND WARMUTH, M. K. 1992. Learning integer lattices. SIAM J. Comput. 21, 2, 240-266.
- KEARNS, M. 1990. The Computational Complexity of Machine Learning. The MIT Press, Cambridge, Mass.
- KEARNS, M., AND LI, M. 1993. Learning in the presence of malicious errors. SIAM J. Comput. 22, 4, 807-837.
- KEARNS, M., LI, M., PITT, L., AND VALIANT, L. 1987. On the learnability of Boolean formulae. In Proceedings of the 19th Annual ACM Symposium on Theory of Computing (New York, N.Y., May 25-27). ACM, New York, pp. 285-295.
- KEARNS, M., AND PITT, L. 1989. A polynomial-time algorithm for learning k-variable pattern languages from examples. In Proceedings of the 2nd Annual Workshop on Computational Learning Theory (July). ACM, New York, pp. 57-71.
- KEARNS, M. J., AND SCHAPIRE, R. E. 1994. Efficient distribution-free learning of probabilistic concepts. J. Comput. Syst. Sci. 48, 3, 464-497.
- KEARNS, M. J., AND VAZIRANI, U. 1994. An Introduction to Computational Learning Theory. The MIT Press, Cambridge, Mass.
- LAIRD, P. D. 1988. Learning from Good and Bad Data. Kluwer International Series in Engineering and Computer Science. Kluwer Academic Publishers, Boston, Mass.
- LINIAL, N., MANSOUR, Y., AND NISAN, N. 1989. Constant depth circuits, Fourier transform, and learnability. In Proceedings of the 30th Annual Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., pp. 574-579.
- MINSKY, M., AND PAPERT, S. 1988. Perceptrons: An Introduction to Computational Geometry (Expanded Edition). The MIT Press, Cambridge, Mass.
- PITT, L., AND VALIANT, L. G. 1988. Computational limitations on learning from examples. J. ACM 35, 4 (Oct.), 965-984.
- RIVEST, R. L. 1987. Learning decision lists. Mach. Learn. 2, 3, 229-246.
- SAKAKIBARA, Y. 1991. Algorithmic learning of formal languages and decision trees. Ph.D. dissertation, Tokyo Institute of Technology. Res. Rep. IIAS-RR-91-22E, International Institute for Advanced Study of Social Information Science, Fujitsu Laboratories, Ltd., Tokyo, Japan.
- SCHAPIRE, R. E. 1990. Learning probabilistic read-once formulas on product distributions. Mach. Learn. 5, 2, 197-227.
- SCHAPIRE, R. E. 1992. The Design and Analysis of Efficient Learning Algorithms. The MIT Press, Cambridge, Mass.
- SEUNG, H. S., SOMPOLINSKY, H., AND TISHBY, N. 1992. Statistical mechanics of learning from examples. Phys. Rev. A 45, 8 (Apr.), 6056-6091.
- SLOAN, R. H. 1988. Types of noise in data for concept learning. In Proceedings of the 1988 Workshop on Computational Learning Theory (Aug.). Morgan-Kaufmann, San Mateo, Calif., pp. 91-96.
- VALIANT, L. G. 1984. A theory of the learnable. Commun. ACM 27, 11 (Nov.), 1134-1142.
- VALIANT, L. G. 1985. Learning disjunctions of conjunctions. In Proceedings of the 9th International Joint Conference on Artificial Intelligence (Aug.). Morgan-Kaufmann, Los Altos, Calif., pp. 560-566.
- VAPNIK, V. N., AND CHERVONENKIS, A. YA. 1971. On the uniform convergence of relative frequencies of events to their probabilities. Theory Prob. Appl. 16, 2, 264-280.