Abstract
In this article, we propose a moment-based method for studying models and model-selection measures. By focusing on the probabilistic space of classifiers induced by the classification algorithm, rather than on the space of datasets, we obtain efficient characterizations for computing the moments; we then visualize the resulting formulae, which are too complicated for direct interpretation. By assuming the data are drawn independently and identically distributed from the underlying probability distribution, and by averaging over the space of all possible datasets, we establish general relationships between the generalization error, hold-out-set error, cross-validation error, and leave-one-out error. We then illustrate the method and the results by studying the behavior of these errors for the naive Bayes classifier.
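To make the setting concrete, the following is a minimal Monte Carlo sketch of the quantity the article characterizes analytically: the first two moments of the generalization error of a naive Bayes classifier, taken over the space of i.i.d. datasets drawn from a fixed underlying distribution. The binary-feature distribution, dataset size, and number of simulated datasets below are illustrative assumptions, not values from the paper, and simulation here merely approximates what the moment-based method computes in closed form.

```python
# Monte Carlo sketch: moments of naive Bayes generalization error over random
# datasets. All distribution parameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

P_Y1 = 0.4                      # assumed P(y = 1)
P_X1_GIVEN_Y = np.array([       # assumed P(x_j = 1 | y) for 3 binary features;
    [0.2, 0.7],                 # columns index the class label y = 0, 1
    [0.5, 0.6],
    [0.8, 0.3],
])

def sample(n):
    """Draw n i.i.d. examples from the assumed joint distribution."""
    y = (rng.random(n) < P_Y1).astype(int)
    x = (rng.random((n, 3)) < P_X1_GIVEN_Y[:, y].T).astype(int)
    return x, y

def train_nb(x, y):
    """Fit naive Bayes with Laplace smoothing; return prior and conditionals."""
    prior = (y.sum() + 1) / (len(y) + 2)            # smoothed P(y = 1)
    cond = np.empty((3, 2))                          # smoothed P(x_j = 1 | y)
    for c in (0, 1):
        xc = x[y == c]
        cond[:, c] = (xc.sum(axis=0) + 1) / (len(xc) + 2)
    return prior, cond

def predict(x, prior, cond):
    """Classify by comparing log-posteriors under the naive Bayes model."""
    logp = np.zeros((len(x), 2))
    logp[:, 1] += np.log(prior)
    logp[:, 0] += np.log(1 - prior)
    for c in (0, 1):
        logp[:, c] += (x * np.log(cond[:, c])
                       + (1 - x) * np.log(1 - cond[:, c])).sum(axis=1)
    return (logp[:, 1] > logp[:, 0]).astype(int)

# Estimate the first two moments of the generalization error across random
# training sets: a large fixed sample stands in for the true error.
x_test, y_test = sample(100_000)
errors = []
for _ in range(500):                 # 500 independent training sets of size 50
    x_tr, y_tr = sample(50)
    prior, cond = train_nb(x_tr, y_tr)
    errors.append(np.mean(predict(x_test, prior, cond) != y_test))
errors = np.array(errors)
print(f"E[GE] ~= {errors.mean():.4f}, Var[GE] ~= {errors.var():.6f}")
```

The same loop, with the test sample replaced by a held-out split or by cross-validation folds of the training set, yields simulated moments of the hold-out-set and cross-validation errors, whose relationships to the generalization error are what the article derives analytically.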