skip to main content
10.1145/1014052.1014121acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
Article

Diagnosing extrapolation: tree-based density estimation

Published:22 August 2004Publication History

ABSTRACT

There has historically been very little concern with extrapolation in Machine Learning, yet extrapolation can be critical to diagnose. Predictor functions are almost always learned on a set of highly correlated data comprising a very small segment of predictor space. Moreover, flexible predictors, by their very nature, are not controlled at points of extrapolation. This becomes a problem for diagnostic tools that require evaluation on a product distribution. It is also an issue when we are trying to optimize a response over some variable in the input space. Finally, it can be a problem in non-static systems in which the underlying predictor distribution gradually drifts with time or when typographical errors misrecord the values of some predictors.We present a diagnosis for extrapolation as a statistical test for a point originating from the data distribution as opposed to a null hypothesis uniform distribution. This allows us to employ general classification methods for estimating such a test statistic. Further, we observe that CART can be modified to accept an exact distribution as an argument, providing a better classification tool which becomes our extrapolation-detection procedure. We explore some of the advantages of this approach and present examples of its practical application.

References

  1. L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth, 1984.Google ScholarGoogle Scholar
  2. J. H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5):1189--1232, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  3. D. Harrison and D. L. Rubinfeld. Hedonic prices and the demand for clean air. Journal of Environmental Economics and Management, 5:81--102, 1978.Google ScholarGoogle ScholarCross RefCross Ref
  4. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Machine Learning: Data Mining, Inference and Prediction. Springer, New York, 2001.Google ScholarGoogle Scholar
  5. G. Hooker. Black box diagnostics and the problem of extrapolation: Extending the functional anova. Technical report, Stanford University, 2004.Google ScholarGoogle Scholar
  6. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaurmann, San Mateo, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. R-project. http://www.r-project.org/.Google ScholarGoogle Scholar

Index Terms

  1. Diagnosing extrapolation: tree-based density estimation

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
          August 2004
          874 pages
          ISBN:1581138881
          DOI:10.1145/1014052

          Copyright © 2004 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 22 August 2004

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • Article

          Acceptance Rates

          Overall Acceptance Rate1,133of8,635submissions,13%

          Upcoming Conference

          KDD '24

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader