Model Diagnostics for Remote Access Regression Servers

Statistics and Computing

Abstract

One way to protect public-use microdata is to deny users direct access to them. Instead, users submit analyses to a remote server, which reports back basic output from the fitted model, such as coefficients and standard errors. To be most useful, such a server should also let users check the fit of their models without disclosing actual data values. This paper discusses regression diagnostics for remote servers. The proposal is to release synthetic diagnostics, i.e., simulated values of residuals and of dependent and independent variables, constructed to mimic the relationships among the real-data residuals and independent variables. Simulations show that the proposed synthetic diagnostics can reveal model inadequacies without substantially increasing the risk of disclosure. The approach can also be used to develop remote server diagnostics for generalized linear models.
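
The abstract does not spell out how the synthetic diagnostics are constructed, so the following is only a minimal sketch of the general idea, not the paper's procedure. It assumes a single continuous predictor, an ordinary least squares fit, synthetic predictor values drawn by resampling the real ones and adding noise, and synthetic residuals drawn from a normal distribution whose mean and spread match the real residuals in quantile bins of the predictor. The function name `synthetic_diagnostics` and the parameters `n_bins` and `noise_scale` are hypothetical.

```python
import numpy as np


def synthetic_diagnostics(x, y, n_bins=10, noise_scale=0.1, seed=0):
    """Return synthetic (x, residual) pairs mimicking a real residual plot.

    Illustrative assumptions (not taken from the paper): one continuous
    predictor with few ties, OLS fit, binned normal model for residuals.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Fit y = b0 + b1 * x on the confidential data.
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta

    # Synthetic predictor values: resample the real ones, then perturb
    # them so that no released value equals a confidential value.
    x_syn = rng.choice(x, size=x.size, replace=True)
    x_syn = x_syn + rng.normal(0.0, noise_scale * x.std(), size=x.size)

    # Summarize the real residuals by quantile bin of the real predictor.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))

    def bin_index(values):
        idx = np.searchsorted(edges, values, side="right") - 1
        return np.clip(idx, 0, n_bins - 1)

    real_bin = bin_index(x)
    bin_mean = np.array([resid[real_bin == b].mean() for b in range(n_bins)])
    bin_sd = np.array([resid[real_bin == b].std() for b in range(n_bins)])

    # Synthetic residuals: simulated around the local mean of the real
    # residuals, so trends such as curvature remain visible.
    syn_bin = bin_index(x_syn)
    e_syn = rng.normal(bin_mean[syn_bin], bin_sd[syn_bin])
    return x_syn, e_syn, beta


# Example: a straight line fitted to data with a quadratic trend.
demo_rng = np.random.default_rng(42)
x_real = demo_rng.uniform(0.0, 4.0, size=500)
y_real = x_real**2 + demo_rng.normal(0.0, 0.5, size=500)
x_plot, e_plot, _ = synthetic_diagnostics(x_real, y_real)
```

Plotting `e_plot` against `x_plot` would give the user a residual plot built entirely from simulated points. A real server would need a more careful disclosure analysis than this sketch provides, for example around outliers and sparse regions of the predictor.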

Cite this article

Reiter, J.P. Model Diagnostics for Remote Access Regression Servers. Statistics and Computing 13, 371–380 (2003). https://doi.org/10.1023/A:1025623108012

  • DOI: https://doi.org/10.1023/A:1025623108012
