
The use of regression methodology for the compromise of confidential information in statistical databases

Published: 01 November 1987

Abstract

A technique based on regression methodology can be used to compromise confidentiality in a statistical database. This holds true even when the DBMS prevents regression from being applied to the database directly. Existing inference controls, including cell restriction, perturbation, and table restriction approaches, are shown to be generally ineffective against this compromise technique. The effect of incomplete supplemental knowledge on the regression-based compromise technique is examined. Finally, some factors that may complicate this disclosure scheme are introduced.



Reviews

John A. Sonquist

Confidentiality in statistical databases can be compromised by intruders using regression methodologies, even when the regression cannot be run on the database itself. Instead, a representative sample of queries yields a synthetic database, which is then analyzed to produce a prediction equation applied back to the main database. Technical controls, such as table size restrictions, are ineffective against the strategy, or have side effects reducing the usability of the data. The problem is real and serious, and has received attention from professional associations.

The paper introduces the problem, reviews the use of regression techniques for estimating individual data, and examines the prospects of using several types of controls. The discussion lists database characteristics that increase the difficulty of successful uses of regression: low correlations between database attributes, uniform distributions, and continuous variables. The problem was studied using census data. The issue posed at the end is how to defend against regression-based intrusion strategies without over-restricting legitimate access.

One cannot fault the authors for doing a good job on what they set out to do: evaluating specific technological fixes for the problem of the legitimate user who compromises system integrity. However, one wishes they had chosen also to review other technical defenses, such as monitoring types of query patterns and their frequency, forced identification of terminal users, and identifying terminals when they come on line. Administrative defenses are not treated. The real problem is the betrayal of trust by a legitimate user of the data. No combination of technological fixes is likely to solve this problem. Management training procedures, good employee relations, ethical standards supported by professional associations, and other sociological approaches will also be needed if confidentiality in statistical databases is not to be compromised.
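The strategy summarized in the review (aggregate queries, a synthetic database, a prediction equation) can be illustrated with a minimal sketch. This is not the authors' procedure: the table, the attribute names (`years`, `salary`), the `mean_salary` query interface, and the query-set-size limit `MIN_QUERY_SET` are all hypothetical, and the data are randomly generated. The sketch only shows how an intruder who is allowed nothing but restricted aggregate queries might still fit a regression line and apply it to a target whose non-confidential attribute is known.

```python
import random

random.seed(0)

# Hypothetical confidential table of (years of experience, salary) pairs.
# The values are made up purely for illustration.
RECORDS = [(yrs, 30000 + 2000 * yrs + random.gauss(0, 3000))
           for yrs in range(1, 31) for _ in range(10)]

MIN_QUERY_SET = 5  # assumed query-set-size restriction enforced by the DBMS


def mean_salary(lo, hi):
    """Aggregate-only interface: mean salary for lo <= years < hi.

    Returns None when fewer than MIN_QUERY_SET records match.
    """
    matches = [s for y, s in RECORDS if lo <= y < hi]
    if len(matches) < MIN_QUERY_SET:
        return None
    return sum(matches) / len(matches)


def build_synthetic(width=3):
    """Query each experience band and keep (band midpoint, mean salary)."""
    synthetic = []
    for lo in range(1, 31, width):
        m = mean_salary(lo, lo + width)
        if m is not None:
            synthetic.append((lo + (width - 1) / 2, m))
    return synthetic


def fit_line(points):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx


# The regression is run on the intruder's synthetic data, not inside the DBMS.
slope, intercept = fit_line(build_synthetic())

# Supplemental knowledge (assumed): the target has 12 years of experience.
estimate = intercept + slope * 12
print(f"estimated salary for the target: {estimate:.0f}")
```

In this toy setting the estimate is only as good as the correlation between the known attribute and the confidential one, which is consistent with the review's remark that low correlations between attributes make the attack harder.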


Published in

ACM Transactions on Database Systems, Volume 12, Issue 4
Dec. 1987
172 pages
ISSN: 0362-5915
EISSN: 1557-4644
DOI: 10.1145/32204

Copyright © 1987 ACM

Publisher

Association for Computing Machinery
New York, NY, United States
