Abstract
The query programs of certain databases report raw statistics for query sets, which are groups of records specified implicitly by a characteristic formula. The raw statistics include query set size and sums of powers of values in the query set. Many users and designers believe that the individual records will remain confidential as long as query programs refuse to report the statistics of query sets which are too small. It is shown that the compromise of small query sets can in fact almost always be accomplished with the help of characteristic formulas called trackers. Schlörer's individual tracker is reviewed; it is derived from known characteristics of a given individual and permits deducing additional characteristics he may have. The general tracker is introduced: It permits calculating statistics for arbitrary query sets, without requiring preknowledge of anything in the database. General trackers always exist if there are enough distinguishable classes of individuals in the database, in which case the trackers have a simple form. Almost all databases have a general tracker, and general trackers are almost always easy to find. Security is not guaranteed by the lack of a general tracker.
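The general-tracker idea can be illustrated with a small sketch. Everything in it is an assumption made for illustration rather than something the abstract states: the toy records, the class `RestrictedDB`, the helper `tracked_sum`, the rule that a query is answered only when its query set size lies between k and n − k, and the choice of a tracker T whose size lies between 2k and n − 2k. The sketch relies on a plain set identity: the records matching (C or T) together with those matching (C or not T) cover the whole database once plus C a second time, so subtracting the statistic for the whole database leaves the statistic for C.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]
Formula = Callable[[Record], bool]


class RestrictedDB:
    """Toy statistical database (hypothetical) that refuses query sets
    smaller than k or larger than n - k."""

    def __init__(self, records: List[Record], k: int) -> None:
        self.records = records
        self.k = k

    def sum(self, formula: Formula, field: Optional[str] = None) -> int:
        """Return the count (field=None) or the sum of `field` over the query set."""
        matched = [r for r in self.records if formula(r)]
        n = len(self.records)
        if not (self.k <= len(matched) <= n - self.k):
            raise PermissionError("query set size outside the permitted range")
        if field is None:
            return len(matched)
        return sum(r[field] for r in matched)


def tracked_sum(db: RestrictedDB, c: Formula, t: Formula,
                field: Optional[str] = None) -> int:
    """Recover the statistic for a small query set C using tracker T.

    Uses q(C) = q(C or T) + q(C or not T) - (q(T) + q(not T)).
    Assumes |C| < k and 2k <= |T| <= n - 2k, so all four queries are answerable.
    """
    whole = db.sum(t, field) + db.sum(lambda r: not t(r), field)
    padded = (db.sum(lambda r: c(r) or t(r), field)
              + db.sum(lambda r: c(r) or not t(r), field))
    return padded - whole


# Twelve made-up records; salary is the confidential field.
records = [
    {"sex": "F" if i % 2 == 0 else "M",
     "dept": ["CS", "EE", "Math"][i % 3],
     "age": 30 + i,
     "salary": 50 + 3 * i}
    for i in range(12)
]
db = RestrictedDB(records, k=2)


def target(r: Record) -> bool:
    """Characteristics that happen to identify exactly one record."""
    return r["sex"] == "F" and r["dept"] == "CS" and r["age"] == 36


def tracker(r: Record) -> bool:
    """Matches 6 of 12 records, which lies in [2k, n - 2k] = [4, 8]."""
    return r["sex"] == "M"


try:
    db.sum(target, "salary")
    direct_allowed = True
except PermissionError:
    direct_allowed = False

leaked_count = tracked_sum(db, target, tracker)
leaked_salary = tracked_sum(db, target, tracker, "salary")
print(direct_allowed, leaked_count, leaked_salary)
```

In this toy setup the direct query on the target is refused, yet four permitted queries suffice to recover both the size of the target's query set and the confidential value. Nothing about the tracker depends on the target, which is the sense in which the abstract says no preknowledge of the database is required.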
Index Terms
- The tracker: a threat to statistical database security