DOI: 10.1145/342009.335438

Privacy-preserving data mining

Published: 16 May 2000

ABSTRACT

A fruitful direction for future data mining research will be the development of techniques that incorporate privacy concerns. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models without access to precise information in individual data records? We consider the concrete case of building a decision-tree classifier from training data in which the values of individual records have been perturbed. The resulting data records look very different from the original records and the distribution of data values is also very different from the original distribution. While it is not possible to accurately estimate original values in individual data records, we propose a novel reconstruction procedure to accurately estimate the distribution of original data values. By using these reconstructed distributions, we are able to build classifiers whose accuracy is comparable to the accuracy of classifiers built with the original data.
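The idea of perturbing individual values and then recovering only their distribution can be illustrated with a short sketch. The abstract does not specify the noise model or the reconstruction algorithm, so everything below is an assumption made for illustration: additive uniform noise with a publicly known half-width, a histogram over fixed bins, and an iterative Bayes-style update of the bin probabilities. The function names (`perturb`, `reconstruct_distribution`) and parameters are hypothetical and are not taken from the paper.

```python
import random


def perturb(values, half_width, rng):
    """Return each value plus independent uniform noise in [-half_width, half_width]."""
    return [v + rng.uniform(-half_width, half_width) for v in values]


def _overlap(lo_a, hi_a, lo_b, hi_b):
    """Length of the intersection of the intervals [lo_a, hi_a] and [lo_b, hi_b]."""
    return max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))


def reconstruct_distribution(perturbed, bin_edges, half_width, iterations=50):
    """Estimate the probability that an original value falls in each bin.

    Assumption (not from the source): noise is uniform with known half_width,
    and original values are treated as uniform within each bin.
    """
    n_bins = len(bin_edges) - 1
    # likelihood[i][k] is proportional to P(perturbed[i] | original in bin k).
    likelihood = [
        [
            _overlap(w - half_width, w + half_width, bin_edges[k], bin_edges[k + 1])
            / (bin_edges[k + 1] - bin_edges[k])
            for k in range(n_bins)
        ]
        for w in perturbed
    ]
    estimate = [1.0 / n_bins] * n_bins  # start from a uniform guess
    for _ in range(iterations):
        updated = [0.0] * n_bins
        for row in likelihood:
            weights = [p * l for p, l in zip(estimate, row)]
            total = sum(weights)
            if total == 0.0:
                continue  # value is incompatible with every bin; ignore it
            for k in range(n_bins):
                updated[k] += weights[k] / total  # posterior share of bin k
        mass = sum(updated)
        if mass == 0.0:
            break
        estimate = [u / mass for u in updated]
    return estimate


if __name__ == "__main__":
    rng = random.Random(0)
    originals = [rng.uniform(30, 40) for _ in range(2000)]  # synthetic data
    noisy = perturb(originals, 10.0, rng)
    print(reconstruct_distribution(noisy, [0, 10, 20, 30, 40, 50], 10.0))
```

In this sketch the individual noisy values stay far from the originals, yet the estimated bin probabilities concentrate on the bin that actually holds the data. A classifier could then be trained on such estimated distributions rather than on the raw records, which is the trade-off the abstract describes.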

Published in

SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data
May 2000
604 pages
ISBN: 1581132174
DOI: 10.1145/342009

Copyright © 2000 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SIGMOD '00 paper acceptance rate: 42 of 248 submissions, 17%
Overall acceptance rate: 785 of 4,003 submissions, 20%
