ABSTRACT
A fruitful direction for future data mining research will be the development of techniques that incorporate privacy concerns. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models without access to precise information in individual data records? We consider the concrete case of building a decision-tree classifier from training data in which the values of individual records have been perturbed. The resulting data records look very different from the original records and the distribution of data values is also very different from the original distribution. While it is not possible to accurately estimate original values in individual data records, we propose a novel reconstruction procedure to accurately estimate the distribution of original data values. By using these reconstructed distributions, we are able to build classifiers whose accuracy is comparable to the accuracy of classifiers built with the original data.
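The idea of perturbing individual values and then recovering only their distribution can be sketched in a few lines. The sketch below is illustrative and is not the authors' reconstruction procedure, which the abstract does not spell out. It assumes additive Gaussian noise with a known standard deviation, equal-width bins, and a simple iterative Bayes-style update of the bin probabilities. The function names `perturb`, `histogram`, and `reconstruct` and all parameters are hypothetical.

```python
import math
import random


def perturb(values, sigma, rng):
    """Add independent Gaussian noise to each value (assumed noise model)."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def histogram(values, lo, hi, n_bins):
    """Fraction of values per equal-width bin; out-of-range values are clamped."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        i = min(max(int((v - lo) // width), 0), n_bins - 1)
        counts[i] += 1
    return [c / len(values) for c in counts]


def reconstruct(perturbed, lo, hi, n_bins, sigma, n_iters=100):
    """Estimate the original bin probabilities from perturbed values only.

    Assumption: each original value is approximated by its bin midpoint,
    and the noise density (Gaussian with standard deviation sigma) is known.
    """
    width = (hi - lo) / n_bins
    mids = [lo + (i + 0.5) * width for i in range(n_bins)]
    # Noise likelihood of each perturbed value given each bin midpoint.
    like = [[math.exp(-0.5 * ((w - m) / sigma) ** 2) for m in mids]
            for w in perturbed]
    probs = [1.0 / n_bins] * n_bins  # start from a uniform estimate
    for _ in range(n_iters):
        new = [0.0] * n_bins
        for row in like:
            # Posterior over bins for this record under the current estimate.
            post = [l * p for l, p in zip(row, probs)]
            total = sum(post)
            if total > 0.0:
                for i, x in enumerate(post):
                    new[i] += x / total
        norm = sum(new)
        if norm == 0.0:
            break
        probs = [x / norm for x in new]
    return probs


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical two-cluster attribute, e.g. values near 20 and near 80.
    original = ([rng.uniform(10, 30) for _ in range(1000)]
                + [rng.uniform(70, 90) for _ in range(1000)])
    noisy = perturb(original, 10.0, rng)
    print("perturbed    :", [round(p, 2) for p in histogram(noisy, 0.0, 100.0, 10)])
    print("reconstructed:", [round(p, 2) for p in reconstruct(noisy, 0.0, 100.0, 10, 10.0)])
```

Only the perturbed values and the noise parameter enter `reconstruct`, so no individual original value is recovered; the output is an estimate of the aggregate distribution, which is what a classifier would then be trained on.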
Index Terms
- Privacy-preserving data mining