DOI: 10.1145/1008694.1008698

Privacy-preserving data integration and sharing

Published: 13 June 2004

ABSTRACT

Integrating data from multiple sources has been a longstanding challenge in the database community. Techniques such as privacy-preserving data mining promise privacy, but assume that data integration has already been accomplished. Data integration methods, in turn, are seriously hampered by the inability to share the data to be integrated. This paper lays out a privacy framework for data integration. It discusses the challenges for data integration within this framework in light of existing accomplishments in the field. Many of these challenges are opportunities for the data mining community.


Published in

DMKD '04: Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
June 2004
85 pages
ISBN: 158113908X
DOI: 10.1145/1008694

Copyright © 2004 ACM


Publisher

Association for Computing Machinery, New York, NY, United States
