ABSTRACT
Integrating data from multiple sources has been a longstanding challenge in the database community. Techniques such as privacy-preserving data mining promise privacy, but assume that data integration has already been accomplished. Data integration methods, in turn, are seriously hampered by the inability to share the data to be integrated. This paper lays out a privacy framework for data integration. Challenges for data integration within this framework are discussed in light of existing accomplishments in data integration. Many of these challenges are opportunities for the data mining community.
Index Terms
- Privacy-preserving data integration and sharing