ABSTRACT
Privacy-preserving record linkage (PPRL) is the process of identifying records that correspond to the same real-world entities across several databases without revealing any sensitive information about these entities. Various techniques have been developed to tackle the PPRL problem, but the majority of them only consider linking two databases. In many real-world applications, however, data from more than two sources need to be linked. In this paper we consider the problem of linking data from three or more sources efficiently and securely. We propose a protocol that combines Bloom filters, secure summation, and Dice coefficient similarity calculation to identify all records held by the different data sources that have a similarity above a certain threshold. Our protocol is secure in that no party learns any sensitive information about the other parties' data, but all parties learn which of their records have a high similarity with records held by the other parties. We evaluate our protocol on a large dataset and report its scalability, linkage quality, and privacy.
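To make the three building blocks named in the abstract concrete, the following is a minimal Python sketch, not the protocol proposed in the paper. It assumes bigram tokenisation, SHA-256 based hashing, a 64-bit filter, and a multi-party Dice coefficient of the form p × c / Σ xᵢ (c is the number of bit positions set in all p filters, xᵢ the number of 1-bits in filter i). All function names and parameters are hypothetical. The secure summation is simulated in a single process, and the common-bit count is computed in the clear here, which a real protocol would also need to protect.

```python
import hashlib
import secrets


def qgrams(value, q=2):
    """Split a string into its set of overlapping q-grams (assumed: lower-cased)."""
    s = value.lower()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def bloom_filter(value, size=64, num_hashes=3, q=2):
    """Encode the q-grams of a value into a bit list of length `size`."""
    bits = [0] * size
    for gram in qgrams(value, q):
        for k in range(num_hashes):
            # Illustrative hashing scheme: salt SHA-256 with the hash index k.
            digest = hashlib.sha256(f"{k}:{gram}".encode()).digest()
            bits[int.from_bytes(digest[:8], "big") % size] = 1
    return bits


def secure_sum(values, modulus=2**32):
    """Simulate ring-based secure summation in one process.

    The first party starts from a random mask, every party adds its private
    value modulo `modulus`, and the first party removes the mask at the end.
    Intermediate running totals therefore look random to the other parties.
    """
    mask = secrets.randbelow(modulus)
    running = mask
    for v in values:
        running = (running + v) % modulus
    return (running - mask) % modulus


def multi_party_dice(filters):
    """Dice coefficient over p Bloom filters: p * c / sum(x_i)."""
    p = len(filters)
    # c: positions set to 1 in every filter (computed in the clear in this sketch).
    common = sum(1 for column in zip(*filters) if all(column))
    # Sum of per-party 1-bit counts, obtained through the masked summation.
    total_ones = secure_sum([sum(bf) for bf in filters])
    return p * common / total_ones if total_ones else 0.0
```

For example, `multi_party_dice([bloom_filter(v) for v in ("smith", "smyth", "smith")])` yields a similarity that the parties could compare against an agreed threshold. Identical values across all parties give a similarity of exactly 1.0.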
Index Terms
- Scalable Privacy-Preserving Record Linkage for Multiple Databases