Skip to main content
Top
Published in: Knowledge and Information Systems 1/2020

11-06-2019 | Regular Paper

A scalable privacy-preserving framework for temporal record linkage

Authors: Thilina Ranbaduge, Peter Christen

Published in: Knowledge and Information Systems | Issue 1/2020

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Record linkage (RL) is the process of identifying matching records from different databases that refer to the same entity. In many applications, it is common that the attribute values of records that belong to the same entity evolve over time, for example people can change their surname or address. Therefore, to identify the records that refer to the same entity over time, RL should make use of temporal information such as the time-stamp of when a record was created and/or update last. However, if RL needs to be conducted on information about people, due to privacy and confidentiality concerns organisations are often not willing or allowed to share sensitive data in their databases, such as personal medical records or location and financial details, with other organisations. This paper proposes a scalable framework for privacy-preserving temporal record linkage that can link different databases while ensuring the privacy of sensitive data in these databases. We propose two protocols that can be used in different linkage scenarios with and without a third party. Our protocols use Bloom filter encoding which incorporates the temporal information available in records during the linkage process. Our approaches first securely calculate the probabilities of entities changing attribute values in their records over a period of time. Based on these probabilities, we then generate a set of masking Bloom filters to adjust the similarities between record pairs. We provide a theoretical analysis of the complexity and privacy of our techniques and conduct an empirical study on large real databases containing several millions of records. The experimental results show that our approaches can achieve better linkage quality compared to non-temporal PPRL while providing privacy to individuals in the databases that are being linked.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Chiang YH, Doan A, Naughton JF (2014) Modeling entity evolution for temporal record matching. In: ACM SIGMOD, pp 1175–1186 Chiang YH, Doan A, Naughton JF (2014) Modeling entity evolution for temporal record matching. In: ACM SIGMOD, pp 1175–1186
2.
go back to reference Christen P (2012) Data matching–concepts and techniques for record linkage, entity resolution, and duplicate detection. Springer, Berlin Christen P (2012) Data matching–concepts and techniques for record linkage, entity resolution, and duplicate detection. Springer, Berlin
3.
go back to reference Christen P, Gayler RW (2013) Adaptive temporal entity resolution on dynamic databases. In: PAKDD. Springer, pp 558–569 Christen P, Gayler RW (2013) Adaptive temporal entity resolution on dynamic databases. In: PAKDD. Springer, pp 558–569
4.
go back to reference Christen P, Vatsalan D, Wang Q (2015) Efficient entity resolution with adaptive and interactive training data selection. In: IEEE ICDM Christen P, Vatsalan D, Wang Q (2015) Efficient entity resolution with adaptive and interactive training data selection. In: IEEE ICDM
5.
go back to reference Christen P, Schnell R, Vatsalan D, Ranbaduge T (2017a) Efficient cryptanalysis of Bloom filters for privacy-preserving record linkage. In: PAKDD Christen P, Schnell R, Vatsalan D, Ranbaduge T (2017a) Efficient cryptanalysis of Bloom filters for privacy-preserving record linkage. In: PAKDD
6.
go back to reference Christen V, Groß A, Fisher J, Wang Q, Christen P, Rahm E (2017b) Temporal group linkage and evolution analysis for census data. In: EDBT, pp 620–631 Christen V, Groß A, Fisher J, Wang Q, Christen P, Rahm E (2017b) Temporal group linkage and evolution analysis for census data. In: EDBT, pp 620–631
7.
go back to reference Clifton C, Kantarcioglu M, Vaidya J, Lin X, Zhu M (2002) Tools for privacy preserving distributed data mining. SIGKDD Explor 4(2):28–34CrossRef Clifton C, Kantarcioglu M, Vaidya J, Lin X, Zhu M (2002) Tools for privacy preserving distributed data mining. SIGKDD Explor 4(2):28–34CrossRef
8.
go back to reference Durham EA, Toth C, Kuzu M, Kantarcioglu M, Xue Y, Malin B (2013) Composite Bloom filters for secure record linkage. In: TKDE Durham EA, Toth C, Kuzu M, Kantarcioglu M, Xue Y, Malin B (2013) Composite Bloom filters for secure record linkage. In: TKDE
9.
go back to reference Hand D, Christen P (2018) A note on using the F-measure for evaluating record linkage algorithms. Stat Comput 28(3):539–547MathSciNetCrossRef Hand D, Christen P (2018) A note on using the F-measure for evaluating record linkage algorithms. Stat Comput 28(3):539–547MathSciNetCrossRef
10.
go back to reference Hu Y, Wang Q, Vatsalan D, Christen P (2017) Improving temporal record linkage using regression classification. In: PAKDD Hu Y, Wang Q, Vatsalan D, Christen P (2017) Improving temporal record linkage using regression classification. In: PAKDD
11.
go back to reference Inan A, Kantarcioglu M, Ghinita G, Bertino E (2010) Private record matching using differential privacy. In: International conference on extending database technology. ACM, pp 123–134 Inan A, Kantarcioglu M, Ghinita G, Bertino E (2010) Private record matching using differential privacy. In: International conference on extending database technology. ACM, pp 123–134
12.
go back to reference Karakasidis A, Verykios V (2011) Secure blocking+ secure matching= secure record linkage. J Comput Sci Eng 5(3):223–235CrossRef Karakasidis A, Verykios V (2011) Secure blocking+ secure matching= secure record linkage. J Comput Sci Eng 5(3):223–235CrossRef
13.
go back to reference Li F, Lee ML, Hsu W, Tan WC (2015) Linking temporal records for profiling entities. In: ACM SIGMOD, pp 593–605 Li F, Lee ML, Hsu W, Tan WC (2015) Linking temporal records for profiling entities. In: ACM SIGMOD, pp 593–605
14.
go back to reference Li P, Dong XL, Maurino A, Srivastava D (2011) Linking temporal records. VLDB Endowment 4(11):956–967MATH Li P, Dong XL, Maurino A, Srivastava D (2011) Linking temporal records. VLDB Endowment 4(11):956–967MATH
15.
go back to reference Lin HY, Tzeng WG (2005) An efficient solution to the Millionaires’ problem based on homomorphic encryption. In: Applied cryptography and network security. Springer, pp 456–466 Lin HY, Tzeng WG (2005) An efficient solution to the Millionaires’ problem based on homomorphic encryption. In: Applied cryptography and network security. Springer, pp 456–466
16.
go back to reference Lindell Y, Pinkas B (2009) Secure multiparty computation for privacy-preserving data mining. JPC 1(1):5CrossRef Lindell Y, Pinkas B (2009) Secure multiparty computation for privacy-preserving data mining. JPC 1(1):5CrossRef
18.
go back to reference Naehrig M, Lauter K, Vaikuntanathan V (2011) Can homomorphic encryption be practical? In: 3rd ACM workshop on cloud computing security workshop. ACM Naehrig M, Lauter K, Vaikuntanathan V (2011) Can homomorphic encryption be practical? In: 3rd ACM workshop on cloud computing security workshop. ACM
19.
go back to reference Paillier P (1999) Public-key cryptosystems based on composite degree residuosity classes. In: EUROCRYPT. Springer, pp 223–238 Paillier P (1999) Public-key cryptosystems based on composite degree residuosity classes. In: EUROCRYPT. Springer, pp 223–238
20.
go back to reference Ranbaduge T, Christen P (2018) Privacy-preserving temporal record linkage. In: IEEE ICDM, pp 1161–1171 Ranbaduge T, Christen P (2018) Privacy-preserving temporal record linkage. In: IEEE ICDM, pp 1161–1171
21.
go back to reference Ranbaduge T, Vatsalan D, Christen P (2014) Tree based scalable indexing for multi-party privacy-preserving record linkage. In: AusDM, CRPIT 158. Brisbane Ranbaduge T, Vatsalan D, Christen P (2014) Tree based scalable indexing for multi-party privacy-preserving record linkage. In: AusDM, CRPIT 158. Brisbane
22.
go back to reference Ranbaduge T, Vatsalan D, Christen P (2015) Clustering-based scalable indexing for multi-party privacy-preserving record linkage. In: PAKDD’09. Springer LNAI, VietnamCrossRef Ranbaduge T, Vatsalan D, Christen P (2015) Clustering-based scalable indexing for multi-party privacy-preserving record linkage. In: PAKDD’09. Springer LNAI, VietnamCrossRef
23.
go back to reference Randall S, Ferrante A, Boyd J, Semmens J (2013) The effect of data cleaning on record linkage quality. BMC Med Inform Decis Mak 13:64CrossRef Randall S, Ferrante A, Boyd J, Semmens J (2013) The effect of data cleaning on record linkage quality. BMC Med Inform Decis Mak 13:64CrossRef
24.
go back to reference Randall SM, Ferrante AM, Boyd JH, Bauer JK, Semmens JB (2014) Privacy-preserving record linkage on large real world datasets. JBI 50:205 Randall SM, Ferrante AM, Boyd JH, Bauer JK, Semmens JB (2014) Privacy-preserving record linkage on large real world datasets. JBI 50:205
25.
go back to reference Schnell R, Bachteler T, Reiher J (2009) Privacy-preserving record linkage using Bloom filters. BMC Med Inform Decis Mak 9:41CrossRef Schnell R, Bachteler T, Reiher J (2009) Privacy-preserving record linkage using Bloom filters. BMC Med Inform Decis Mak 9:41CrossRef
26.
go back to reference Sweeney L (2002) K-anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Based Syst 10(5):557–570MathSciNetCrossRef Sweeney L (2002) K-anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Based Syst 10(5):557–570MathSciNetCrossRef
27.
go back to reference Vatsalan D, Christen P (2012) An iterative two-party protocol for scalable privacy-preserving record linkage. In: AusDM, CRPIT 134. Sydney, Australia Vatsalan D, Christen P (2012) An iterative two-party protocol for scalable privacy-preserving record linkage. In: AusDM, CRPIT 134. Sydney, Australia
28.
go back to reference Vatsalan D, Christen P (2013) Sorted nearest neighborhood clustering for efficient private blocking. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, pp 341–352 Vatsalan D, Christen P (2013) Sorted nearest neighborhood clustering for efficient private blocking. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, pp 341–352
29.
go back to reference Vatsalan D, Christen P (2014) Scalable privacy-preserving record linkage for multiple databases. In: ACM CIKM, pp 1795–1798 Vatsalan D, Christen P (2014) Scalable privacy-preserving record linkage for multiple databases. In: ACM CIKM, pp 1795–1798
30.
31.
go back to reference Vatsalan D, Christen P, Verykios V (2013a) Efficient two-party private blocking based on sorted nearest neighborhood clustering. In: ACM CIKM. San Francisco, pp 1949–1958 Vatsalan D, Christen P, Verykios V (2013a) Efficient two-party private blocking based on sorted nearest neighborhood clustering. In: ACM CIKM. San Francisco, pp 1949–1958
32.
go back to reference Vatsalan D, Christen P, Verykios VS (2013b) A taxonomy of privacy-preserving record linkage techniques. JIS 38(6):946 Vatsalan D, Christen P, Verykios VS (2013b) A taxonomy of privacy-preserving record linkage techniques. JIS 38(6):946
33.
go back to reference Vatsalan D, Sehili Z, Christen P, Rahm E (2017) Privacy-preserving record linkage for big data: current approaches and research challenges. Springer, Berlin, pp 851–895 Vatsalan D, Sehili Z, Christen P, Rahm E (2017) Privacy-preserving record linkage for big data: current approaches and research challenges. Springer, Berlin, pp 851–895
34.
go back to reference Yakout M, Atallah M, Elmagarmid A (2009) Efficient private record linkage. In: IEEE international conference on data engineering, pp 1283–1286 Yakout M, Atallah M, Elmagarmid A (2009) Efficient private record linkage. In: IEEE international conference on data engineering, pp 1283–1286
35.
go back to reference Yao AC (1982) Protocols for secure computations. In: IEEE SFCS Yao AC (1982) Protocols for secure computations. In: IEEE SFCS
36.
go back to reference Yasuda M, Shimoyama T, Kogure J, Yokoyama K, Koshiba T (2015) New packing method in somewhat homomorphic encryption and its applications. Secur Commun Netw 8(13):2194–2213CrossRef Yasuda M, Shimoyama T, Kogure J, Yokoyama K, Koshiba T (2015) New packing method in somewhat homomorphic encryption and its applications. Secur Commun Netw 8(13):2194–2213CrossRef
Metadata
Title
A scalable privacy-preserving framework for temporal record linkage
Authors
Thilina Ranbaduge
Peter Christen
Publication date
11-06-2019
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 1/2020
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-019-01370-1

Other articles of this Issue 1/2020

Knowledge and Information Systems 1/2020 Go to the issue

Premium Partner