2023 | OriginalPaper | Chapter

More Sparking Soundex-Based Privacy-Preserving Record Linkage

Authors: Alexandros Karakasidis, Georgia Koloniari

Published in: Algorithmic Aspects of Cloud Computing

Publisher: Springer International Publishing

Abstract

Privacy-preserving record linkage is the problem of matching records held by two or more data holders without revealing any personal identifiers, thereby maintaining the privacy of the individuals these records describe. Parallel processing has already been deployed in privacy-preserving record linkage to handle big data. In this paper, we further explore parallel methods based on Apache Spark and phonetic codes, and propose improvements that achieve superior time efficiency and privacy characteristics. To support our claims, we provide extensive experimental results and a rigorous discussion.
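
To make the idea of linking records through phonetic codes concrete, the following is a minimal sketch in plain Python. It is not the authors' protocol and it does not use Apache Spark. It assumes the standard American Soundex rules, and it assumes (purely for illustration) that both data holders share a secret key and exchange only keyed hashes of the codes. The names `soundex`, `encode`, `link` and `SHARED_KEY` are hypothetical.

```python
import hashlib
import hmac

# Standard American Soundex digit groups.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code of a name."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    out = [letters[0].upper()]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    return ("".join(out) + "000")[:4]


def encode(records: dict, key: bytes) -> dict:
    """Map each record id to a keyed hash of its surname's Soundex code.

    Assumption: only these hashes, never the names, leave a data holder.
    """
    return {
        rid: hmac.new(key, soundex(name).encode(), hashlib.sha256).hexdigest()
        for rid, name in records.items()
    }


def link(enc_a: dict, enc_b: dict) -> list:
    """Return sorted (id_a, id_b) pairs whose hashed codes are equal."""
    index = {}
    for rid, digest in enc_b.items():
        index.setdefault(digest, []).append(rid)
    return sorted((a, b) for a, digest in enc_a.items()
                  for b in index.get(digest, []))


SHARED_KEY = b"illustrative-shared-secret"  # hypothetical key
holder_a = {"a1": "Smith", "a2": "Robert"}
holder_b = {"b1": "Smyth", "b2": "Rupert", "b3": "Jones"}

matches = link(encode(holder_a, SHARED_KEY), encode(holder_b, SHARED_KEY))
print(matches)  # [('a1', 'b1'), ('a2', 'b2')]
```

Because Soundex maps similar-sounding spellings such as "Smith" and "Smyth" to the same code, exact comparison of the hashed codes tolerates some typographical variation. In a parallel setting the final comparison step would correspond to a join on the hashed code, but how the paper distributes this work is not shown here.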

Metadata
Title
More Sparking Soundex-Based Privacy-Preserving Record Linkage
Authors
Alexandros Karakasidis
Georgia Koloniari
Copyright Year
2023
DOI
https://doi.org/10.1007/978-3-031-33437-5_5