2016 | OriginalPaper | Book chapter

Private Record Linkage: Comparison of Selected Techniques for Name Matching

Written by: Pawel Grzebala, Michelle Cheatham

Published in: The Semantic Web. Latest Advances and New Domains

Publisher: Springer International Publishing

Abstract

The rise of Big Data Analytics has shown the utility of analyzing all aspects of a problem by bringing together disparate data sets. Efficient and accurate private record linkage algorithms are necessary to achieve this. However, records are often linked based on personally identifiable information, and protecting the privacy of individuals is critical. This paper contributes to this field by studying an important component of the private record linkage problem: linking based on names while keeping those names encrypted, both on disk and in memory. We explore the applicability, accuracy and speed of three different primary approaches to this problem (along with several variations) and compare the results to common name-matching metrics on unprotected data. While these approaches are not new, this paper provides a thorough analysis on a range of datasets containing systematically introduced flaws common to name-based data entry, such as typographical errors, optical character recognition errors, and phonetic errors.

Footnotes
1
A standard access control system to allow authorized consumers to query the database while preventing unauthorized users from doing so is assumed to be in place.
 
2
Note that this exposes the raw PII values in memory, though only those in the query, not those in every database record.
 
3
In 1990, Lawrence Philips created a phonetic algorithm called Metaphone that improves upon Soundex by considering numerous situations in which the pronunciation of English words differs from what would be anticipated based on their spelling [8]. Metaphone was not considered for this effort because the extensions that it makes beyond Soundex are primarily intended to improve performance on regular words rather than on names; however, the metric does fit the requirements for use in this application and will be considered during our future work on this topic.
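
For orientation, the following is a minimal sketch of the classic American Soundex code, the phonetic encoding that this footnote names as the baseline Metaphone builds on. The function name `soundex` and the lookup table `CODES` are illustrative assumptions and are not taken from the paper; the authors' implementation may differ in details such as the handling of non-ASCII letters, which this sketch simply drops.

```python
# Illustrative sketch of American Soundex (names here are hypothetical,
# not taken from the paper).

# Consonant groups that share a Soundex digit.
CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of `name`, or "" if it has no A-Z letters."""
    # Assumption: only ASCII letters are kept; anything else is dropped.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""

    first = letters[0]
    encoded = [first]
    prev = CODES.get(first, "")  # the first letter's digit also suppresses repeats

    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same digit
        code = CODES.get(ch, "")  # vowels (and Y) map to "" and act as separators
        if code and code != prev:
            encoded.append(code)
        prev = code
        if len(encoded) == 4:
            break

    # Pad short codes with zeros to the fixed length of 4.
    return "".join(encoded).ljust(4, "0")


if __name__ == "__main__":
    for example in ("Robert", "Rupert", "Ashcraft", "Tymczak"):
        print(example, soundex(example))
```

Names that sound alike, such as "Robert" and "Rupert", map to the same code, which is why a phonetic code of this kind tolerates some spelling variation.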
 
References
1.
Christen, P.: A comparison of personal name matching: techniques and practical issues. In: Sixth IEEE International Conference on Data Mining Workshops, ICDM Workshops 2006, pp. 290–294. IEEE (2006)
2.
Christen, P.: Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Springer, Heidelberg (2012)
3.
Churches, T., Christen, P.: Some methods for blindfolded record linkage. BMC Med. Inform. Decis. Mak. 4(1), 9 (2004)
4.
Dreßler, K., Ngomo, A.C.N.: Time-efficient execution of bounded jaro-winkler distances. In: Proceedings of the 9th International Conference on Ontology Matching, vol. 1317, pp. 37–48. CEUR-WS.org (2014)
5.
Giereth, M.: On partial encryption of RDF-graphs. In: Gil, Y., Motta, E., Benjamins, V.R., Musen, M.A. (eds.) ISWC 2005. LNCS, vol. 3729, pp. 308–322. Springer, Heidelberg (2005)
6.
Keskustalo, H., Pirkola, A., Visala, K., Leppänen, E., Järvelin, K.: Non-adjacent digrams improve matching of cross-lingual spelling variants. In: Nascimento, M.A., de Moura, E.S., Oliveira, A.L. (eds.) SPIRE 2003. LNCS, vol. 2857, pp. 252–265. Springer, Heidelberg (2003)
7.
Muñoz, J.C., Tamura, G., Villegas, N.M., Müller, H.A.: Surprise: user-controlled granular privacy and security for personal data in smartercontext. In: Proceedings of the 2012 Conference of the Center for Advanced Studies on Collaborative Research, pp. 131–145. IBM Corp. (2012)
8.
Philips, L.: Hanging on the metaphone. Comput. Lang. 7(12) (1990)
9.
Snae, C.: A comparison and analysis of name matching algorithms. Int. J. Appl. Sci. Eng. Technol. 4(1), 252–257 (2007)
10.
Tran, K.N., Vatsalan, D., Christen, P.: Geco: an online personal data generator and corruptor. In: Proceedings of the 22nd ACM International Conference on Conference on Information & Knowledge Management, pp. 2473–2476. ACM (2013)
11.
Vatsalan, D., Christen, P., Verykios, V.S.: An efficient two-party protocol for approximate matching in private record linkage. In: Proceedings of the Ninth Australasian Data Mining Conference, vol. 121, pp. 125–136. Australian Computer Society, Inc. (2011)
12.
Yakout, M., Atallah, M.J., Elmagarmid, A.: Efficient private record linkage. In: IEEE 25th International Conference on Data Engineering, ICDE 2009, pp. 1283–1286. IEEE (2009)
Metadata
Title
Private Record Linkage: Comparison of Selected Techniques for Name Matching
Written by
Pawel Grzebala
Michelle Cheatham
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-34129-3_36
