2023 | OriginalPaper | Book chapter

On Computing the Jaro Similarity Between Two Strings

Authors: Joyanta Basak, Ahmed Soliman, Nachiket Deo, Kenneth Haase, Anup Mathur, Krista Park, Rebecca Steorts, Daniel Weinberg, Sartaj Sahni, Sanguthevar Rajasekaran

Published in: Bioinformatics Research and Applications

Publisher: Springer Nature Singapore

Abstract

Jaro similarity is widely used in computing the similarity (or distance) between two strings of characters. For example, record linkage is an application of great interest in many domains for which Jaro similarity is popularly employed. Existing algorithms for computing the Jaro similarity between two given strings take quadratic time in the worst case. In this paper, we present an algorithm for Jaro similarity computation that takes only linear time. We also present experimental results that reveal that our algorithm outperforms existing algorithms.
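
For orientation, the following is a minimal sketch of the commonly used textbook Jaro computation, which scans a matching window for each character and therefore takes quadratic time in the worst case. It is not the linear-time algorithm presented in the paper, whose details are not given in this abstract. The function name `jaro_similarity` and the choice to return 1.0 for two identical strings (including two empty strings) are assumptions made for illustration. With m matched characters and t transpositions, the score is (m/|s| + m/|t| + (m - t)/m) / 3.

```python
def jaro_similarity(s: str, t: str) -> float:
    """Textbook Jaro similarity (window-scanning, worst-case quadratic).

    Illustrative sketch only; NOT the paper's linear-time algorithm.
    """
    if s == t:
        return 1.0  # assumed convention, also applied to two empty strings
    n, m = len(s), len(t)
    if n == 0 or m == 0:
        return 0.0

    # Characters match only if equal and at most `window` positions apart.
    window = max(max(n, m) // 2 - 1, 0)
    s_matched = [False] * n
    t_matched = [False] * m
    matches = 0

    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(m, i + window + 1)
        # This inner scan is the source of the quadratic worst case.
        for j in range(lo, hi):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = True
                t_matched[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Compare the matched characters in order; each out-of-order pair
    # counts as half a transposition.
    s_seq = [c for c, used in zip(s, s_matched) if used]
    t_seq = [c for c, used in zip(t, t_matched) if used]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2

    return (matches / n + matches / m + (matches - transpositions) / matches) / 3


if __name__ == "__main__":
    print(round(jaro_similarity("MARTHA", "MARHTA"), 4))
    print(round(jaro_similarity("DIXON", "DICKSONX"), 4))
```

For two strings of length n the window covers roughly n/2 positions on each side, so the inner scan can touch on the order of n characters per outer iteration; this is the quadratic behaviour the abstract refers to.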

Metadata

Title: On Computing the Jaro Similarity Between Two Strings
Authors: Joyanta Basak, Ahmed Soliman, Nachiket Deo, Kenneth Haase, Anup Mathur, Krista Park, Rebecca Steorts, Daniel Weinberg, Sartaj Sahni, Sanguthevar Rajasekaran
Copyright year: 2023
Publisher: Springer Nature Singapore
DOI: https://doi.org/10.1007/978-981-99-7074-2_3