Published in: Journal of Intelligent Information Systems 2/2019

29.01.2019

Incremental entity resolution process over query results for data integration systems

Authors: Priscilla Kelly Machado Vieira, Bernadette Farias Lóscio, Ana Carolina Salgado

Abstract

Entity Resolution (ER) in data integration systems is the problem of identifying groups of tuples, from one or multiple data sources, that represent the same real-world entity. It is a crucial stage of data integration processes, which often need to integrate data at query time. The task becomes even more challenging when data sources are dynamic or when a large volume of data needs to be integrated, and new ER solutions have been proposed to cope with such volumes. One possible approach is to perform the ER process over query results rather than over the whole set of tuples being integrated. Additionally, the results of previous ER tasks can be reused to reduce the number of comparisons between pairs of tuples at query time. Similarly, indexing techniques can help identify equivalent tuples and further reduce the number of pairwise comparisons. In this context, this work proposes an incremental ER process over query results. Its contributions are the specification, implementation, and evaluation of the proposed incremental process. Based on our experiments, we conclude that incremental ER at query time is more efficient than traditional ER processes.
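The three ideas named in the abstract (resolving only the tuples returned by a query, reusing earlier ER results, and using an index to limit comparisons) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `IncrementalResolver`, the token-based Jaccard similarity, the first-token blocking key, and the 0.5 threshold are all assumptions chosen only to keep the example small.

```python
from collections import defaultdict


def tokens(text):
    """Lowercased word set used by the toy similarity function."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


class IncrementalResolver:
    """Hypothetical helper: resolves only the tuples returned by queries."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.cluster_of = {}            # tuple id -> cluster id, reused across queries
        self.index = defaultdict(list)  # blocking key -> [(tuple id, name)]
        self.comparisons = 0            # pairwise comparisons performed so far

    @staticmethod
    def blocking_key(name):
        # Assumption: the first token of the name is a usable blocking key.
        parts = name.lower().split()
        return parts[0] if parts else ""

    def resolve(self, query_result):
        """Group the (tuple_id, name) pairs of one query result by cluster id."""
        clusters = defaultdict(list)
        for tuple_id, name in query_result:
            if tuple_id not in self.cluster_of:
                # New tuple: match it, then remember the result for later queries.
                self.cluster_of[tuple_id] = self._match(tuple_id, name)
                self.index[self.blocking_key(name)].append((tuple_id, name))
            clusters[self.cluster_of[tuple_id]].append(tuple_id)
        return dict(clusters)

    def _match(self, tuple_id, name):
        # Compare only against previously seen tuples in the same block.
        for other_id, other_name in self.index[self.blocking_key(name)]:
            self.comparisons += 1
            if jaccard(name, other_name) >= self.threshold:
                return self.cluster_of[other_id]
        return tuple_id  # no match: the tuple starts its own cluster


resolver = IncrementalResolver()
first = resolver.resolve([
    ("s1:1", "Maria Silva"),
    ("s2:7", "Maria A. Silva"),
    ("s1:2", "Bob Lee"),
])
after_first = resolver.comparisons
# "s2:7" was resolved by the first query, so only "s3:4" is compared now.
second = resolver.resolve([
    ("s2:7", "Maria A. Silva"),
    ("s3:4", "Maria Silva Jr"),
])
```

In this sketch a tuple resolved by an earlier query costs no further comparisons, and a new tuple is compared only with indexed tuples that share its blocking key. A real system would need a more robust similarity measure and a strategy for merging clusters when later evidence links them, which this sketch deliberately omits.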

Metadata
Title
Incremental entity resolution process over query results for data integration systems
Authors
Priscilla Kelly Machado Vieira
Bernadette Farias Lóscio
Ana Carolina Salgado
Publication date
29.01.2019
Publisher
Springer US
Published in
Journal of Intelligent Information Systems / Issue 2/2019
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-019-00544-1
