Published in: Journal of Intelligent Information Systems 2/2019

29-01-2019

Incremental entity resolution process over query results for data integration systems

Authors: Priscilla Kelly Machado Vieira, Bernadette Farias Lóscio, Ana Carolina Salgado

Abstract

Entity Resolution (ER) in data integration systems is the problem of identifying groups of tuples, from one or multiple data sources, that represent the same real-world entity. It is a crucial stage of data integration processes, which often need to integrate data at query time. The task becomes even more challenging when data sources are dynamic or when a large volume of data needs to be integrated, and new ER solutions have been proposed to deal with such volumes. One possible approach consists of performing the ER process over query results rather than over the whole set of tuples being integrated. Additionally, previous results of ER tasks can be reused to reduce the number of comparisons between pairs of tuples at query time. Similarly, indexing techniques can be employed to help identify equivalent tuples and further reduce the number of pairwise comparisons. In this context, this work proposes an incremental ER process over query results. The contributions of this work are the specification, implementation, and evaluation of the proposed incremental process. Experiments led us to conclude that incremental ER at query time is more efficient than traditional ER processes.
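
To make the three ideas in the abstract concrete (resolving only the tuples returned by a query, reusing earlier ER results, and using an index to limit pairwise comparisons), here is a minimal Python sketch. It is not the authors' algorithm: the class `IncrementalResolver`, the three-character blocking key, the string-similarity measure, and the 0.85 threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


class IncrementalResolver:
    """Toy incremental ER over query results (illustrative assumptions only)."""

    def __init__(self, threshold=0.85):
        self.threshold = threshold          # assumed similarity cut-off
        self.entity_of = {}                 # tuple id -> entity id, kept across queries
        self.names = {}                     # tuple id -> name of already resolved tuples
        self.index = defaultdict(set)       # blocking key -> ids of resolved tuples
        self.comparisons = 0                # number of pairwise comparisons performed

    @staticmethod
    def blocking_key(name):
        # Hypothetical blocking key: first three characters of the normalized name.
        return name.strip().lower()[:3]

    def similar(self, a, b):
        self.comparisons += 1
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= self.threshold

    def resolve(self, query_result):
        """query_result: list of (tuple_id, name). Returns {entity_id: [tuple_ids]}."""
        for tid, name in query_result:
            if tid in self.entity_of:
                continue                    # reuse the result of an earlier ER run
            key = self.blocking_key(name)
            # Compare only against resolved tuples that share the blocking key.
            match = next(
                (other for other in sorted(self.index[key])
                 if self.similar(name, self.names[other])),
                None,
            )
            self.entity_of[tid] = self.entity_of[match] if match is not None else tid
            self.names[tid] = name
            self.index[key].add(tid)
        # Group only the tuples of this query result by their entity.
        groups = defaultdict(list)
        for tid, _ in query_result:
            groups[self.entity_of[tid]].append(tid)
        return dict(groups)


resolver = IncrementalResolver()
first = resolver.resolve([
    ("a1", "Journal of Intelligent Information Systems"),
    ("b7", "Journal of Intelligent Information System"),
    ("c3", "The VLDB Journal"),
])
# A later query that overlaps the first one: a1 and b7 are not compared again.
second = resolver.resolve([
    ("a1", "Journal of Intelligent Information Systems"),
    ("b7", "Journal of Intelligent Information System"),
    ("d9", "Journal of Classification"),
])
```

In this sketch the second call compares only the new tuple `d9` against the indexed candidates, because `a1` and `b7` were resolved by the first call. A blocking key this coarse can also miss true matches, which is the usual trade-off of index-based candidate selection.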


Metadata

Title: Incremental entity resolution process over query results for data integration systems
Authors: Priscilla Kelly Machado Vieira, Bernadette Farias Lóscio, Ana Carolina Salgado
Publication date: 29-01-2019
Publisher: Springer US
Published in: Journal of Intelligent Information Systems, Issue 2/2019
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI: https://doi.org/10.1007/s10844-019-00544-1
