skip to main content
research-article
Public Access

A Declarative Framework for Linking Entities

Published:18 July 2016Publication History
Skip Abstract Section

Abstract

We introduce and develop a declarative framework for entity linking and, in particular, for entity resolution. As in some earlier approaches, our framework is based on a systematic use of constraints. However, the constraints we adopt are link-to-source constraints, unlike in earlier approaches where source-to-link constraints were used to dictate how to generate links. Our approach makes it possible to focus entirely on the intended properties of the outcome of entity linking, thus separating the constraints from any procedure of how to achieve that outcome. The core language consists of link-to-source constraints that specify the desired properties of a link relation in terms of source relations and built-in predicates such as similarity measures. A key feature of the link-to-source constraints is that they employ disjunction, which enables the declarative listing of all the reasons two entities should be linked. We also consider extensions of the core language that capture collective entity resolution by allowing interdependencies among the link relations.

We identify a class of “good” solutions for entity-linking specifications, which we call maximum-value solutions and which capture the strength of a link by counting the reasons that justify it. We study natural algorithmic problems associated with these solutions, including the problem of enumerating the “good” solutions and the problem of finding the certain links, which are the links that appear in every “good” solution. We show that these problems are tractable for the core language but may become intractable once we allow interdependencies among the link relations. We also make some surprising connections between our declarative framework, which is deterministic, and probabilistic approaches such as ones based on Markov Logic Networks.

References

  1. N. Alur, A. K. Jha, B. Rosen, and T. Skov. 2008. IBM WebSphere QualityStage Methodologies, Standardization, and Matching. Redbooks. http://www.redbooks.ibm.com/redbooks/pdfs/sg247546.pdf.Google ScholarGoogle Scholar
  2. B. Alexe, D. Burdick, M. A. Hernández, G. Koutrika, R. Krishnamurthy, L. Popa, I. R. Stanoi, and R. Wisnesky. 2013. High-level rules for integration and analysis of data: New challenges. In LNCS 8000: In Search of Elegance in the Theory and Practice of Computation. 36--55.Google ScholarGoogle Scholar
  3. A. Arasu, C. Re, and D. Suciu. 2009. Large-scale deduplication with constraints using dedupalog. In ICDE. 952--963. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. M. Arenas, P. Barceló, R. Fagin, and L. Libkin. 2013. Solutions and query rewriting in data exchange. Inf. Comp. 228 (2013), 28--51. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. M. Arenas, L. E. Bertossi, and J. Chomicki. 1999. Consistent query answers in inconsistent databases. In PODS. 68--79. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. L. E. Bertossi, S. Kolahi, and L. V. S. Lakshmanan. 2013. Data cleaning and query answering with matching dependencies and matching functions. Theory Comput. Syst. 52, 3 (2013), 441--482. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. I. Bhattacharya and L. Getoor. 2007. Collective entity resolution in relational data. TKDD 1, 1 (2007). Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. D. Burdick, R. Fagin, Ph. G. Kolaitis, L. Popa, and W.-C. Tan. 2015. A declarative framework for linking entities. In 18th International Conference on Database Theory (ICDT’15). 25--43.Google ScholarGoogle Scholar
  9. D. Burdick, M. A. Hernández, H. Ho, G. Koutrika, R. Krishnamurthy, L. Popa, I. R. Stanoi, S. Vaithyanathan, and S. Das. 2011. Extracting, linking and integrating data from public sources: A financial case study. IEEE Data Eng. Bull. 34, 3 (2011), 60--67.Google ScholarGoogle Scholar
  10. C. R. Chegireddy and H. W. Hamacher. 1987. Algorithms for finding k-best perfect matchings. Discrete Appl. Math. 18, 2 (1987), 155--165.Google ScholarGoogle ScholarCross RefCross Ref
  11. J. Chomicki and J. Marcinkowski. 2005. Minimal-change integrity maintenance using tuple deletions. Inf. Comp. 197 (2005), 90--121. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. P. Christen. 2012. A survey of indexing techniques for scalable record linkage and deduplication. IEEE Trans. Knowl. Data Eng. 24, 9 (2012), 1537--1555. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. X. Dong, A. Y. Halevy, and J. Madhavan. 2005. Reference reconciliation in complex information spaces. In SIGMOD. 85--96. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. Droschinsky, B. Heinemann, N. Kriege, and P. Mutzel. 2014. Enumeration of maximum common subtree isomorphisms with polynomial-delay. In Proceedings of Algorithms and Computation - 25th International Symposium, (ISAAC’14). 81--93.Google ScholarGoogle Scholar
  15. A. K. Elmagarmid, P. G. Ipeirotis, and V. S. Verykios. 2007. Duplicate record detection: A survey. IEEE TKDE 19, 1 (2007), 1--16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. 2005. Data exchange: Semantics and query answering. Theor. Comput. Sci. (TCS) 336, 1 (2005), 89--124. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. W. Fan. 2008. Dependencies revisited for improving data quality. In PODS. 159--170. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. W. Fan and F. Geerts. 2012. Foundations of Data Quality Management. Morgan & Claypool Publishers. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. I. P. Fellegi and A. B. Sunter. 1969. A theory for record linkage. J. Am. Statistical Assoc. 64, 328 (1969), 1183--1210.Google ScholarGoogle ScholarCross RefCross Ref
  20. H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. 2001. Declarative data cleaning: Language, model, and algorithms. In VLDB. 371--380. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. V. Ganti and A. Das Sarma. 2013. Data Cleaning: A Practical Perspective. Morgan & Claypool Publishers. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. L. Getoor and A. Machanavajjhala. 2012. Entity resolution: Theory, practice & open challenges. PVLDB 5, 12 (2012), 2018--2019. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. O. Hassanzadeh, A. Kementsietsidis, L. Lim, R. J. Miller, and M. Wang. 2009. A framework for semantic link discovery over relational data. In CIKM. 1027--1036. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. M. A. Hernández, G. Koutrika, R. Krishnamurthy, L. Popa, and R. Wisnesky. 2013. HIL: A high-level scripting language for entity integration. In EDBT. 549--560. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. M. A. Hernández and S. J. Stolfo. 1995. The merge/purge problem for large databases. In SIGMOD. 127--138. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. A. Itai, M. Rodeh, and S. L. Tanimoto. 1978. Some matching problems for bipartite graphs. J. ACM 25, 4 (1978), 517--525. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. D. S. Johnson, C. H. Papadimitriou, and M. Yannakakis. 1988. On generating all maximal independent sets. Inf. Process. Lett. 27, 3 (1988), 119--123. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. P. Jonsson and A. A. Krokhin. 2004. Recognizing frozen variables in constraint Satisfaction Problems. Theor. Comput. Sci. (TCS) 329, 1--3 (2004), 93--113. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. N. Koudas, S. Sarawagi, and D. Srivastava. 2006. Record linkage: Similarity measures and algorithms. In SIGMOD. 802--803. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. K. G. Murty. 1968. An algorithm for ranking all the assignments in order of increasing cost. Oper. Res. 16, 3 (1968), 682--687.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. C. H. Papadimitriou. 1994. Computational Complexity. Addison-Wesley.Google ScholarGoogle Scholar
  32. M. Richardson and P. Domingos. 2006. Markov logic networks. Machine Learn. 62, 1--2 (2006), 107--136. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. P. Singla and P. Domingos. 2006. Entity resolution with Markov logic. In ICDM. 572--582. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A Declarative Framework for Linking Entities

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Database Systems
        ACM Transactions on Database Systems  Volume 41, Issue 3
        August 2016
        247 pages
        ISSN:0362-5915
        EISSN:1557-4644
        DOI:10.1145/2966276
        Issue’s Table of Contents

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 18 July 2016
        • Accepted: 1 February 2016
        • Revised: 1 December 2015
        • Received: 1 August 2015
        Published in tods Volume 41, Issue 3

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader