Abstract
We introduce and develop a declarative framework for entity linking and, in particular, for entity resolution. As in some earlier approaches, our framework is based on a systematic use of constraints. However, the constraints we adopt are link-to-source constraints, unlike in earlier approaches where source-to-link constraints were used to dictate how to generate links. Our approach makes it possible to focus entirely on the intended properties of the outcome of entity linking, thus separating the constraints from any procedure of how to achieve that outcome. The core language consists of link-to-source constraints that specify the desired properties of a link relation in terms of source relations and built-in predicates such as similarity measures. A key feature of the link-to-source constraints is that they employ disjunction, which enables the declarative listing of all the reasons two entities should be linked. We also consider extensions of the core language that capture collective entity resolution by allowing interdependencies among the link relations.
We identify a class of “good” solutions for entity-linking specifications, which we call maximum-value solutions and which capture the strength of a link by counting the reasons that justify it. We study natural algorithmic problems associated with these solutions, including the problem of enumerating the “good” solutions and the problem of finding the certain links, which are the links that appear in every “good” solution. We show that these problems are tractable for the core language but may become intractable once we allow interdependencies among the link relations. We also make some surprising connections between our declarative framework, which is deterministic, and probabilistic approaches such as ones based on Markov Logic Networks.
- N. Alur, A. K. Jha, B. Rosen, and T. Skov. 2008. IBM WebSphere QualityStage Methodologies, Standardization, and Matching. Redbooks. http://www.redbooks.ibm.com/redbooks/pdfs/sg247546.pdf.Google Scholar
- B. Alexe, D. Burdick, M. A. Hernández, G. Koutrika, R. Krishnamurthy, L. Popa, I. R. Stanoi, and R. Wisnesky. 2013. High-level rules for integration and analysis of data: New challenges. In LNCS 8000: In Search of Elegance in the Theory and Practice of Computation. 36--55.Google Scholar
- A. Arasu, C. Re, and D. Suciu. 2009. Large-scale deduplication with constraints using dedupalog. In ICDE. 952--963. Google ScholarDigital Library
- M. Arenas, P. Barceló, R. Fagin, and L. Libkin. 2013. Solutions and query rewriting in data exchange. Inf. Comp. 228 (2013), 28--51. Google ScholarDigital Library
- M. Arenas, L. E. Bertossi, and J. Chomicki. 1999. Consistent query answers in inconsistent databases. In PODS. 68--79. Google ScholarDigital Library
- L. E. Bertossi, S. Kolahi, and L. V. S. Lakshmanan. 2013. Data cleaning and query answering with matching dependencies and matching functions. Theory Comput. Syst. 52, 3 (2013), 441--482. Google ScholarDigital Library
- I. Bhattacharya and L. Getoor. 2007. Collective entity resolution in relational data. TKDD 1, 1 (2007). Google ScholarDigital Library
- D. Burdick, R. Fagin, Ph. G. Kolaitis, L. Popa, and W.-C. Tan. 2015. A declarative framework for linking entities. In 18th International Conference on Database Theory (ICDT’15). 25--43.Google Scholar
- D. Burdick, M. A. Hernández, H. Ho, G. Koutrika, R. Krishnamurthy, L. Popa, I. R. Stanoi, S. Vaithyanathan, and S. Das. 2011. Extracting, linking and integrating data from public sources: A financial case study. IEEE Data Eng. Bull. 34, 3 (2011), 60--67.Google Scholar
- C. R. Chegireddy and H. W. Hamacher. 1987. Algorithms for finding k-best perfect matchings. Discrete Appl. Math. 18, 2 (1987), 155--165.Google ScholarCross Ref
- J. Chomicki and J. Marcinkowski. 2005. Minimal-change integrity maintenance using tuple deletions. Inf. Comp. 197 (2005), 90--121. Google ScholarDigital Library
- P. Christen. 2012. A survey of indexing techniques for scalable record linkage and deduplication. IEEE Trans. Knowl. Data Eng. 24, 9 (2012), 1537--1555. Google ScholarDigital Library
- X. Dong, A. Y. Halevy, and J. Madhavan. 2005. Reference reconciliation in complex information spaces. In SIGMOD. 85--96. Google ScholarDigital Library
- A. Droschinsky, B. Heinemann, N. Kriege, and P. Mutzel. 2014. Enumeration of maximum common subtree isomorphisms with polynomial-delay. In Proceedings of Algorithms and Computation - 25th International Symposium, (ISAAC’14). 81--93.Google Scholar
- A. K. Elmagarmid, P. G. Ipeirotis, and V. S. Verykios. 2007. Duplicate record detection: A survey. IEEE TKDE 19, 1 (2007), 1--16. Google ScholarDigital Library
- R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. 2005. Data exchange: Semantics and query answering. Theor. Comput. Sci. (TCS) 336, 1 (2005), 89--124. Google ScholarDigital Library
- W. Fan. 2008. Dependencies revisited for improving data quality. In PODS. 159--170. Google ScholarDigital Library
- W. Fan and F. Geerts. 2012. Foundations of Data Quality Management. Morgan & Claypool Publishers. Google ScholarDigital Library
- I. P. Fellegi and A. B. Sunter. 1969. A theory for record linkage. J. Am. Statistical Assoc. 64, 328 (1969), 1183--1210.Google ScholarCross Ref
- H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. 2001. Declarative data cleaning: Language, model, and algorithms. In VLDB. 371--380. Google ScholarDigital Library
- V. Ganti and A. Das Sarma. 2013. Data Cleaning: A Practical Perspective. Morgan & Claypool Publishers. Google ScholarDigital Library
- L. Getoor and A. Machanavajjhala. 2012. Entity resolution: Theory, practice & open challenges. PVLDB 5, 12 (2012), 2018--2019. Google ScholarDigital Library
- O. Hassanzadeh, A. Kementsietsidis, L. Lim, R. J. Miller, and M. Wang. 2009. A framework for semantic link discovery over relational data. In CIKM. 1027--1036. Google ScholarDigital Library
- M. A. Hernández, G. Koutrika, R. Krishnamurthy, L. Popa, and R. Wisnesky. 2013. HIL: A high-level scripting language for entity integration. In EDBT. 549--560. Google ScholarDigital Library
- M. A. Hernández and S. J. Stolfo. 1995. The merge/purge problem for large databases. In SIGMOD. 127--138. Google ScholarDigital Library
- A. Itai, M. Rodeh, and S. L. Tanimoto. 1978. Some matching problems for bipartite graphs. J. ACM 25, 4 (1978), 517--525. Google ScholarDigital Library
- D. S. Johnson, C. H. Papadimitriou, and M. Yannakakis. 1988. On generating all maximal independent sets. Inf. Process. Lett. 27, 3 (1988), 119--123. Google ScholarDigital Library
- P. Jonsson and A. A. Krokhin. 2004. Recognizing frozen variables in constraint Satisfaction Problems. Theor. Comput. Sci. (TCS) 329, 1--3 (2004), 93--113. Google ScholarDigital Library
- N. Koudas, S. Sarawagi, and D. Srivastava. 2006. Record linkage: Similarity measures and algorithms. In SIGMOD. 802--803. Google ScholarDigital Library
- K. G. Murty. 1968. An algorithm for ranking all the assignments in order of increasing cost. Oper. Res. 16, 3 (1968), 682--687.Google ScholarDigital Library
- C. H. Papadimitriou. 1994. Computational Complexity. Addison-Wesley.Google Scholar
- M. Richardson and P. Domingos. 2006. Markov logic networks. Machine Learn. 62, 1--2 (2006), 107--136. Google ScholarDigital Library
- P. Singla and P. Domingos. 2006. Entity resolution with Markov logic. In ICDM. 572--582. Google ScholarDigital Library
Index Terms
- A Declarative Framework for Linking Entities
Recommendations
Joint linking of entity and relation for question answering over knowledge graph
AbstractEntity linking and relation linking are two crucial components in many question answering systems over knowledge graphs, which aim to identify the relevant entity or relation mentions in a question and link them to the target entity or relation in ...
Entity Difference Modeling Based Entity Linking for Question Answering over Knowledge Graphs
Natural Language Processing and Chinese ComputingAbstractEntity linking plays a vital role in Question Answering over Knowledge Graphs (KGQA), and the representation of entities is a fundamental component of entity linking for user questions. In order to alleviate the problem of entity descriptions that ...
Leveraging Context Information for Joint Entity and Relation Linking
Web and Big DataAbstractAs an important module in most knowledge base question answering (KBQA) systems, entity and relation linking maps proper nouns and relational phrases to corresponding semantic constructs (entities and relations, respectively) in a given KB. ...
Comments