Published in: Computing 4/2017

7 April 2016

A systematic review and comparative analysis of cross-document coreference resolution methods and tools

Authors: Seyed-Mehdi-Reza Beheshti, Boualem Benatallah, Srikumar Venugopal, Seung Hwan Ryu, Hamid Reza Motahari-Nezhad, Wei Wang


Abstract

Information extraction (IE) is the task of automatically extracting structured information from unstructured or semi-structured machine-readable documents. Among the various IE tasks, extracting actionable intelligence from an ever-increasing amount of data depends critically on cross-document coreference resolution (CDCR): the task of identifying entity mentions across information sources that refer to the same underlying entity. CDCR is the basis of knowledge acquisition and is at the heart of Web search, recommendations, and analytics. Real-time CDCR processing is very important and has various applications in discovering must-know information for clients in finance, the public sector, news, and crisis management. Because CDCR is an emerging area of research and practice, the reported literature on its challenges and solutions is growing fast but is scattered, owing to the large space, the variety of applications, and datasets on the order of terabytes to petabytes. To fill this gap, we provide a systematic review of the state of the art of challenges and solutions for a CDCR process. We identify a set of quality attributes that have been frequently reported in the context of CDCR processes, to be used as a guide to identify important and outstanding issues for further investigation. Finally, we assess existing tools and techniques for CDCR subtasks and provide guidance on the selection of tools and algorithms.


Footnotes
4
In this example, the annotations have been done using so-called ENAMEX tags (a user-defined element in the XML schema), which were developed for the Message Understanding Conference in the 1990s.
 
5
Agglomerative algorithms begin with each element as its own cluster and merge them into successively larger clusters.
 
6
Divisive algorithms begin with the whole set and proceed to divide it into successively smaller clusters.
 
7
An outlier is an observation point that is distant from other observations.
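
The agglomerative strategy described in footnote 5 can be illustrated with a minimal sketch applied to entity mentions. Everything specific here is an assumption made for illustration and is not taken from the article: the token-level Jaccard similarity, the single-link merge criterion, the 0.5 threshold, and the function names `jaccard` and `agglomerative`.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two mention strings (illustrative choice)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def agglomerative(mentions, threshold=0.5):
    """Single-link agglomerative clustering of mention strings.

    Starts with one cluster per mention and repeatedly merges the two
    most similar clusters until no pair reaches `threshold`.
    """
    clusters = [[m] for m in mentions]
    while len(clusters) > 1:
        best_score, best_pair = 0.0, None
        for i, j in combinations(range(len(clusters)), 2):
            # Single link: similarity of the closest pair of members.
            score = max(jaccard(a, b) for a in clusters[i] for b in clusters[j])
            if score > best_score:
                best_score, best_pair = score, (i, j)
        if best_pair is None or best_score < threshold:
            break
        i, j = best_pair
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


mentions = [
    "Barack Obama",
    "President Barack Obama",
    "Obama",
    "Angela Merkel",
    "Merkel",
]
result = agglomerative(mentions)
for cluster in result:
    print(cluster)
```

With these toy mentions, the three Obama variants end up in one cluster and the two Merkel variants in another. A mention that shares no tokens with any other would remain a singleton cluster, which is one simple way an outlier in the sense of footnote 7 would show up. A divisive algorithm (footnote 6) would run in the opposite direction, starting from one cluster containing all mentions and splitting it.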
 
Metadata
Title
A systematic review and comparative analysis of cross-document coreference resolution methods and tools
Authors
Seyed-Mehdi-Reza Beheshti
Boualem Benatallah
Srikumar Venugopal
Seung Hwan Ryu
Hamid Reza Motahari-Nezhad
Wei Wang
Publication date
7 April 2016
Publisher
Springer Vienna
Published in
Computing, Issue 4/2017
Print ISSN: 0010-485X
Electronic ISSN: 1436-5057
DOI
https://doi.org/10.1007/s00607-016-0490-0
