Abstract
Relation extraction is a subtask of information extraction in which semantic relationships are extracted from natural language text and then classified. In essence, it allows us to acquire structured knowledge from unstructured text. In this article, we present a survey of relation extraction methods that leverage pre-existing structured or semi-structured data to guide the extraction process. We introduce a taxonomy of existing methods and describe distant supervision approaches in detail. We also describe the evaluation methodologies and datasets commonly used for quality assessment. Finally, we give a high-level outlook on the field, highlighting open problems as well as the most promising research directions.
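To make the idea of using pre-existing structured data to guide extraction more concrete, the following is a minimal sketch of the labeling heuristic commonly associated with distant supervision: any sentence that mentions both arguments of a knowledge base triple is labeled with that triple's relation. The knowledge base, the sentences, the relation name `born_in`, and the function `distant_labels` are all hypothetical and chosen for illustration only. Plain substring matching stands in for the entity recognition and linking that a real system would need.

```python
from typing import List, NamedTuple, Tuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


# Hypothetical toy knowledge base and corpus, for illustration only.
KB = [
    Triple("Barack Obama", "born_in", "Honolulu"),
    Triple("Marie Curie", "born_in", "Warsaw"),
]

SENTENCES = [
    "Barack Obama was born in Honolulu, Hawaii.",
    "Barack Obama visited Honolulu last week.",
    "Marie Curie moved to Paris in 1891.",
]


def distant_labels(
    kb: List[Triple], sentences: List[str]
) -> List[Tuple[str, str, str, str]]:
    """Label every sentence that mentions both arguments of a KB triple.

    Assumption: substring matching is used as a stand-in for proper
    entity recognition and linking.
    """
    labeled = []
    for sentence in sentences:
        for triple in kb:
            if triple.subject in sentence and triple.obj in sentence:
                labeled.append(
                    (sentence, triple.subject, triple.obj, triple.relation)
                )
    return labeled


labeled = distant_labels(KB, SENTENCES)
for row in labeled:
    print(row)
```

In this toy run, the second sentence also receives the `born_in` label although it does not express that relation. Automatically generated labels of this kind can be noisy, which is one reason the quality of distantly supervised training data needs careful assessment.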
Index Terms
- Relation Extraction Using Distant Supervision: A Survey