Relation Extraction Using Distant Supervision: A Survey

Authors:
Alisa Smirnova, eXascale Infolab, University of Fribourg, Fribourg, Switzerland (ORCID: 0000-0002-7108-9917)
Philippe Cudré-Mauroux, eXascale Infolab, University of Fribourg, Fribourg, Switzerland (ORCID: 0000-0003-2588-4212)

ACM Computing Surveys, Volume 51, Issue 5, Article No. 106, pp. 1–35. https://doi.org/10.1145/3241741

Published: 19 November 2018

Abstract

Relation extraction is a subtask of information extraction where semantic relationships are extracted from natural language text and then classified. In essence, it allows us to acquire structured knowledge from unstructured text. In this article, we present a survey of relation extraction methods that leverage pre-existing structured or semi-structured data to guide the extraction process. We introduce a taxonomy of existing methods and describe distant supervision approaches in detail. We describe, in addition, the evaluation methodologies and the datasets commonly used for quality assessment. Finally, we give a high-level outlook on the field, highlighting open problems as well as the most promising research directions.
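
The idea of letting pre-existing structured data guide extraction can be illustrated with a minimal sketch of the common distant supervision labeling heuristic: any sentence that mentions both entities of a known fact is treated as a training example for that fact's relation. The toy knowledge base, the relation names, and the function `label_sentences` are hypothetical and chosen only for illustration; entity matching is reduced to plain substring search, which a real system would replace with named entity recognition and entity linking.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (subject, object) -> relation name.
KB: Dict[Tuple[str, str], str] = {
    ("Barack Obama", "Honolulu"): "born_in",
    ("Apple", "Steve Jobs"): "founded_by",
}


def label_sentences(
    sentences: List[str], kb: Dict[Tuple[str, str], str]
) -> List[Tuple[str, str, str, str]]:
    """Return (sentence, subject, object, relation) for every sentence
    that mentions both entities of a fact in the knowledge base."""
    labeled = []
    for sentence in sentences:
        for (subj, obj), relation in kb.items():
            # Assumption: naive substring matching stands in for entity linking.
            if subj in sentence and obj in sentence:
                labeled.append((sentence, subj, obj, relation))
    return labeled


sentences = [
    "Barack Obama was born in Honolulu.",
    "Barack Obama visited Honolulu last week.",  # mentions both entities, wrong relation
    "Steve Jobs introduced the iPhone.",         # only one entity of a fact, so skipped
]

training_examples = label_sentences(sentences, KB)
for example in training_examples:
    print(example)
```

The second sentence receives the label `born_in` although it does not express that relation. Automatically generated labels of this kind are noisy, which is why much of the work on distant supervision concentrates on reducing or modeling wrong labels.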

Index Terms

Relation Extraction Using Distant Supervision: A Survey
1. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection
    2. Retrieval tasks and goals
      1. Information extraction

Published in
ACM Computing Surveys Volume 51, Issue 5
September 2019
791 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/3271482
Editor:
Sartaj Sahni
Department of Computer and Information Science and Engineering
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 19 November 2018
- Revised: 1 July 2018
- Accepted: 1 July 2018
- Received: 1 March 2018

Author Tags
Relation extraction
distant supervision
knowledge graph
Qualifiers
- survey
- Research
- Refereed