Relation Extraction Using Distant Supervision: A Survey

Published: 19 November 2018

Abstract

Relation extraction is a subtask of information extraction where semantic relationships are extracted from natural language text and then classified. In essence, it allows us to acquire structured knowledge from unstructured text. In this article, we present a survey of relation extraction methods that leverage pre-existing structured or semi-structured data to guide the extraction process. We introduce a taxonomy of existing methods and describe distant supervision approaches in detail. We describe, in addition, the evaluation methodologies and the datasets commonly used for quality assessment. Finally, we give a high-level outlook on the field, highlighting open problems as well as the most promising research directions.

      • Published in

        ACM Computing Surveys, Volume 51, Issue 5
        September 2019
        791 pages
        ISSN: 0360-0300
        EISSN: 1557-7341
        DOI: 10.1145/3271482
        Editor: Sartaj Sahni

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 19 November 2018
        • Revised: 1 July 2018
        • Accepted: 1 July 2018
        • Received: 1 March 2018

        Qualifiers

        • survey
        • Research
        • Refereed
