DOI: 10.1145/3269206.3271812
research-article

StuffIE: Semantic Tagging of Unlabeled Facets Using Fine-Grained Information Extraction

Published: 17 October 2018

ABSTRACT

Recent knowledge extraction methods are moving towards ternary and higher-arity relations to capture more information about binary facts, for example by including the time, the location, and the duration of a specific fact. These relations can be even more complex to extract in advanced domains such as news, where events typically come with different facets including reasons, consequences, purposes, involved parties, and related events. The main challenge consists in first finding the set of facets related to each fact, and second tagging those facets with the relevant category.

In this paper, we tackle the above problems by proposing StuffIE, a facet-centric, fine-grained information extraction approach. We exploit Stanford dependency parsing, enhanced by lexical databases such as WordNet, to extract nested triple relations. Then, we exploit the syntactic dependencies to semantically tag facets using distant learning based on the Oxford dictionary. We have tested the accuracy of the extracted facets and their semantic tags using the DUC'04 dataset. The results show the high accuracy and coverage of our approach with respect to ClausIE, OLLIE, SEMAFOR SRL and Illinois SRL.
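To make the idea of a facet-centric, nested triple concrete, the following is a minimal illustrative sketch and not StuffIE's implementation. The class names `Triple` and `Facet`, the `CONNECTOR_TAGS` table, and the tag labels are all assumptions introduced here for illustration. In particular, the hard-coded connector lookup is only a stand-in for the distant-learning tagger described in the abstract, and no dependency parsing is performed.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Facet:
    """A piece of extra information attached to a fact (hypothetical structure)."""
    connector: str                  # word linking the facet to the fact, e.g. a preposition
    target: Union[str, "Triple"]    # a plain phrase or a nested triple
    tag: str = "UNLABELED"          # semantic category, filled in by tag_facets


@dataclass
class Triple:
    """A binary fact (subject, predicate, object) plus any number of facets."""
    subject: str
    predicate: str
    obj: str
    facets: List[Facet] = field(default_factory=list)


# Toy lookup table (an assumption): a real tagger would have to disambiguate,
# since a connector such as "in" can introduce a time as well as a location.
CONNECTOR_TAGS = {
    "in": "LOCATION",
    "on": "TIME",
    "because": "REASON",
    "for": "PURPOSE",
}


def tag_facets(triple: Triple) -> Triple:
    """Assign a tag to every facet, recursing into nested triples."""
    for facet in triple.facets:
        facet.tag = CONNECTOR_TAGS.get(facet.connector.lower(), "UNLABELED")
        if isinstance(facet.target, Triple):
            tag_facets(facet.target)
    return triple


# Usage: "The company closed its plant in Ohio because demand fell on Monday."
fact = Triple(
    "the company", "closed", "its plant",
    facets=[
        Facet("in", "Ohio"),
        Facet("because", Triple("demand", "fell", "", facets=[Facet("on", "Monday")])),
    ],
)
tag_facets(fact)
```

The nested `Triple` inside the second facet shows how a reason can itself be a fact carrying its own facets, which is the kind of higher-arity structure the abstract refers to.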

Published in

CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management
October 2018
2362 pages
ISBN: 9781450360142
DOI: 10.1145/3269206

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

CIKM '18 paper acceptance rate: 147 of 826 submissions (18%). Overall acceptance rate: 1,861 of 8,427 submissions (22%).
