ABSTRACT
Recent knowledge extraction methods are moving towards ternary and higher-arity relations to capture more information about binary facts, for example the time, the location, and the duration of a specific fact. These relations can be even more complex to extract in advanced domains such as news, where events typically come with different facets including reasons, consequences, purposes, involved parties, and related events. The main challenge is twofold: first finding the set of facets related to each fact, and then tagging each facet with the relevant category.
In this paper, we tackle these problems by proposing StuffIE, a facet-centric, fine-grained information extraction approach. We exploit Stanford dependency parsing, enhanced by lexical databases such as WordNet, to extract nested triple relations. We then use the syntactic dependencies to semantically tag facets through distant learning based on the Oxford dictionary. We evaluated the accuracy of the extracted facets and their semantic tags on the DUC'04 dataset. The results show the high accuracy and coverage of our approach compared with ClausIE, OLLIE, SEMAFOR SRL, and Illinois SRL.
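To make the idea of a facet-centric nested triple concrete, the following minimal Python sketch shows one way such an extraction could be represented. It is an illustration only, not the StuffIE implementation: the class names `Triple` and `Facet`, the helper `collect_tags`, the tag names `TIME` and `REASON`, and the example sentence are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Facet:
    connector: str                  # word linking the facet to the fact, e.g. "in"
    value: Union[str, "Triple"]     # plain text or a nested triple
    tag: str = "UNLABELED"          # hypothetical semantic tag, not from the paper


@dataclass
class Triple:
    subject: str
    predicate: str
    obj: Optional[str] = None
    facets: List[Facet] = field(default_factory=list)


def collect_tags(triple: Triple) -> List[str]:
    """Return facet tags in depth-first order, descending into nested triples."""
    tags: List[str] = []
    for facet in triple.facets:
        tags.append(facet.tag)
        if isinstance(facet.value, Triple):
            tags.extend(collect_tags(facet.value))
    return tags


# Invented sentence: "The company closed the plant in 2003 because demand fell."
fact = Triple(
    subject="the company",
    predicate="closed",
    obj="the plant",
    facets=[
        Facet(connector="in", value="2003", tag="TIME"),
        Facet(connector="because", value=Triple("demand", "fell"), tag="REASON"),
    ],
)

print(collect_tags(fact))
```

Here the binary fact (company, closed, plant) carries two facets, one of which is itself a triple, which is what makes the relation nested rather than flat.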
Index Terms
- StuffIE: Semantic Tagging of Unlabeled Facets Using Fine-Grained Information Extraction