research-article

StuffIE: Semantic Tagging of Unlabeled Facets Using Fine-Grained Information Extraction

Authors:
Radityo Eko Prasojo, Free University of Bozen-Bolzano, Bozen-Bolzano, Italy
Mouna Kacimi, Free University of Bozen-Bolzano, Bozen-Bolzano, Italy
Werner Nutt, Free University of Bozen-Bolzano, Bozen-Bolzano, Italy

CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, October 2018, Pages 467–476. https://doi.org/10.1145/3269206.3271812

Published: 17 October 2018

ABSTRACT

Recent knowledge extraction methods are moving towards ternary and higher-arity relations to capture more information about binary facts, for example the time, the location, and the duration of a specific fact. These relations can be even more complex to extract in advanced domains such as news, where events typically come with different facets including reasons, consequences, purposes, involved parties, and related events. The main challenge is twofold: first, finding the set of facets related to each fact, and second, tagging those facets with the relevant category.
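To make the notion of a higher-arity relation concrete, the following is a minimal sketch of how a binary fact with attached facets could be represented. The class names `Fact` and `Facet`, the field names, the tag labels, and the example sentence content are all assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Facet:
    """Extra information attached to a fact (hypothetical structure)."""
    connector: str           # word linking the facet to the fact, e.g. "in"
    content: str             # the facet text itself
    tag: str = "UNLABELED"   # semantic category, filled in by a tagging step


@dataclass
class Fact:
    """A binary fact (subject, predicate, object) plus any number of facets."""
    subject: str
    predicate: str
    obj: str
    facets: List[Facet] = field(default_factory=list)

    def arity(self) -> int:
        # Subject and object count as two arguments; each facet adds one more.
        return 2 + len(self.facets)


def build_example() -> Fact:
    # Invented example: one binary fact enriched with three facets.
    fact = Fact("the government", "closed", "the border")
    fact.facets.append(Facet("in", "March", tag="TIME"))
    fact.facets.append(Facet("because of", "the outbreak", tag="REASON"))
    fact.facets.append(Facet("for", "two weeks", tag="DURATION"))
    return fact
```

In this representation, finding the facets corresponds to populating `facets`, and tagging corresponds to replacing the default `UNLABELED` value with a category.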

In this paper, we tackle the above problems by proposing StuffIE, a facet-centric approach to fine-grained information extraction. We exploit Stanford dependency parsing, enhanced by lexical databases such as WordNet, to extract nested triple relations. We then exploit the syntactic dependencies to semantically tag facets using distant learning based on the Oxford dictionary. We have tested the accuracy of the extracted facets and their semantic tags on the DUC'04 dataset. The results show that our approach achieves high accuracy and coverage compared with ClausIE, OLLIE, SEMAFOR SRL, and Illinois SRL.
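The general idea of reading a fact and its facets off dependency edges can be illustrated with a toy sketch. This is not the StuffIE algorithm: the edges are written by hand rather than produced by a parser, nesting of triples is not handled, and the `CONNECTOR_TAGS` lookup is a hypothetical stand-in for the dictionary-based distant learning step. The function name `extract` and the tag labels are likewise assumptions.

```python
from typing import Dict, List, Tuple

# (head, relation, dependent) edges, written by hand for the sentence
# "Officials closed the border in March because of the outbreak."
# A real system would obtain these from a dependency parser.
Edge = Tuple[str, str, str]
EDGES: List[Edge] = [
    ("closed", "nsubj", "Officials"),
    ("closed", "dobj", "the border"),
    ("closed", "nmod:in", "March"),
    ("closed", "nmod:because_of", "the outbreak"),
]

# Hypothetical connector-to-tag lookup; stands in for a learned tagger.
CONNECTOR_TAGS: Dict[str, str] = {
    "in": "TIME_OR_LOCATION",
    "because of": "REASON",
    "for": "PURPOSE_OR_DURATION",
}


def extract(edges: List[Edge], verb: str) -> dict:
    """Collect the core triple and the tagged facets of one verb."""
    result = {"subject": None, "predicate": verb, "object": None, "facets": []}
    for head, rel, dep in edges:
        if head != verb:
            continue  # only edges governed by the chosen verb are considered
        if rel == "nsubj":
            result["subject"] = dep
        elif rel == "dobj":
            result["object"] = dep
        elif rel.startswith("nmod:"):
            # The connector is encoded in the relation label after the colon.
            connector = rel.split(":", 1)[1].replace("_", " ")
            tag = CONNECTOR_TAGS.get(connector, "UNLABELED")
            result["facets"].append((connector, dep, tag))
    return result
```

Calling `extract(EDGES, "closed")` yields the triple (Officials, closed, the border) with two facets. The coarse tag for "in" shows why a plain connector lookup is insufficient: the same connector can introduce a time or a location, so a tagger needs more evidence than the connector alone.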

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Information extraction
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning

Published in
CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management
October 2018
2362 pages
ISBN: 9781450360142
DOI: 10.1145/3269206
General Chair: Alfredo Cuzzocrea (University of Trieste, Italy)
Program Chairs: James Allan (University of Massachusetts, USA), Norman Paton (University of Manchester, United Kingdom), Divesh Srivastava (AT&T Labs Research, USA), Rakesh Agrawal (Data Insights Lab, USA), Andrei Broder (Google Research, USA), Mohammed Zaki (Rensselaer Polytechnic Institute, USA), Selcuk Candan (Arizona State University, USA), Alexandros Labrinidis (University of Pittsburgh, USA), Assaf Schuster (Technion, Israel), Haixun Wang (Google Research, USA)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
distant learning
facet extraction
semantic labeling
Acceptance Rates
CIKM '18 paper acceptance rate: 147 of 826 submissions (18%). Overall acceptance rate: 1,861 of 8,427 submissions (22%).