research-article

Distilling Task Knowledge from How-To Communities

Authors:
Cuong Xuan Chu, Max Planck Institute for Informatics, Saarbrücken, Germany
Niket Tandon, Allen Institute for Artificial Intelligence, Seattle, USA
Gerhard Weikum, Max Planck Institute for Informatics, Saarbrücken, Germany

WWW '17: Proceedings of the 26th International Conference on World Wide Web, April 2017, Pages 805–814
https://doi.org/10.1145/3038912.3052715

Published: 3 April 2017

ABSTRACT

Knowledge graphs have become a fundamental asset for search engines. A fair number of user queries seek information on problem-solving tasks such as building a fence or repairing a bicycle. However, knowledge graphs completely lack this kind of how-to knowledge. This paper presents a method for automatically constructing a formal knowledge base on tasks and task-solving steps, by tapping the contents of online communities such as WikiHow. We employ Open-IE techniques to extract noisy candidates for tasks, steps, and the required tools and other items. For cleaning and properly organizing this data, we devise embedding-based clustering techniques. The resulting knowledge base, HowToKB, includes a hierarchical taxonomy of disambiguated tasks, temporal orders of sub-tasks, and attributes for involved items. A comprehensive evaluation of HowToKB shows high accuracy. As an extrinsic use case, we evaluate automatically searching related YouTube videos for HowToKB tasks.
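
The embedding-based clustering idea mentioned in the abstract can be pictured with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors, the `cluster_phrases` helper, the average-link merge rule, and the 0.8 similarity threshold are all assumptions chosen for illustration, since the abstract does not specify the embeddings or the clustering procedure.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_phrases(vectors, threshold=0.8):
    """Agglomerative clustering with average linkage (an assumed merge rule).

    Repeatedly merges the two clusters with the highest average pairwise
    cosine similarity until that similarity drops below `threshold`.
    """
    clusters = [[name] for name in vectors]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            sims = [cosine(vectors[a], vectors[b])
                    for a in clusters[i] for b in clusters[j]]
            avg = sum(sims) / len(sims)
            if best is None or avg > best[0]:
                best = (avg, i, j)
        if best[0] < threshold:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]                          # j > i, so index i stays valid
    return clusters


# Hypothetical hand-made vectors standing in for learned phrase embeddings.
toy_vectors = {
    "build a fence": [0.90, 0.10, 0.00],
    "construct a fence": [0.85, 0.15, 0.05],
    "repair a bicycle": [0.10, 0.90, 0.10],
    "fix a bike": [0.05, 0.95, 0.10],
}

clusters = cluster_phrases(toy_vectors)
for group in clusters:
    print(group)
```

With these toy vectors, the two fence phrasings end up in one group and the two bicycle phrasings in another, which mirrors how noisy extracted task phrases might be consolidated into canonical tasks.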


Index Terms

1. Information systems
  1. Data management systems
    1. Information integration
      1. Entity resolution


Published in
WWW '17: Proceedings of the 26th International Conference on World Wide Web
April 2017
1678 pages
ISBN: 9781450349130
General Chairs: Rick Barrett (W3Events), Rick Cummings (Murdoch University)
Program Chairs: Eugene Agichtein (Emory University), Evgeniy Gabrilovich (Google Research)
Copyright © 2017 Copyright is held by the International World Wide Web Conference Committee (IW3C2).
Publisher: International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland
Author Tags
howto commonsense
knowledge base construction
Acceptance Rates
WWW '17 Paper Acceptance Rate: 164 of 966 submissions, 17%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%