ABSTRACT
Knowledge graphs have become a fundamental asset for search engines. A fair number of user queries seek information on problem-solving tasks such as building a fence or repairing a bicycle. However, knowledge graphs completely lack this kind of how-to knowledge. This paper presents a method for automatically constructing a formal knowledge base of tasks and task-solving steps by tapping the contents of online communities such as WikiHow. We employ Open-IE techniques to extract noisy candidates for tasks, steps, and the required tools and other items. To clean and properly organize this data, we devise embedding-based clustering techniques. The resulting knowledge base, HowToKB, includes a hierarchical taxonomy of disambiguated tasks, temporal orders of sub-tasks, and attributes for involved items. A comprehensive evaluation of HowToKB shows high accuracy. As an extrinsic use case, we evaluate the automatic retrieval of YouTube videos related to HowToKB tasks.
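To make the idea of embedding-based clustering concrete, the following is a minimal sketch and not the paper's actual algorithm. It groups noisy task phrases by single-link agglomerative clustering over cosine similarity. The phrase vectors in `TOY_EMBEDDINGS` are hand-made stand-ins for learned embeddings, and the function name `cluster_phrases` and the 0.8 threshold are assumptions chosen for illustration.

```python
import math
from itertools import combinations

# Hypothetical 3-dimensional phrase vectors; real embeddings would be learned.
TOY_EMBEDDINGS = {
    "paint a wall": [0.9, 0.1, 0.0],
    "paint the walls": [0.85, 0.15, 0.05],
    "fix a bike": [0.1, 0.9, 0.1],
    "repair a bicycle": [0.05, 0.95, 0.1],
    "bake bread": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_phrases(embeddings, threshold=0.8):
    """Single-link agglomerative clustering with a similarity cutoff.

    Repeatedly merges the two most similar clusters until no pair of
    clusters has a member-to-member similarity of at least `threshold`.
    """
    clusters = [[phrase] for phrase in embeddings]
    while True:
        best, pair = threshold, None
        for i, j in combinations(range(len(clusters)), 2):
            # Single link: similarity of the closest pair of members.
            sim = max(
                cosine(embeddings[a], embeddings[b])
                for a in clusters[i]
                for b in clusters[j]
            )
            if sim >= best:
                best, pair = sim, (i, j)
        if pair is None:
            return clusters
        i, j = pair  # i < j, so popping j leaves index i valid
        clusters[i].extend(clusters.pop(j))


if __name__ == "__main__":
    for group in cluster_phrases(TOY_EMBEDDINGS):
        print(group)
```

On the toy vectors, paraphrases such as "fix a bike" and "repair a bicycle" end up in one cluster while "bake bread" stays on its own. A real system would additionally need the disambiguation and taxonomy-building steps that the abstract mentions, which this sketch does not attempt.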