DOI: 10.1145/3038912.3052715
research-article

Distilling Task Knowledge from How-To Communities

Published: 3 April 2017

ABSTRACT

Knowledge graphs have become a fundamental asset for search engines. A fair number of user queries seek information on problem-solving tasks such as building a fence or repairing a bicycle. However, knowledge graphs completely lack this kind of how-to knowledge. This paper presents a method for automatically constructing a formal knowledge base on tasks and task-solving steps by tapping the contents of online communities such as WikiHow. We employ Open-IE techniques to extract noisy candidates for tasks, steps, and the required tools and other items. To clean and properly organize this data, we devise embedding-based clustering techniques. The resulting knowledge base, HowToKB, includes a hierarchical taxonomy of disambiguated tasks, temporal orders of sub-tasks, and attributes for involved items. A comprehensive evaluation of HowToKB shows high accuracy. As an extrinsic use case, we evaluate the automatic retrieval of related YouTube videos for HowToKB tasks.
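The abstract does not say which clustering algorithm HowToKB uses, so the following is only a minimal sketch of the general idea behind embedding-based clustering of noisy task candidates. Each candidate phrase is embedded as the average of its word vectors, and phrases are grouped by single-link agglomerative clustering cut at a cosine-similarity threshold; this technique is chosen here for illustration and is not confirmed as the paper's method. The word vectors in `WORD_VECS`, the threshold of 0.95, and the helper names `phrase_vector`, `cosine`, and `cluster_tasks` are all hypothetical; a real system would load pretrained embeddings (for example word2vec) instead of toy values.

```python
import math
from itertools import combinations

# Hypothetical toy word vectors, invented for illustration only.
# A real pipeline would use pretrained embeddings instead.
WORD_VECS = {
    "fix": [0.9, 0.1, 0.0],
    "repair": [0.85, 0.15, 0.05],
    "mend": [0.8, 0.2, 0.0],
    "bike": [0.1, 0.9, 0.1],
    "bicycle": [0.12, 0.88, 0.1],
    "build": [0.2, 0.1, 0.9],
    "fence": [0.1, 0.3, 0.8],
}


def phrase_vector(phrase):
    """Average the vectors of the known words in a task phrase (None if no word is known)."""
    vecs = [WORD_VECS[w] for w in phrase.lower().split() if w in WORD_VECS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cluster_tasks(phrases, threshold=0.95):
    """Single-link clustering of task phrases, cut at a cosine-similarity threshold."""
    vectors = {p: phrase_vector(p) for p in phrases}
    parent = {p: p for p in phrases}  # union-find forest, one node per phrase

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    # Join every pair of phrases whose embeddings are similar enough.
    for a, b in combinations(phrases, 2):
        va, vb = vectors[a], vectors[b]
        if va is not None and vb is not None and cosine(va, vb) >= threshold:
            parent[find(a)] = find(b)

    # Collect connected components as clusters, in a deterministic order.
    groups = {}
    for p in phrases:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    tasks = ["fix bike", "repair bicycle", "mend bike", "build fence"]
    print(cluster_tasks(tasks))
    # → [['build fence'], ['fix bike', 'mend bike', 'repair bicycle']]
```

Because "bike" and "bicycle" have nearby (toy) vectors, the surface-different candidates "fix bike" and "repair bicycle" land in one cluster, a merge that exact string matching would miss. Cutting a single-link hierarchy at a fixed threshold is equivalent to taking the connected components of the thresholded similarity graph, which is why a union-find over similar pairs suffices here.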

    • Published in

      ACM Other conferences
      WWW '17: Proceedings of the 26th International Conference on World Wide Web
      April 2017
      1678 pages
      ISBN:9781450349130

      Copyright © 2017 Copyright is held by the International World Wide Web Conference Committee (IW3C2).

      Publisher

      International World Wide Web Conferences Steering Committee

      Republic and Canton of Geneva, Switzerland

      Acceptance Rates

      WWW '17 Paper Acceptance Rate: 164 of 966 submissions, 17%. Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%.
