DOI: 10.1145/1321440.1321510
research-article

The role of documents vs. queries in extracting class attributes from text

Published: 06 November 2007

ABSTRACT

Challenging the implicit reliance on document collections, this paper discusses the pros and cons of using query logs rather than document collections, as self-contained sources of data in textual information extraction. The differences are quantified as part of a large-scale study on extracting prominent attributes or quantifiable properties of classes (e.g., top speed, price and fuel consumption for CarModel) from unstructured text. In a head-to-head qualitative comparison, a lightweight extraction method produces class attributes that are 45% more accurate on average, when acquired from query logs rather than Web documents.
