research-article

The role of documents vs. queries in extracting class attributes from text

Authors:
Marius Paşca (Google Inc., Mountain View, CA)
Benjamin Van Durme (University of Rochester, Rochester, NY)
Nikesh Garera (Johns Hopkins University, Baltimore, MD)
CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, November 2007, Pages 485–494, https://doi.org/10.1145/1321440.1321510

Published: 06 November 2007

ABSTRACT

Challenging the implicit reliance on document collections, this paper discusses the pros and cons of using query logs, rather than document collections, as self-contained sources of data in textual information extraction. The differences are quantified as part of a large-scale study on extracting prominent attributes or quantifiable properties of classes (e.g., top speed, price and fuel consumption for CarModel) from unstructured text. In a head-to-head qualitative comparison, a lightweight extraction method produces class attributes that are 45% more accurate on average when acquired from query logs rather than from Web documents.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
2. Information systems
  1. Information retrieval
    1. Document representation

Published in
CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
November 2007
1048 pages
ISBN: 9781595938039
DOI: 10.1145/1321440
Co-chair: Alberto H. F. Laender
Conference Chairs: André O. Falcão (Universidade de Lisboa, Portugal), Øystein Haug Olsen
General Chair: Mário J. Silva (Universidade de Lisboa, Portugal)
Program Chairs: Ricardo Baeza-Yates, Deborah L. McGuinness, Bjorn Olstad
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
class attribute extraction
knowledge acquisition
query logs
textual data sources
Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%