The linguistic structure of English web-search queries

Authors:
Cory Barr, Yahoo! Inc., Sunnyvale, CA
Rosie Jones, Yahoo! Inc., Sunnyvale, CA
Moira Regelson, Perfect Market, Inc., Pasadena, CA

EMNLP '08: Proceedings of the Conference on Empirical Methods in Natural Language Processing, October 2008, Pages 1021–1030

Published: 25 October 2008

ABSTRACT

Web-search queries are known to be short, but little else is known about their structure. In this paper, we investigate the applicability of part-of-speech tagging to typical English-language web search-engine queries and the potential value of these tags for improving search results. We begin by identifying a set of part-of-speech tags suitable for search queries and quantifying their occurrence. We find that proper nouns constitute 40% of query terms, and that proper nouns and nouns together constitute over 70% of query terms. We also show that the majority of queries are noun phrases, not unstructured collections of terms. We then use a set of queries manually labeled with these tags to train a Brill tagger and evaluate its performance. In addition, we investigate classification of search queries into grammatical classes based on the syntax of part-of-speech tag sequences. We also conduct preliminary experiments on the practical value of query-trained part-of-speech taggers for information-retrieval tasks. In particular, we show that part-of-speech information can be a significant feature in machine-learned search-result relevance. These experiments also cover the potential use of the tagger in selecting words for omission or substitution in query reformulation, actions which can improve recall. We conclude that a part-of-speech tagger trained on labeled corpora of queries significantly outperforms taggers trained on traditional corpora, and that leveraging the unique linguistic structure of web-search queries can improve the search experience.
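
The two measurements described in the abstract, the share of query terms carrying each tag and whether a query's tag sequence forms a noun phrase, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tag names (`PROPN`, `NOUN`, `ADJ`, `PREP`, `VERB`), the four toy queries, and the simplified noun-phrase pattern are hypothetical and are not the paper's tag set, data, or grammar.

```python
import re
from collections import Counter

# Hypothetical hand-labeled queries as (token, tag) pairs.
# The tag names are illustrative, not the paper's tag set.
LABELED_QUERIES = [
    [("yahoo", "PROPN"), ("mail", "NOUN")],
    [("cheap", "ADJ"), ("flights", "NOUN"), ("to", "PREP"), ("boston", "PROPN")],
    [("download", "VERB"), ("firefox", "PROPN")],
    [("hotels", "NOUN"), ("in", "PREP"), ("paris", "PROPN")],
]


def tag_shares(queries):
    """Return each tag's share of all query terms."""
    counts = Counter(tag for query in queries for _, tag in query)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


# Simplified (assumed) noun-phrase pattern over space-joined tags:
# optional adjectives, then one or more nouns or proper nouns,
# optionally followed by prepositional phrases of the same shape.
_NP = r"(?:ADJ )*(?:NOUN|PROPN)(?: (?:NOUN|PROPN))*"
NOUN_PHRASE = re.compile(rf"{_NP}(?: PREP {_NP})*")


def is_noun_phrase(query):
    """Check whether the query's whole tag sequence matches the pattern."""
    tags = " ".join(tag for _, tag in query)
    return NOUN_PHRASE.fullmatch(tags) is not None


if __name__ == "__main__":
    for tag, share in sorted(tag_shares(LABELED_QUERIES).items()):
        print(f"{tag}: {share:.2f}")
    for query in LABELED_QUERIES:
        text = " ".join(token for token, _ in query)
        print(text, "->", is_noun_phrase(query))
```

Matching a regular expression over the tag sequence is only one simple way to assign a query to a grammatical class. The proportions printed for the toy data say nothing about the figures reported in the abstract.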


Published in
EMNLP '08: Proceedings of the Conference on Empirical Methods in Natural Language Processing
October 2008
1129 pages
Program Chairs: Mirella Lapata (University of Edinburgh), Hwee Tou Ng (National University of Singapore)
Publisher: Association for Computational Linguistics, United States
Published: 25 October 2008
Qualifiers: research-article
Overall Acceptance Rate: 73 of 234 submissions, 31%