Article

A web-based kernel function for measuring the similarity of short text snippets

Authors:
Mehran Sahami (Google Inc, Mountain View, CA)
Timothy D. Heilman (Google Inc, Mountain View, CA)
WWW '06: Proceedings of the 15th international conference on World Wide Web, May 2006, Pages 377–386. https://doi.org/10.1145/1135777.1135834

Published: 23 May 2006


ABSTRACT

Determining the similarity of short text snippets, such as search queries, works poorly with traditional document similarity measures (e.g., cosine), since there are often few, if any, terms in common between two short text snippets. We address this problem by introducing a novel method for measuring the similarity between short text snippets (even those without any overlapping terms) by leveraging web search results to provide greater context for the short texts. In this paper, we define such a similarity kernel function, mathematically analyze some of its properties, and provide examples of its efficacy. We also show the use of this kernel function in a large-scale system for suggesting related queries to search engine users.
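The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify term weighting or the number of results used, so this example assumes plain term-frequency vectors, a centroid of unit-length document vectors, and a dot product of the normalized centroids. The names `toy_search`, `TOY_RESULTS`, `expand`, and `web_kernel` are hypothetical, and the in-memory dictionary stands in for a real web search engine.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def unit_vector(weights: Dict[str, float]) -> Vector:
    """Scale a sparse term vector to unit L2 length; empty input stays empty."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0.0:
        return {}
    return {term: w / norm for term, w in weights.items()}


def dot(u: Vector, v: Vector) -> float:
    """Sparse dot product, iterating over the smaller vector."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(term, 0.0) for term, w in u.items())


def cosine(x: str, y: str) -> float:
    """Traditional cosine similarity computed on the raw snippets."""
    return dot(unit_vector(Counter(tokenize(x))), unit_vector(Counter(tokenize(y))))


def expand(snippet: str, search: Callable[[str], List[str]]) -> Vector:
    """Represent a snippet by the normalized centroid of its search results."""
    centroid: Counter = Counter()
    for doc in search(snippet):
        for term, w in unit_vector(Counter(tokenize(doc))).items():
            centroid[term] += w
    return unit_vector(centroid)


def web_kernel(x: str, y: str, search: Callable[[str], List[str]]) -> float:
    """Similarity of two snippets measured on their expanded representations."""
    return dot(expand(x, search), expand(y, search))


# Hypothetical stand-in for a search engine: query -> retrieved document texts.
TOY_RESULTS = {
    "ai": [
        "artificial intelligence research and machine learning",
        "ai is short for artificial intelligence",
    ],
    "artificial intelligence": [
        "artificial intelligence and machine learning research",
        "intelligence shown by machines is called artificial intelligence",
    ],
    "pizza": ["pizza dough recipe with tomato sauce"],
}


def toy_search(query: str) -> List[str]:
    """Return the toy results for a query, or no results if it is unknown."""
    return TOY_RESULTS.get(query.lower(), [])


if __name__ == "__main__":
    print(cosine("AI", "artificial intelligence"))
    print(web_kernel("AI", "artificial intelligence", toy_search))
    print(web_kernel("AI", "pizza", toy_search))
```

On this toy data the raw cosine of "AI" and "artificial intelligence" is zero because the snippets share no terms, while the expanded representations overlap and yield a positive score. Because each expanded vector has unit length, the score of a snippet with itself is 1 whenever the search returns at least one document.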


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
  2. Machine learning
2. Information systems
  1. Information retrieval

Published in
WWW '06: Proceedings of the 15th international conference on World Wide Web
May 2006
1102 pages
ISBN: 1595933239
DOI: 10.1145/1135777
General Chairs: Leslie Carr (University of Southampton), David De Roure (University of Southampton), Arun Iyengar (IBM Research)
Program Chairs: Carole Goble (University of Manchester, UK), Mike Dahlin (University of Texas at Austin)
Copyright © 2006 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
information retrieval
kernel functions
query suggestion
text similarity measures
web search
Qualifiers: Article
Conference overall acceptance rate: 1,899 of 8,196 submissions, 23%