Abstract
We focus on the problem of query rewriting for sponsored search. We base rewrites on a historical click graph that records the ads that have been clicked on in response to past user queries. Given a query q, we first consider Simrank [7] as a way to identify queries similar to q, i.e., queries whose ads a user may be interested in. We argue that Simrank fails to properly identify query similarities in our application, and we present two enhanced versions of Simrank: one that exploits weights on click graph edges and another that exploits "evidence." We experimentally evaluate our new schemes against Simrank, using actual click graphs and queries from Yahoo!, and using a variety of metrics. Our results show that the enhanced methods can yield more and better query rewrites.
- Reid Andersen, Fan Chung, and Kevin Lang. Local graph partitioning using pagerank vectors. In FOCS '06. Google ScholarDigital Library
- I. Antonellis, H. Garcia-Molina, and C. Chang. Simrank++: Query rewriting through link analysis of the click graph. In Technical Report, url: http://dbpubs.stanford.edu/pub/2007--32, 2007.Google Scholar
- Doug Beeferman and Adam Berger. Agglomerative clustering of a search engine query log. In KDD '00. Google ScholarDigital Library
- Nick Craswell and Martin Szummer. Random walks on the click graph. In Proc. SIGIR '07. Google ScholarDigital Library
- Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI 2004. Google ScholarDigital Library
- S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. 1990.Google Scholar
- Glen Jeh and Jennifer Widom. Simrank: a measure of structural-context similarity. In KDD '02. Google ScholarDigital Library
- Rosie Jones and Daniel C. Fain. Query word deletion prediction. In SIGIR '03. Google ScholarDigital Library
- Rosie Jones, Benjamin Rey, Omid Madani, and Wiley Greiner. Generating query substitutions. In WWW '07. Google ScholarDigital Library
- Christos H. Papadimitriou, Hisao Tamaki, Prabhakar Raghavan, and Santosh Vempala. Latent semantic indexing: a probabilistic analysis. In PODS '98. Google ScholarDigital Library
- M. Regelson and D. Fain. Predicting click-through rate using keyword clusters. In Proc. 2nd Workshop on Sponsored Search Auctions.Google Scholar
- Matthew Richardson, Ewa Dominowska, and Robert Ragno. Predicting clicks: Estimating the click-through rate for new ads. In WWW '07. Google ScholarDigital Library
- Ian Ruthven. Re-examining the potential effectiveness of interactive query expansion. In SIGIR '03, pages 213--220. Google ScholarDigital Library
- A. Sinclair. Algorithms for random generation and counting: A markov chain approach. In Birkhauser, Boston-Basel-Berlin, 1993. Google ScholarDigital Library
- Egidio Terra and Charles L. A. Clarke. Scoring missing terms in information retrieval tasks. In CIKM '04. Google ScholarDigital Library
- Ji-Rong Wen, Jian-Yun Nie, and Hong-Jiang Zhang. Query clustering using user logs. ACM Trans. Inf. Syst., 2002. Google ScholarDigital Library
- Wei Vivian Zhang, Xiaofei He, Benjamin Rey, and Rosie Jones. Query rewriting using active learning for sponsored search. In SIGIR '07. Google ScholarDigital Library
- Wei Vivian Zhang and Rosie Jones. Comparing click logs and editorial labels for training query rewriting. In Query Log Analysis Workshop, WWW '07.Google Scholar
Index Terms
- Simrank++: query rewriting through link analysis of the click graph
Recommendations
Simrank++: query rewriting through link analysis of the clickgraph (poster)
WWW '08: Proceedings of the 17th international conference on World Wide WebWe focus on the problem of query rewriting for sponsored search. We base rewrites on a historical click graph that records the ads that have been clicked on in response to past user queries. Given a query q, we first consider Simrank [2] as a way to ...
The comparative effectiveness of sponsored and nonsponsored links for Web e-commerce queries
The predominant business model for Web search engines is sponsored search, which generates billions in yearly revenue. But are sponsored links providing online consumers with relevant choices for products and services? We address this and related issues ...
SimRank*: effective and scalable pairwise similarity search based on graph topology
Given a graph, how can we quantify similarity between two nodes in an effective and scalable wayý SimRank is an attractive measure of pairwise similarity based on graph topologies. Its underpinning philosophy that "two nodes are similar if they are ...
Comments