research-article

That's Not My Question: Learning to Weight Unmatched Terms in CQA Vertical Search

Authors:
Boaz Petersil, The Technion, Haifa, Israel
Avihai Mejer, Yahoo Research, Haifa, Israel
Idan Szpektor, Yahoo Research, Haifa, Israel
Koby Crammer, The Technion, Haifa, Israel

SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, July 2016, Pages 225–234. https://doi.org/10.1145/2911451.2911496

Published: 07 July 2016

ABSTRACT

A fundamental task in Information Retrieval (IR) is term weighting. Early IR theory considered both the presence and the absence of all terms in the lexicon for ranking, and needed to weight them all. Yet, as lexicons grew and models became too complex, common weighting models preferred to aggregate only the weights of the query terms that are matched in candidate documents. Thus, the contribution of unmatched terms in these models is considered only indirectly, such as in probability smoothing with the corpus distribution, or in weight normalization by document length. In this work we propose a novel term weighting model that directly assesses the weights of unmatched terms, and show its benefits. Specifically, we propose a Learning to Rank framework, in which features corresponding to matched terms are also "mirrored" in similar features that account only for unmatched terms. The relative importance of each feature is learned via a click-through query log. As a test case, we consider vertical search in Community-based Question Answering (CQA) sites from Web queries. Queries that result in viewing CQA content often contain fine-grained information needs and benefit more from unmatched term weighting. We assess our model both via manual evaluation and via automatic evaluation over a click-through log. Our results show consistent improvement in retrieval when unmatched information is taken into account. This holds both when only identical terms are considered matched, and when related terms are matched via distributional similarity.
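
The "mirroring" idea in the abstract can be illustrated with a small sketch. This is not the authors' feature set or learner: the IDF-sum features, the feature names, the toy corpus, and the hand-set weights below are all assumptions made for illustration. In the paper the feature weights are learned from a click-through log rather than fixed by hand.

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed inverse document frequency for each term in a tokenized corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def mirrored_features(query, doc, idf, default_idf=1.0):
    """Return one matched feature and its unmatched 'mirrors' (hypothetical names)."""
    q, d = set(query), set(doc)

    def total(terms):
        return sum(idf.get(t, default_idf) for t in terms)

    return {
        "matched_idf": total(q & d),          # query terms found in the document
        "unmatched_query_idf": total(q - d),  # query terms the document lacks
        "unmatched_doc_idf": total(d - q),    # document terms the query never asked for
    }


def score(features, weights):
    """Linear ranking score; a learning-to-rank model would learn these weights."""
    return sum(weights[name] * value for name, value in features.items())


# Toy corpus of CQA question titles (invented for this example).
corpus = [
    "how to train a puppy".split(),
    "how to train a puppy to stop biting".split(),
    "best food for a kitten".split(),
]
idf = idf_table(corpus)
query = "train puppy".split()

# Assumed weights: reward matches, penalize unmatched terms on either side.
weights = {"matched_idf": 1.0, "unmatched_query_idf": -1.0, "unmatched_doc_idf": -0.3}
matched_only = {"matched_idf": 1.0, "unmatched_query_idf": 0.0, "unmatched_doc_idf": 0.0}

scores = [score(mirrored_features(query, doc, idf), weights) for doc in corpus]
baseline = [score(mirrored_features(query, doc, idf), matched_only) for doc in corpus]
```

With matched terms alone, the first two titles tie, because both contain "train" and "puppy". Once the unmatched document terms carry weight, the second title is ranked lower: "stop biting" signals a more specific question than the one the query asks.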

Index Terms

That's Not My Question: Learning to Weight Unmatched Terms in CQA Vertical Search
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Learning to rank
      2. Novelty in information retrieval
    2. Retrieval tasks and goals
      1. Question answering

Published in
SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
July 2016
1296 pages
ISBN: 9781450340694
DOI: 10.1145/2911451
General Chairs:
Raffaele Perego, ISTI-CNR, Italy
Fabrizio Sebastiani, Qatar Computing Research Institute, HBKU, Qatar
Program Chairs:
Javed Aslam, Northeastern University, US
Ian Ruthven, University of Strathclyde, UK
Justin Zobel, University of Melbourne, Australia
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
community-based question answering
document ranking
unmatched terms
Acceptance Rates
SIGIR '16 Paper Acceptance Rate: 62 of 341 submissions, 18%. Overall Acceptance Rate: 792 of 3,983 submissions, 20%.