DOI: 10.1145/2911451.2911496
research-article

That's Not My Question: Learning to Weight Unmatched Terms in CQA Vertical Search

Published: 07 July 2016

ABSTRACT

A fundamental task in Information Retrieval (IR) is term weighting. Early IR theory considered the presence or absence of every term in the lexicon when ranking, and therefore needed to weight all of them. Yet, as lexicons grew and models became too complex, common weighting models came to aggregate only the weights of the query terms that are matched in candidate documents. In these models, the contribution of unmatched terms is considered only indirectly, for example through probability smoothing with the corpus distribution or through weight normalization by document length. In this work we propose a novel term weighting model that directly assesses the weights of unmatched terms, and we show its benefits. Specifically, we propose a Learning to Rank framework in which features corresponding to matched terms are "mirrored" by similar features that account only for unmatched terms. The relative importance of each feature is learned from a click-through query log. As a test case, we consider vertical search over Community-based Question Answering (CQA) sites for Web queries. Queries that lead to viewing CQA content often express fine-grained information needs and therefore benefit more from unmatched term weighting. We assess our model both by manual evaluation and by automatic evaluation over a click-through log. Our results show consistent improvement in retrieval when unmatched information is taken into account. This holds both when only identical terms are considered matched and when related terms are matched via distributional similarity.
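The abstract describes the "mirroring" idea only at a high level. What follows is a minimal sketch, under stated assumptions, of how a matched-term feature might be paired with an unmatched-term twin, and how the relative weights could be learned from click pairs. None of the following comes from the source; each is an illustrative assumption: the features are sums of IDF weights; "unmatched" is computed in both directions (query terms absent from the CQA question, and question terms absent from the query); distributional matching uses cosine similarity of word vectors with a hypothetical threshold of 0.7; and the learner is a simple pairwise perceptron standing in for whatever learner the paper actually uses. All function and feature names are hypothetical.

```python
import math
from typing import Dict, Iterable, Optional, Sequence, Set

Vector = Sequence[float]
Features = Dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def is_matched(term: str, other: Set[str],
               embeddings: Optional[Dict[str, Vector]] = None,
               threshold: float = 0.7) -> bool:
    """Exact match, or (if embeddings are given) a distributionally similar term.

    The 0.7 threshold is an illustrative assumption, not a value from the paper.
    """
    if term in other:
        return True
    if embeddings is None or term not in embeddings:
        return False
    return any(t in embeddings and cosine(embeddings[term], embeddings[t]) >= threshold
               for t in other)


def mirrored_features(query: Iterable[str], question: Iterable[str],
                      idf: Dict[str, float],
                      embeddings: Optional[Dict[str, Vector]] = None,
                      threshold: float = 0.7) -> Features:
    """Hypothetical IDF-sum features: each matched feature has an unmatched twin."""
    q_terms, d_terms = set(query), set(question)
    feats: Features = {
        "query_matched_idf": 0.0, "query_unmatched_idf": 0.0,
        "question_matched_idf": 0.0, "question_unmatched_idf": 0.0,
    }
    # Query side: query terms found (or not) in the question; then the reverse.
    for side, terms, other in (("query", q_terms, d_terms), ("question", d_terms, q_terms)):
        for t in terms:
            status = "matched" if is_matched(t, other, embeddings, threshold) else "unmatched"
            feats[f"{side}_{status}_idf"] += idf.get(t, 0.0)
    return feats


def score(feats: Features, weights: Features) -> float:
    """Linear ranking score: weighted sum of the features."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def pairwise_update(weights: Features, clicked: Features, skipped: Features,
                    lr: float = 0.1) -> Features:
    """Perceptron-style step on one (clicked, skipped) pair from a click log.

    If the skipped result does not score strictly lower than the clicked one,
    move the weights toward the feature difference (clicked - skipped).
    This is a stand-in learner, not the paper's method.
    """
    if score(clicked, weights) <= score(skipped, weights):
        for name in set(clicked) | set(skipped):
            delta = clicked.get(name, 0.0) - skipped.get(name, 0.0)
            weights[name] = weights.get(name, 0.0) + lr * delta
    return weights


if __name__ == "__main__":
    # Toy data with hypothetical IDF values for the query "cracked iphone screen".
    idf = {"how": 0.25, "fix": 1.0, "cracked": 2.0, "iphone": 1.5, "screen": 1.0,
           "warranty": 2.0, "claim": 1.5, "denied": 2.0}
    query = ["cracked", "iphone", "screen"]
    clicked = mirrored_features(query, ["how", "fix", "cracked", "iphone", "screen"], idf)
    skipped = mirrored_features(
        query, ["cracked", "iphone", "screen", "warranty", "claim", "denied"], idf)
    weights = pairwise_update({}, clicked, skipped)
    print(clicked, skipped, weights, sep="\n")
```

In this toy example both questions match every query term, so the matched-term features alone cannot separate them; only the mirrored `question_unmatched_idf` feature differs. After one update its weight becomes negative, so the question carrying more unrequested content ranks lower. This loosely mirrors the intuition in the paper's title, though the real feature set and learner are not given in the abstract.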

Published in: SIGIR '16: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 2016 (1296 pages). ISBN: 9781450340694. DOI: 10.1145/2911451

Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance rates: SIGIR '16 paper acceptance rate 62 of 341 submissions (18%); overall acceptance rate 792 of 3,983 submissions (20%).
