ABSTRACT
Recent work on language models for information retrieval has shown that smoothing language models is crucial for achieving good retrieval performance. Many effective smoothing methods have been proposed, most of which implement heuristics that exploit corpus structures. In this paper, we propose a general and unified optimization framework for smoothing language models on graph structures. This framework not only provides a unified formulation of existing smoothing heuristics, but also serves as a road map for systematically exploring smoothing methods for language models. We follow this road map and derive several instantiations of the framework, some of which lead to novel smoothing methods. Empirical results show that all of these instantiations are effective, with some outperforming state-of-the-art smoothing methods.
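To make the idea of smoothing on a graph concrete, the following is a minimal sketch of a generic graph-regularization scheme. It is not necessarily the formulation used in the paper. It assumes an undirected document graph with symmetric edge weights, and it balances fidelity to each document's unsmoothed estimate against agreement with its neighbors. The function name `smooth_on_graph`, the trade-off parameter `lam`, and the toy values are all hypothetical.

```python
def smooth_on_graph(f0, weights, lam=0.5, iters=200):
    """Smooth per-node values over an undirected weighted graph.

    f0      : dict node -> unsmoothed value, e.g. p(term | document)
    weights : dict (u, v) -> positive weight, one entry per undirected edge
    lam     : trade-off in [0, 1); larger values trust the neighbors more

    Assumed objective (illustrative only):
        (1 - lam) * sum_u deg(u) * (f[u] - f0[u])**2
          + lam * sum_{(u, v)} w(u, v) * (f[u] - f[v])**2
    Setting its gradient to zero gives the fixed-point update used below.
    """
    # Build a symmetric adjacency map from the edge list.
    neighbors = {u: {} for u in f0}
    for (u, v), w in weights.items():
        neighbors[u][v] = w
        neighbors[v][u] = w

    f = dict(f0)
    for _ in range(iters):
        new_f = {}
        for u in f0:
            deg = sum(neighbors[u].values())
            if deg == 0:
                # An isolated node has nothing to borrow from.
                new_f[u] = f0[u]
                continue
            avg = sum(w * f[v] for v, w in neighbors[u].items()) / deg
            new_f[u] = (1 - lam) * f0[u] + lam * avg
        f = new_f
    return f


# Toy example: p(term | doc) for one term in a chain of three documents.
f0 = {"d1": 0.0, "d2": 0.2, "d3": 0.1}
edges = {("d1", "d2"): 1.0, ("d2", "d3"): 1.0}
smoothed = smooth_on_graph(f0, edges, lam=0.5)
```

In this toy graph, `d1` never contains the term, yet it receives nonzero probability from its neighbor `d2`, which is the basic effect smoothing is meant to achieve. Because each update is a convex combination of the node's own estimate and its neighbors' estimates, applying it to every term of normalized document models keeps them normalized.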