research-article

A general optimization framework for smoothing language models on graph structures

Authors:
Qiaozhu Mei
University of Illinois at Urbana-Champaign, Urbana, IL, USA

Duo Zhang
University of Illinois at Urbana-Champaign, Urbana, IL, USA

ChengXiang Zhai
University of Illinois at Urbana-Champaign, Urbana, IL, USA

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, July 2008, pages 611–618, https://doi.org/10.1145/1390334.1390438

Published: 20 July 2008

ABSTRACT

Recent work on language models for information retrieval has shown that smoothing language models is crucial for achieving good retrieval performance. Many effective smoothing methods have been proposed, most of which implement various heuristics to exploit corpus structures. In this paper, we propose a general and unified optimization framework for smoothing language models on graph structures. This framework not only provides a unified formulation of the existing smoothing heuristics, but also serves as a road map for systematically exploring smoothing methods for language models. We follow this road map and derive several different instantiations of the framework, some of which lead to novel smoothing methods. Empirical results show that all such instantiations are effective, with some outperforming state-of-the-art smoothing methods.
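To make the idea of "smoothing as optimization on a graph" concrete, here is a minimal Python sketch. It assumes nodes are documents, edges carry symmetric non-negative similarity weights, and each document's maximum-likelihood word distribution is smoothed by minimizing a quadratic objective in the style of graph-based regularization: a degree-weighted fidelity term keeps each smoothed distribution close to its original estimate, and a penalty discourages differences between neighbors. The exact objective, the function name `smooth_on_graph`, and the parameter `lam` are illustrative assumptions, not the paper's formulation.

```python
import numpy as np


def smooth_on_graph(W, F_ml, lam=0.5, tol=1e-10, max_iter=1000):
    """Smooth per-node word distributions over a weighted graph (illustrative sketch).

    Assumed objective (an illustration, not necessarily the paper's exact form):
        (1 - lam) * sum_u Deg(u) * ||f_u - f_ml_u||^2
          + (lam / 2) * sum_{u, v} W[u, v] * ||f_u - f_v||^2
    For nodes with Deg(u) > 0, setting the gradient to zero gives the fixed point
        f_u = (1 - lam) * f_ml_u + lam * sum_v (W[u, v] / Deg(u)) * f_v

    W    : (n, n) symmetric, non-negative edge weights with a zero diagonal.
    F_ml : (n, V) matrix whose rows are maximum-likelihood word distributions.
    lam  : trade-off in [0, 1); larger values borrow more from neighbors.
    """
    W = np.asarray(W, dtype=float)
    F_ml = np.asarray(F_ml, dtype=float)
    deg = W.sum(axis=1)
    isolated = deg == 0
    # Row-normalize the weights (P = D^-1 W); isolated nodes get an all-zero row.
    P = np.divide(W, deg[:, None], out=np.zeros_like(W), where=~isolated[:, None])
    F = F_ml.copy()
    for _ in range(max_iter):
        F_new = (1.0 - lam) * F_ml + lam * (P @ F)
        # Isolated nodes have no neighbors to borrow from: keep their ML estimate.
        F_new[isolated] = F_ml[isolated]
        if np.abs(F_new - F).max() < tol:
            return F_new
        F = F_new
    return F


# Toy example: a 3-node path graph (0 - 1 - 2) over a 2-word vocabulary.
W_demo = np.array([[0.0, 1.0, 0.0],
                   [1.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0]])
F_ml_demo = np.array([[1.0, 0.0],
                      [0.0, 1.0],
                      [0.5, 0.5]])
F_demo = smooth_on_graph(W_demo, F_ml_demo, lam=0.5)
# F_demo is approximately [[0.625, 0.375], [0.25, 0.75], [0.375, 0.625]]
```

Weighting each node's fidelity term by its degree is what reduces the stationarity condition to a simple row-normalized averaging step. For 0 <= lam < 1 the update is a contraction (each row of P sums to at most 1), so on a graph without isolated nodes the iteration converges to the unique solution (1 - lam)(I - lam P)^(-1) F_ml, and every row of the result remains a probability distribution. Words unseen in a document and all of its neighbors still get zero probability, so in retrieval such a model would typically still be interpolated with a collection language model.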

Index Terms

Information systems → Information retrieval → Retrieval models and ranking

Published in: SIGIR '08 proceedings, July 2008, 934 pages
ISBN: 9781605581644
DOI: 10.1145/1390334
General Chairs: Tat-Seng Chua (National University of Singapore), Mun-Kew Leong (National Library Board, Singapore)
Program Chairs: Syung Hyon Myaeng (Information and Communications University, Korea), Douglas W. Oard (University of Maryland, College Park, USA), Fabrizio Sebastiani (Consiglio Nazionale delle Ricerche, Italy)
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags: Language modeling; document and word graph; graph structure; smoothing