Respect my authority!: HITS without hyperlinks, utilizing cluster-based language models

Authors:
Oren Kurland, Cornell University, Ithaca, NY
Lillian Lee, Cornell University, Ithaca, NY

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, August 2006, Pages 83–90. https://doi.org/10.1145/1148170.1148188

Published: 06 August 2006

ABSTRACT

We present an approach to improving the precision of an initial document ranking wherein we utilize cluster information within a graph-based framework. The main idea is to perform reranking based on centrality within bipartite graphs of documents (on one side) and clusters (on the other side), on the premise that these are mutually reinforcing entities. Links between entities are created via consideration of language models induced from them. We find that our cluster-document graphs give rise to much better retrieval performance than previously proposed document-only graphs do. For example, authority-based reranking of documents via a HITS-style cluster-based approach outperforms a previously-proposed PageRank-inspired algorithm applied to solely-document graphs. Moreover, we also show that computing authority scores for clusters constitutes an effective method for identifying clusters containing a large percentage of relevant documents.
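
The mutual-reinforcement idea in the abstract can be illustrated with a minimal sketch of HITS-style power iteration on a weighted bipartite graph. This is not the authors' implementation: the function name `hits_bipartite`, the choice of clusters as hubs and documents as authorities, and the edge weights in `cluster_to_doc` are all assumptions made for illustration. In the paper the link weights are induced from language models, which this sketch does not attempt to reproduce.

```python
from math import sqrt


def hits_bipartite(weights, iterations=50):
    """HITS power iteration on a weighted bipartite graph.

    weights[i][j] is the non-negative weight of the edge from hub i to
    authority j. Returns (hub_scores, authority_scores), each normalized
    to unit Euclidean length.
    """
    n_hubs = len(weights)
    n_auths = len(weights[0])
    hubs = [1.0] * n_hubs
    auths = [1.0] * n_auths
    for _ in range(iterations):
        # An authority is good if good hubs point to it.
        auths = [
            sum(weights[i][j] * hubs[i] for i in range(n_hubs))
            for j in range(n_auths)
        ]
        norm = sqrt(sum(a * a for a in auths)) or 1.0
        auths = [a / norm for a in auths]
        # A hub is good if it points to good authorities.
        hubs = [
            sum(weights[i][j] * auths[j] for j in range(n_auths))
            for i in range(n_hubs)
        ]
        norm = sqrt(sum(h * h for h in hubs)) or 1.0
        hubs = [h / norm for h in hubs]
    return hubs, auths


# Hypothetical weights (NOT from the paper): 3 clusters acting as hubs,
# 4 documents acting as authorities.
cluster_to_doc = [
    [0.9, 0.8, 0.0, 0.0],
    [0.0, 0.7, 0.6, 0.0],
    [0.0, 0.0, 0.0, 0.2],
]

hub_scores, doc_authority = hits_bipartite(cluster_to_doc)

# Rerank document indices by decreasing authority score.
reranked = sorted(
    range(len(doc_authority)),
    key=lambda j: doc_authority[j],
    reverse=True,
)
print(reranked)
```

With these made-up weights, document 1 ranks first because two clusters link to it, while document 3, reachable only from a weakly connected cluster, ranks last. Swapping the roles (documents as hubs, clusters as authorities) only requires passing the transposed weight matrix.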

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published in
SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
August 2006, 768 pages
ISBN: 1595933697
DOI: 10.1145/1148170
General Chair: Efthimis N. Efthimiadis (University of Washington)
Program Chairs: Susan Dumais (Microsoft Research, Redmond), David Hawking (CSIRO ICT Centre, Canberra, Australia), Kalervo Järvelin (University of Tampere, Finland)
Copyright © 2006 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
HITS
authorities
bipartite graph
cluster-based language models
clusters
graph-based retrieval
high-accuracy retrieval
hubs
language modeling
structural re-ranking
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%