DOI:10.1145/1148170.1148188
Article

Respect my authority!: HITS without hyperlinks, utilizing cluster-based language models

Published: 06 August 2006

ABSTRACT

We present an approach to improving the precision of an initial document ranking wherein we utilize cluster information within a graph-based framework. The main idea is to perform reranking based on centrality within bipartite graphs of documents (on one side) and clusters (on the other side), on the premise that these are mutually reinforcing entities. Links between entities are created via consideration of language models induced from them.

We find that our cluster-document graphs give rise to much better retrieval performance than previously proposed document-only graphs do. For example, authority-based reranking of documents via a HITS-style cluster-based approach outperforms a previously-proposed PageRank-inspired algorithm applied to solely-document graphs. Moreover, we also show that computing authority scores for clusters constitutes an effective method for identifying clusters containing a large percentage of relevant documents.
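The mutual-reinforcement idea in the abstract can be illustrated with a minimal sketch of HITS-style hub/authority iteration on a weighted bipartite graph. Everything specific here is an assumption made for illustration: the function names `hits_bipartite` and `rerank`, the choice of clusters as hubs and documents as authorities, and the toy edge weights, which merely stand in for language-model-induced link strengths. The abstract does not give the paper's actual weighting scheme or graph construction, so this is not the authors' implementation.

```python
from math import sqrt


def hits_bipartite(weights, iterations=50):
    """HITS-style iteration on a weighted bipartite graph.

    weights[h][a] is a non-negative edge weight from hub node h to
    authority node a. Returns (hub_scores, authority_scores), each
    normalized to unit Euclidean length.
    """
    n_hubs = len(weights)
    n_auths = len(weights[0])
    hub = [1.0] * n_hubs
    auth = [1.0] * n_auths
    for _ in range(iterations):
        # Authority update: weighted sum of scores of hubs pointing to it.
        auth = [sum(weights[h][a] * hub[h] for h in range(n_hubs))
                for a in range(n_auths)]
        norm = sqrt(sum(x * x for x in auth)) or 1.0
        auth = [x / norm for x in auth]
        # Hub update: weighted sum of scores of authorities it points to.
        hub = [sum(weights[h][a] * auth[a] for a in range(n_auths))
               for h in range(n_hubs)]
        norm = sqrt(sum(x * x for x in hub)) or 1.0
        hub = [x / norm for x in hub]
    return hub, auth


def rerank(scores):
    """Return node indices ordered by descending score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


# Toy graph (hypothetical weights): 2 clusters acting as hubs,
# 3 documents acting as authorities.
toy_weights = [
    [0.9, 0.8, 0.0],  # cluster 0 links strongly to documents 0 and 1
    [0.0, 0.1, 0.7],  # cluster 1 links mainly to document 2
]
hub, auth = hits_bipartite(toy_weights)
ranking = rerank(auth)  # documents ordered by authority score
```

Transposing the weight matrix swaps the roles, so the same routine yields authority scores for clusters instead of documents, which corresponds to the cluster-scoring use mentioned at the end of the abstract.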

    • Published in

      SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
      August 2006
      768 pages
      ISBN:1595933697
      DOI:10.1145/1148170

      Copyright © 2006 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate: 792 of 3,983 submissions, 20%
