DOI: 10.1145/1498759.1498825
research-article

Integration of news content into web results

Published: 09 February 2009

ABSTRACT

Aggregated search refers to the integration of content from specialized corpora or verticals into web search results. Aggregation improves search when the user has vertical intent but may not be aware of or desire vertical search. In this paper, we address the issue of integrating search results from a news vertical into web search results. News is particularly challenging because, given a query, the appropriate decision (to integrate news content or not) changes with time. Our system adapts to news intent in two ways. First, by inspecting the dynamics of the news collection and query volume, we can track development of and interest in topics. Second, by using click feedback, we can quickly recover from system errors. We define several click-based metrics which allow a system to be monitored and tuned without annotator effort.
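
As a rough illustration of how click feedback could correct a display decision over time, the sketch below keeps a Beta-Bernoulli estimate of the click probability on a news display for a single query. It is not the method from the paper, whose details the abstract does not give. The class name, the prior pseudo-counts, and the 0.1 display threshold are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class NewsClickEstimator:
    """Hypothetical per-query estimator; not taken from the paper."""

    prior_clicks: float = 1.0  # assumed pseudo-count of prior clicks
    prior_skips: float = 1.0   # assumed pseudo-count of prior non-clicks
    clicks: int = 0            # observed clicks on the news display
    skips: int = 0             # observed displays that were not clicked

    def update(self, clicked: bool) -> None:
        """Record one impression of the news display and its outcome."""
        if clicked:
            self.clicks += 1
        else:
            self.skips += 1

    def click_probability(self) -> float:
        """Posterior mean of the click probability under a Beta prior."""
        a = self.prior_clicks + self.clicks
        b = self.prior_skips + self.skips
        return a / (a + b)

    def should_show_news(self, threshold: float = 0.1) -> bool:
        """Show news only while the estimate stays at or above the threshold."""
        return self.click_probability() >= threshold


# A prior that initially favors showing news for this query.
est = NewsClickEstimator(prior_clicks=3.0, prior_skips=7.0)
print(est.should_show_news())

# Forty impressions without a click pull the estimate below the threshold.
for _ in range(40):
    est.update(clicked=False)
print(est.should_show_news())
```

In this sketch the prior stands in for whatever a system believes before any feedback arrives, for example a signal derived from news collection dynamics or query volume, and the observed clicks gradually override it when that belief turns out to be wrong.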


      • Published in

        WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining
        February 2009
        314 pages
        ISBN:9781605583907
        DOI:10.1145/1498759

        Copyright © 2009 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 498 of 2,863 submissions, 17%
