DOI: 10.1145/1571941.1571997
research-article

Sources of evidence for vertical selection

Published: 19 July 2009

ABSTRACT

Web search providers often include search services for domain-specific subcollections, called verticals, such as news, images, videos, job postings, company summaries, and artist profiles. We address the problem of vertical selection, predicting relevant verticals (if any) for queries issued to the search engine's main web search page. In contrast to prior query classification and resource selection tasks, vertical selection is associated with unique resources that can inform the classification decision. We focus on three sources of evidence: (1) the query string, from which features are derived independent of external resources, (2) logs of queries previously issued directly to the vertical, and (3) corpora representative of vertical content. We focus on 18 different verticals, which differ in terms of semantics, media type, size, and level of query traffic. We compare our method to prior work in federated search and retrieval effectiveness prediction. An in-depth error analysis reveals unique challenges across different verticals and provides insight into vertical selection for future work.

Published in: SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
July 2009, 896 pages
ISBN: 9781605584836
DOI: 10.1145/1571941

Copyright © 2009 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 792 of 3,983 submissions, 20%
