skip to main content
10.1145/860435.860490acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
Article

Relevant document distribution estimation method for resource selection

Published:28 July 2003Publication History

ABSTRACT

Prior research under a variety of conditions has shown the CORI algorithm to be one of the most effective resource selection algorithms, but the range of database sizes studied was not large. This paper shows that the CORI algorithm does not do well in environments with a mix of "small" and "very large" databases. A new resource selection algorithm is proposed that uses information about database sizes as well as database contents. We also show how to acquire database size estimates in uncooperative environments as an extension of the query-based sampling used to acquire resource descriptions. Experiments demonstrate that the database size estimates are more accurate for large databases than estimates produced by a competing method; the new resource ranking algorithm is always at least as effective as the CORI algorithm; and the new algorithm results in better document rankings than the CORI algorithm.

References

  1. J. Callan. (2000). Distributed information retrieval. In W.B. Croft, editor, Advances in Information Retrieval. Kluwer Academic Publishers. (pp. 127--150).Google ScholarGoogle Scholar
  2. J. Callan, W.B. Croft, and J. Broglio. (1995). TREC and TIPSTER experiments with INQUERY. Information Processing and Management, 31(3). (pp. 327--343). Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. N. Craswell. (2000). Methods for distributed information retrieval http://pigfish.vic.cmis.csiro.au/~nickc/pubs/craswellthesis.pdf. Ph. D. thesis, The Australian Nation University.Google ScholarGoogle Scholar
  4. A. Le Calv and J. Savoy. (2000). Database merging strategy based on logistic regression. Information Processing and Management, 36(3). (pp. 341--359). Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. D. D'Souza, J. Thom, and J. Zobel. (2000). A comparison of techniques for selecting text collections. In Proceedings of the Eleventh Australasian Database Conference (ADC). Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. L. Gravano, C. Chang, H. Garcia-Molina, and A. Paepcke. (1997). STARTS: Stanford proposal for internet meta-searching. In Proceedings of the 20th ACM-SIGMOD International Conference on Management of Data. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J.C. French, A.L. Powell, J. Callan, C.L. Viles, T. Emmitt, K.J. Prey, and Y. Mou. (1999). Comparing the performance of database selection algorithms http://www-2.cs.cmu.edu/~callan/Papers/sigir99b.ps. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. P. Ipeirotis and L. Gravano. (2002). Distributed search over the hidden web: Hierarchical database sampling and selection. In Proceedings of the 28th International Conference on Very Large Databases (VLDB). Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. The lemur toolkit. http://www.cs.cmu.edu/~lemurGoogle ScholarGoogle Scholar
  10. InvisibleWeb.com. http://www.invisibleweb.com/Google ScholarGoogle Scholar
  11. K.L. Liu, C. Yu, W. Meng, A. Santos and C. Zhang. (2001). Discovering the representative of a search engine. In Proceedings of 10th ACM International Conference on Information and Knowledge Management (CIKM). Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. A.L. Powell, J.C. French, J. Callan, M. Connell, and C.L. Viles, (2000). The impact of database selection on distributed searching. http://www.cs.cmu.edu/~callan/Papers/sigir00b.ps In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. L. Si and J. Callan. (2002). Using sampled data and regression to merge search engine results. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. Xu and J. Callan. (1998). Effective retrieval with distributed collections. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. S. Robertson and S. Walker. (1994). Some simple effective approximations to the 2-poisson model for probabilistic weighted retrieval. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Relevant document distribution estimation method for resource selection

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
      July 2003
      490 pages
      ISBN:1581136463
      DOI:10.1145/860435

      Copyright © 2003 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 28 July 2003

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      SIGIR '03 Paper Acceptance Rate46of266submissions,17%Overall Acceptance Rate792of3,983submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader