skip to main content
10.1145/1031171.1031180acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
Article

Unified utility maximization framework for resource selection

Published:13 November 2004Publication History

ABSTRACT

This paper presents a unified utility framework for resource selection of distributed text information retrieval. This new framework shows an efficient and effective way to infer the probabilities of relevance of all the documents across the text databases. With the estimated relevance information, resource selection can be made by explicitly optimizing the goals of different applications. Specifically, when used for database recommendation, the selection is optimized for the goal of high-recall (include as many relevant documents as possible in the selected databases); when used for distributed document retrieval, the selection targets the high-precision goal (high precision in the final merged list of documents). This new model provides a more solid framework for distributed information retrieval. Empirical studies show that it is at least as effective as other state-of-the-art algorithms.

References

  1. J. Callan. (2000). Distributed information retrieval. In W.B. Croft, editor, Advances in Information Retrieval. Kluwer Academic Publishers. (pp. 127--150).Google ScholarGoogle Scholar
  2. J. Callan, W.B. Croft, and J. Broglio. (1995). TREC and TIPSTER experiments with INQUERY. Information Processing and Management, 31(3). (pp. 327--343). Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. J. G. Conrad, X. S. Guo, P. Jackson and M. Meziou. (2002). Database selection using actual physical and acquired logical collection resources in a massive domain-specific operational environment. Distributed search over the hidden web: Hierarchical database sampling and selection. In Proceedings of the 28th International Conference on Very Large Databases (VLDB). Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. N. Craswell. (2000). Methods for distributed information retrieval. Ph. D. thesis, The Australian Nation University.Google ScholarGoogle Scholar
  5. N. Craswell, D. Hawking, and P. Thistlewaite. (1999). Merging results from isolated search engines. In Proceedings of 10th Australasian Database Conference.Google ScholarGoogle Scholar
  6. D. D'Souza, J. Thom, and J. Zobel. (2000). A comparison of techniques for selecting text collections. In Proceedings of the 11th Australasian Database Conference. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. N. Fuhr. (1999). A Decision-Theoretic approach to database selection in networked IR. ACM Transactions on Information Systems, 17(3). (pp. 229--249). Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. L. Gravano, C. Chang, H. Garcia-Molina, and A. Paepcke. (1997). STARTS: Stanford proposal for internet meta-searching. In Proceedings of the 20th ACM-SIGMOD International Conference on Management of Data. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. L. Gravano, P. Ipeirotis and M. Sahami. (2003). QProber: A System for Automatic Classification of Hidden-Web Databases. ACM Transactions on Information Systems, 21(1). Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. P. Ipeirotis and L. Gravano. (2002). Distributed search over the hidden web: Hierarchical database sampling and selection. In Proceedings of the 28th International Conference on Very Large Databases (VLDB). Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. InvisibleWeb.com. http://www.invisibleweb.comGoogle ScholarGoogle Scholar
  12. The lemur toolkit. http://www.cs.cmu.edu/ lemurGoogle ScholarGoogle Scholar
  13. J. Lu and J. Callan. (2003). Content-based information retrieval in peer-to-peer networks. In Proceedings of the 12th International Conference on Information and Knowledge Management. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. W. Meng, C.T. Yu and K.L. Liu. (2002) Building efficient and effective metasearch engines. ACM Comput. Surv. 34(1). Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. H. Nottelmann and N. Fuhr. (2003). Evaluating different method of estimating retrieval quality for resource selection. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. H., Nottelmann and N., Fuhr. (2003). The MIND architecture for heterogeneous multimedia federated digital libraries. ACM SIGIR 2003 Workshop on Distributed Information Retrieval.Google ScholarGoogle Scholar
  17. A.L. Powell, J.C. French, J. Callan, M. Connell, and C.L. Viles. (2000). The impact of database selection on distributed searching. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. A.L. Powell and J.C. French. (2003). Comparing the performance of database selection algorithms. ACM Transactions on Information Systems, 21(4). (pp. 412--456). Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. C. Sherman (2001). Search for the invisible web. Guardian Unlimited.Google ScholarGoogle Scholar
  20. L. Si and J. Callan. (2002). Using sampled data and regression to merge search engine results. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. L. Si and J. Callan. (2003). Relevant document distribution estimation method for resource selection. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. L. Si and J. Callan. (2003). A Semi-Supervised learning method to merge search engine results. ACM Transactions on Information Systems, 21(4). (pp. 457--491). Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Unified utility maximization framework for resource selection

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management
      November 2004
      678 pages
      ISBN:1581138741
      DOI:10.1145/1031171

      Copyright © 2004 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 13 November 2004

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate1,861of8,427submissions,22%

      Upcoming Conference

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader