A decision-theoretic approach to database selection in networked IR

Published: 01 July 1999

Abstract

In networked IR, a client submits a query to a broker, which is in contact with a large number of databases. To yield a maximum number of documents at minimum cost, the broker has to estimate the retrieval cost of each database and then decide, for each database, whether or not to use it for the current query and, if so, how many documents to retrieve from it. For this purpose, we develop a general decision-theoretic model and discuss different cost structures. Besides the cost of retrieving relevant versus nonrelevant documents, we consider the following parameters for each database: expected retrieval quality, expected number of relevant documents in the database, and cost factors for query processing and document delivery. For computing the overall optimum, a divide-and-conquer algorithm is given. If there are several brokers knowing different databases, a preselection of brokers can only be performed heuristically, but the computation of the optimum can be done similarly to the single-broker case. In addition, we derive a formula which estimates the number of relevant documents in a database based on dictionary information.
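
The selection problem described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm or its cost formulas, which the abstract does not spell out. It assumes each database has a known cost function giving the expected cost of retrieving `s` documents from it (zero when the database is not used), and it finds the split of `n` documents across databases with the lowest total cost by recursively halving the list of databases and merging the two halves' cost tables. The helper `make_cost` and its `fixed`, `per_doc`, and `penalty` parameters are hypothetical stand-ins for query-processing cost, delivery cost, and declining retrieval quality.

```python
from typing import Callable, List, Tuple

CostFn = Callable[[int], float]
# table[k] = (minimum total cost, documents taken per database) for k documents
Table = List[Tuple[float, List[int]]]


def make_cost(fixed: float, per_doc: float, penalty: float) -> CostFn:
    """Hypothetical cost shape: nothing if unused, otherwise a fixed
    query cost plus linear delivery cost plus a quadratic quality penalty."""
    def cost(s: int) -> float:
        return 0 if s == 0 else fixed + per_doc * s + penalty * s * s
    return cost


def leaf_table(cost: CostFn, n: int) -> Table:
    """Cost table for a single database: it must supply all k documents."""
    return [(cost(k), [k]) for k in range(n + 1)]


def merge(left: Table, right: Table, n: int) -> Table:
    """Combine two cost tables: for each k, try every split j / k - j."""
    merged: Table = []
    for k in range(n + 1):
        total, j = min((left[j][0] + right[k - j][0], j) for j in range(k + 1))
        merged.append((total, left[j][1] + right[k - j][1]))
    return merged


def optimum_table(costs: List[CostFn], n: int) -> Table:
    """Divide the databases into two halves, solve each, then merge."""
    if len(costs) == 1:
        return leaf_table(costs[0], n)
    mid = len(costs) // 2
    return merge(optimum_table(costs[:mid], n), optimum_table(costs[mid:], n), n)


def select(costs: List[CostFn], n: int) -> Tuple[float, List[int]]:
    """Return (minimum cost, allocation) for retrieving exactly n documents."""
    return optimum_table(costs, n)[n]


if __name__ == "__main__":
    databases = [make_cost(4, 0, 1), make_cost(1, 3, 0), make_cost(10, 1, 0)]
    cost, allocation = select(databases, 4)
    print(cost, allocation)
```

An allocation entry of 0 means the broker skips that database for the query. With the toy cost functions above, a database with a high fixed cost only becomes attractive once enough documents are requested, which is the kind of trade-off the cost parameters in the abstract are meant to capture.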
