Abstract
In networked IR, a client submits a query to a broker, which is in contact with a large number of databases. In order to yield a maximum number of documents at minimum cost, the broker has to estimate the retrieval cost of each database and then decide, for each database, whether or not to use it for the current query and, if so, how many documents to retrieve from it. For this purpose, we develop a general decision-theoretic model and discuss different cost structures. Besides the costs for retrieving relevant versus nonrelevant documents, we consider the following parameters for each database: expected retrieval quality, expected number of relevant documents in the database, and cost factors for query processing and document delivery. For computing the overall optimum, a divide-and-conquer algorithm is given. If there are several brokers knowing different databases, a preselection of brokers can only be performed heuristically, but the computation of the optimum can be done similarly to the single-broker case. In addition, we derive a formula which estimates the number of relevant documents in a database based on dictionary information.
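To make the idea of a divide-and-conquer optimum more concrete, here is a minimal sketch and not the paper's actual algorithm. It assumes each database is described by a hypothetical cost function giving the expected cost of retrieving s documents from it, and it finds, for every total k up to n, the cheapest way to split k documents across the databases. The names `linear_cost` and `optimal_allocation`, and the cost structure (a fixed query-processing cost plus a per-document delivery cost), are illustrative assumptions.

```python
from typing import Callable, List, Tuple

CostFn = Callable[[int], float]


def linear_cost(query_cost: float, delivery_cost: float) -> CostFn:
    """Hypothetical cost structure: a fixed cost if the database is used at all,
    plus a per-document delivery cost. An unused database costs nothing."""
    def cost(s: int) -> float:
        return 0.0 if s == 0 else query_cost + delivery_cost * s
    return cost


def optimal_allocation(costs: List[CostFn], n: int) -> List[Tuple[float, List[int]]]:
    """Return a table where table[k] = (minimum total cost, allocation) for
    retrieving exactly k documents in total, for k = 0..n."""
    if len(costs) == 1:
        # Base case: a single database must deliver all k documents.
        return [(costs[0](k), [k]) for k in range(n + 1)]

    # Divide: solve the two halves of the database list independently.
    mid = len(costs) // 2
    left = optimal_allocation(costs[:mid], n)
    right = optimal_allocation(costs[mid:], n)

    # Conquer: for each total k, try every split j / (k - j) between the halves.
    table = []
    for k in range(n + 1):
        best = min(
            ((left[j][0] + right[k - j][0], left[j][1] + right[k - j][1])
             for j in range(k + 1)),
            key=lambda entry: entry[0],
        )
        table.append(best)
    return table


# Two hypothetical databases: one expensive to query but cheap per document,
# one cheap to query but expensive per document.
db_a = linear_cost(query_cost=5.0, delivery_cost=1.0)
db_b = linear_cost(query_cost=1.0, delivery_cost=3.0)
table = optimal_allocation([db_a, db_b], 4)
```

In this sketch, `table[k][1]` lists how many documents to request from each database, with 0 meaning the database is not used for the query. A fuller model would derive each cost function from the parameters named in the abstract, such as expected retrieval quality and the expected number of relevant documents.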