A decision-theoretic approach to database selection in networked IR

Author: Norbert Fuhr, University of Dortmund

ACM Transactions on Information Systems, Volume 17, Issue 3, pp. 229–249
https://doi.org/10.1145/314516.314517

Published: 01 July 1999

Abstract

In networked IR, a client submits a query to a broker, which is in contact with a large number of databases. In order to yield a maximum number of documents at minimum cost, the broker has to make estimates about the retrieval cost of each database, and then decide for each database whether or not to use it for the current query, and if so, how many documents to retrieve from it. For this purpose, we develop a general decision-theoretic model and discuss different cost structures. Besides the cost of retrieving relevant versus nonrelevant documents, we consider the following parameters for each database: expected retrieval quality, expected number of relevant documents in the database, and cost factors for query processing and document delivery. For computing the overall optimum, a divide-and-conquer algorithm is given. If there are several brokers knowing different databases, a preselection of brokers can only be performed heuristically, but the computation of the optimum can be done similarly to the single-broker case. In addition, we derive a formula which estimates the number of relevant documents in a database based on dictionary information.
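
To make the allocation problem in the abstract concrete, the following is a minimal sketch, not the article's own algorithm. It assumes each database comes with a hypothetical cost table `costs[i][s]` giving the expected cost of retrieving `s` documents from database `i` (with `costs[i][0] == 0`), and it splits the database list in half recursively, merging the two halves' best-cost tables. The function name `best_allocations` and the example numbers are illustrative assumptions only.

```python
from typing import List, Tuple

INF = float("inf")

# Entry k of a table holds (minimum total cost, documents taken per database)
# for retrieving exactly k documents from the databases the table covers.
Table = List[Tuple[float, List[int]]]


def best_allocations(costs: List[List[float]], n: int) -> Table:
    """Divide-and-conquer over databases (illustrative sketch).

    costs[i][s] is an ASSUMED expected cost of retrieving s documents
    from database i; requests beyond the table length are infeasible.
    """
    if len(costs) == 1:
        c = costs[0]
        return [(c[k], [k]) if k < len(c) else (INF, [k]) for k in range(n + 1)]

    mid = len(costs) // 2
    left = best_allocations(costs[:mid], n)
    right = best_allocations(costs[mid:], n)

    merged: Table = []
    for k in range(n + 1):
        best: Tuple[float, List[int]] = (INF, [])
        # Try every split of k documents between the two halves.
        for j in range(k + 1):
            total = left[j][0] + right[k - j][0]
            if total < best[0]:
                best = (total, left[j][1] + right[k - j][1])
        merged.append(best)
    return merged


# Hypothetical cost tables for three databases (made-up numbers).
example_costs = [
    [0, 3, 5, 9],
    [0, 2, 6, 12],
    [0, 4, 5, 6],
]
cost, allocation = best_allocations(example_costs, 4)[4]
print(cost, allocation)  # 8 [0, 1, 3]
```

In this made-up example the cheapest way to obtain four documents skips the first database entirely, which mirrors the abstract's point that the broker decides both whether to use a database and how many documents to request from it.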


Index Terms

A decision-theoretic approach to database selection in networked IR
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
  2. Information storage systems
    1. Storage architectures
      1. Storage network architectures

Published in

ACM Transactions on Information Systems Volume 17, Issue 3
July 1999
113 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/314516

Copyright © 1999 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

Author Tags
networked retrieval
probabilistic retrieval
probability ranking principle
resource discovery