ABSTRACT
This paper addresses the problem of merging results obtained from different databases and search engines in a distributed information retrieval environment. Prior research on this problem either assumes the exchange of the statistics needed to normalize scores (cooperative solutions) or is heuristic. Both approaches have disadvantages. We show that the problem in uncooperative environments is simpler when viewed as a component of a distributed IR system that uses query-based sampling to create resource descriptions. Documents sampled for creating resource descriptions can also be used to create a centralized sample index, and this index is a source of training data for adaptive results merging algorithms. A variety of experiments demonstrate that this new approach is more effective than a well-known alternative, and that it allows query-by-query tuning of the results merging function.
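The idea of using a centralized sample index as training data can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that some documents returned by each database also appear in the centralized sample index, that a simple per-database linear fit (score_central = a * score_db + b) is an adequate merging function, and that scores are plain floats. The names `fit_line`, `merge_results`, `results_by_db`, and `sample_scores` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two overlap documents to fit a line")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("database scores of overlap documents are all equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def merge_results(results_by_db, sample_scores):
    """Merge per-database result lists into one ranked list.

    results_by_db: {db_name: [(doc_id, db_specific_score), ...]}
    sample_scores: {doc_id: score from the centralized sample index}
                   (assumed input; computed for the same query)
    """
    merged = []
    for db_name, results in results_by_db.items():
        # Overlap documents have both a database-specific score and a
        # centralized score, so they serve as training pairs for this query.
        overlap = [(score, sample_scores[doc_id])
                   for doc_id, score in results if doc_id in sample_scores]
        a, b = fit_line([x for x, _ in overlap], [y for _, y in overlap])
        # Map every returned document onto the centralized score scale.
        merged.extend((doc_id, a * score + b) for doc_id, score in results)
    # Highest estimated centralized score first.
    return sorted(merged, key=lambda pair: pair[1], reverse=True)
```

Because the fit is recomputed from the overlap documents of each query, the mapping in this sketch is tuned query by query; a database with fewer than two overlap documents would need a fallback, which is omitted here.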
Index Terms
- Using sampled data and regression to merge search engine results