Published in: Discover Computing 3/2017

25.11.2016 | Information Retrieval Efficiency

Efficient distributed selective search

Written by: Yubin Kim, Jamie Callan, J. Shane Culpepper, Alistair Moffat

Abstract

Simulation and analysis have shown that selective search can reduce the cost of large-scale distributed information retrieval. By partitioning the collection into small topical shards, and then using a resource ranking algorithm to choose a subset of shards to search for each query, fewer postings are evaluated. In this paper we extend the study of selective search into new areas using a fine-grained simulation, examining the difference in efficiency when term-based and sample-based resource selection algorithms are used; measuring the effect of two policies for assigning index shards to machines; and exploring the benefits of index-spreading and mirroring as the number of deployed machines is varied. Results obtained for two large datasets and four large query logs confirm that selective search is significantly more efficient than conventional distributed search architectures and can handle higher query rates. Furthermore, we demonstrate that selective search can be tuned to avoid bottlenecks, and thus maximize usage of the underlying computer hardware.
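
The shard-selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' system or simulator: the toy shards, the summed term-frequency score used as a stand-in for a term-based resource selection algorithm, and the names `rank_shards` and `selective_search` are all assumptions made for illustration. It only shows why searching a subset of topical shards evaluates fewer postings than searching every shard.

```python
from collections import Counter


def rank_shards(query_terms, shard_term_counts):
    """Rank shards by summed query-term frequency (a toy term-based selector)."""
    scores = {
        shard_id: sum(counts[term] for term in query_terms)
        for shard_id, counts in shard_term_counts.items()
    }
    # Highest score first; ties are broken by shard id for determinism.
    return sorted(scores, key=lambda shard_id: (-scores[shard_id], shard_id))


def selective_search(query_terms, shards, top_k):
    """Search only the top_k ranked shards; return (ranked docs, postings evaluated)."""
    # Hypothetical per-shard statistics: term counts over all documents in the shard.
    shard_term_counts = {
        shard_id: Counter(term for doc in docs.values() for term in doc)
        for shard_id, docs in shards.items()
    }
    selected = rank_shards(query_terms, shard_term_counts)[:top_k]

    scored, postings = [], 0
    for shard_id in selected:
        for doc_id, doc_terms in shards[shard_id].items():
            tf = Counter(doc_terms)
            # One posting is evaluated per (query term, document) match.
            matches = [tf[term] for term in query_terms if tf[term] > 0]
            postings += len(matches)
            if matches:
                scored.append((doc_id, sum(matches)))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored, postings


# Toy collection, already partitioned into two topical shards (assumed data).
shards = {
    "sports": {"d1": ["football", "goal", "score"], "d2": ["tennis", "score"]},
    "cooking": {"d3": ["recipe", "pasta"], "d4": ["recipe", "score", "pasta"]},
}
query = ["score", "goal"]

selective_docs, selective_postings = selective_search(query, shards, top_k=1)
exhaustive_docs, exhaustive_postings = selective_search(query, shards, top_k=len(shards))
print(selective_docs, selective_postings)
print(exhaustive_docs, exhaustive_postings)
```

With `top_k=1` only the highest-ranked shard is searched, so the posting count is lower than in the exhaustive run, at the cost of missing the matching document held in the unselected shard. That trade-off between efficiency and completeness is what a resource ranking algorithm has to manage.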

Metadata
Title
Efficient distributed selective search
Written by
Yubin Kim
Jamie Callan
J. Shane Culpepper
Alistair Moffat
Publication date
25.11.2016
Publisher
Springer Netherlands
Published in
Discover Computing / Issue 3/2017
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-016-9290-6
