ABSTRACT
Building and operating large-scale information retrieval systems used by hundreds of millions of people around the world provides a number of interesting challenges. Designing such systems requires making complex design tradeoffs in a number of dimensions, including (a) the number of user queries that must be handled per second and the response latency to these requests, (b) the number and size of various corpora that are searched, (c) the latency and frequency with which documents are updated or added to the corpora, and (d) the quality and cost of the ranking algorithms that are used for retrieval. In this talk I will discuss the evolution of Google's hardware infrastructure and information retrieval systems and some of the design challenges that arise from ever-increasing demands in all of these dimensions. I will also describe how we use various pieces of distributed systems infrastructure when building these retrieval systems. Finally, I will describe some future challenges and open research problems in this area.
Index Terms
- Challenges in building large-scale information retrieval systems: invited talk