ABSTRACT
Web search engines are facing formidable performance challenges due to data sizes and query loads. The major engines have to process tens of thousands of queries per second over tens of billions of documents. To deal with this heavy workload, such engines employ massively parallel systems consisting of thousands of machines. The significant cost of operating these systems has motivated a lot of recent research into more efficient query processing mechanisms. We investigate a new way to build such high performance IR systems using graphical processing units (GPUs). GPUs were originally designed to accelerate computer graphics applications through massive on-chip parallelism. Recently a number of researchers have studied how to use GPUs for other problem domains such as databases and scientific computing. Our contribution here is to design a basic system architecture for GPU-based high-performance IR, to develop suitable algorithms for subtasks such as inverted list compression, list intersection, and top-k scoring, and to show how to achieve highly efficient query processing on GPU-based systems. Our experimental results for a prototype GPU-based system on 25.2 million web pages indicate that significant gains in query processing performance can be obtained.
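One of the subtasks the abstract names is inverted list compression. As a minimal illustration of the general idea (not the paper's actual GPU scheme, which the abstract does not specify), an inverted list of sorted docIDs can be turned into small gaps and compressed with a variable-byte (varint) code; the function names below are hypothetical:

```python
def encode_list(doc_ids):
    """Gap-encode a sorted docID list, then varint-compress the gaps."""
    out = bytearray()
    prev = 0
    for d in doc_ids:
        gap = d - prev
        prev = d
        # varint: 7 payload bits per byte, high bit = "more bytes follow"
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)

def decode_list(data):
    """Invert encode_list: varint-decode the gaps, then prefix-sum them
    back into absolute docIDs."""
    doc_ids, cur, shift, prev = [], 0, 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7          # continuation bit set: more bytes follow
        else:
            prev += cur         # running sum turns gaps back into docIDs
            doc_ids.append(prev)
            cur, shift = 0, 0
    return doc_ids
```

Because the decode step is a prefix sum over the gaps, it maps naturally onto the data-parallel scan primitives available on GPUs, which is why gap-based codes are a common starting point for GPU-side index decompression.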