skip to main content
10.1145/1454115.1454152acmconferencesArticle/Chapter ViewAbstractPublication PagespactConference Proceedingsconference-collections
research-article

Mars: a MapReduce framework on graphics processors

Published:25 October 2008Publication History

ABSTRACT

We design and implement Mars, a MapReduce framework, on graphics processors (GPUs). MapReduce is a distributed programming framework originally proposed by Google for the ease of development of web search applications on a large number of commodity CPUs. Compared with CPUs, GPUs have an order of magnitude higher computation power and memory bandwidth, but are harder to program since their architectures are designed as a special-purpose co-processor and their programming interfaces are typically for graphics applications. As the first attempt to harness GPU's power for MapReduce, we developed Mars on an NVIDIA G80 GPU, which contains over one hundred processors, and evaluated it in comparison with Phoenix, the state-of-the-art MapReduce framework on multi-core CPUs. Mars hides the programming complexity of the GPU behind the simple and familiar MapReduce interface. It is up to 16 times faster than its CPU-based counterpart for six common web applications on a quad-core machine.

References

  1. A. Ailamaki, N. K. Govindaraju, S. Harizopoulos, and D. Manocha. Query co-processing on commodity processors. In VLDB '06: Proceedings of the 32nd international conference on Very large data bases, pages 1267--1267. VLDB Endowment, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. AMD CTM. http://ati.amd.com/products/streamprocessor/, 2007.Google ScholarGoogle Scholar
  3. Apache Hadoop. http://lucene.apache.org/hadoop/, 2006.Google ScholarGoogle Scholar
  4. D. Blythe. The direct3d 10 system. ACM Trans. Graph., 25(3):724--734, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan. Brook for gpus: stream computing on graphics hardware. ACM Trans. Graph., 23(3):777--786, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. B. Catanzaro, N. Sundaram, and K. Keutzer. A map reduce framework for programming graphics processors. In Workshop on Software Tools for MultiCore Systems, 2008.Google ScholarGoogle Scholar
  7. H. chih Yang, A. Dasdan, R.-L. Hsiao, and D. S. Parker. Map-reduce-merge: simplified relational data processing on large clusters. In SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pages 1029--1040, New York, NY, USA, 2007. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. C. Chu, S. Kim, Y. Lin, Y. Yu, G. Bradski, A. Y. Ng, and K. Olukotun. Map-reduce for machine learning on multicore. In NIPS '07: Proceedings of Twenty-First Annual Conference on Neural Information Processing Systems. Neural Information Processing Systems Foundation, 2007.Google ScholarGoogle Scholar
  9. J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. Commun. ACM, 51(1):107--113, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. J. Feng, S. Chakraborty, B. Schmidt, W. Liu, and U. D. Bordoloi. Fast schedulability analysis using commodity graphics hardware. 13th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Folding@home. http://www.scei.co.jp/folding, 2007.Google ScholarGoogle Scholar
  12. N. Govindaraju, J. Gray, R. Kumar, and D. Manocha. GPUTeraSort: high performance graphics co-processor sorting for large database management. In SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data, pages 325--336, New York, NY, USA, 2006. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. N. Govindaraju, B. Lloyd, W. Wang, M. Lin, and D. Manocha. Fast computation of database operations using graphics processors. In SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data, pages 215--226, New York, NY, USA, 2004. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. Harris, J. Owens, S. Sengupta, Y. Zhang, and A. Davidson. Cudpp: Cuda data parallel primitives library. 2007.Google ScholarGoogle Scholar
  15. B. He, N. K. Govindaraju, Q. Luo, and B. Smith. Efficient gather and scatter operations on graphics processors. In SC '07: Proceedings of the 2007 ACM/IEEE conference on Supercomputing, pages 1--12, New York, NY, USA, 2007. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. B. He, K. Yang, R. Fang, M. Lu, N. Govindaraju, Q. Luo, and P. Sander. Relational joins on graphics processors. In SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 511--524, New York, NY, USA, 2008. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. D. Horn. Lib GPU FFT, 2006.Google ScholarGoogle Scholar
  18. C. Jiang and M. Snir. Automatic tuning matrix multiplication performance on graphics hardware. In PACT '05: Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques, pages 185--196, Washington, DC, USA, 2005. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. E. S. Larsen and D. McAllister. Fast matrix multiplies using graphics hardware. In Supercomputing '01: Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM), pages 55--55, New York, NY, USA, 2001. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M. D. Linderman, J. D. Collins, H. Wang, and T. H. Meng. Merge: a programming model for heterogeneous multi-core systems. In ASPLOS XIII: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems, pages 287--296, New York, NY, USA, 2008. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. W. Liu, B. Schmidt, G. Voss, and W. Wittig. Streaming algorithms for biological sequence alignment on gpus. IEEE Transactions on Parallel and Distributed Systems, 18:1270--1281, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. H. Nguyen. GPU gems 3. Addison-Wesley, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. NVIDIA Corp. . CUDA Occupancy Calculator, 2007.Google ScholarGoogle Scholar
  24. NVIDIA CUDA. http://developer.nvidia.com/object/cuda.html, 2007.Google ScholarGoogle Scholar
  25. OpenGL. http://www.opengl.org/, 2007.Google ScholarGoogle Scholar
  26. J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Kruger, A. E. Lefohn, and T. J. Purcell. A survey of general-purpose computation on graphics hardware. Computer Graphics Forum, 26, 2007.Google ScholarGoogle Scholar
  27. C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis. Evaluating mapreduce for multi-core and multiprocessor systems. In HPCA '07: Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture, pages 13--24, Washington, DC, USA, 2007. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. S. Sengupta, M. Harris, Y. Zhang, and J. D. Owens. Scan primitives for gpu computing. In GH '07: Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, pages 97--106, Aire-la-Ville, Switzerland, Switzerland, 2007. Eurographics Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. SETI@home. http://setiathome.berkeley.edu/, 2007.Google ScholarGoogle Scholar
  30. D. Tarditi, S. Puri, and J. Oglesby. Accelerator: using data parallelism to program gpus for general-purpose uses. SIGOPS Oper. Syst. Rev., 40(5):325--335, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. R. Yates and B. Neto. Modern information retrieval. Addison Wesley, 1 edition, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Mars: a MapReduce framework on graphics processors

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques
            October 2008
            328 pages
            ISBN:9781605582825
            DOI:10.1145/1454115

            Copyright © 2008 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 25 October 2008

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article

            Acceptance Rates

            Overall Acceptance Rate121of471submissions,26%

            Upcoming Conference

            PACT '24
            International Conference on Parallel Architectures and Compilation Techniques
            October 14 - 16, 2024
            Southern California , CA , USA

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader