Article DOI: 10.1145/1854273.1854302
research-article

Twin peaks: a software platform for heterogeneous computing on general-purpose and graphics processors

Published: 11 September 2010

ABSTRACT

Modern processors are evolving into hybrid, heterogeneous designs in which both CPU and GPU cores are used for general-purpose computation. Several languages, such as Brook, CUDA, and more recently OpenCL, are being developed to fully harness the potential of these processors. Programs in these languages typically run control code on the CPU and performance-critical, data-parallel kernel code on the GPU.

In this paper, we present Twin Peaks, a software platform for heterogeneous computing that executes code originally targeted for GPUs efficiently on CPUs as well. This permits a more balanced execution between the CPU and GPU, and enables portability of code between these architectures and to CPU-only environments. We propose several techniques in the runtime system to efficiently utilize the caches and functional units present in CPUs. Using OpenCL as a canonical language for heterogeneous computing, and running several experiments on real hardware, we show that our techniques enable GPGPU-style code to execute efficiently on multicore CPUs with minimal runtime overhead. These results also show that for maximum performance, it is beneficial for applications to utilize both CPUs and GPUs as accelerator targets.
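To make the idea of running GPGPU-style code on a CPU concrete, the following is a minimal sketch, not the Twin Peaks runtime itself. It assumes an OpenCL-like model in which a kernel is written per work-item and the index space is split into work-groups; each work-group is then run as a plain loop on one CPU thread. The names `vector_add_kernel`, `run_work_group`, and `run_ndrange_on_cpu` are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def vector_add_kernel(global_id, a, b, out):
    # Kernel body for a single work-item: one output element per work-item,
    # the way data-parallel GPU kernels are usually written.
    out[global_id] = a[global_id] + b[global_id]


def run_work_group(kernel, group_id, local_size, global_size, args):
    # Run every work-item of one work-group back to back on the calling
    # thread, so consecutive work-items touch neighbouring memory.
    start = group_id * local_size
    end = min(start + local_size, global_size)
    for global_id in range(start, end):
        kernel(global_id, *args)


def run_ndrange_on_cpu(kernel, global_size, local_size, args, workers=4):
    # Split the 1-D index space into work-groups and hand each group to a
    # worker thread (hypothetical scheduling policy, for illustration only).
    num_groups = (global_size + local_size - 1) // local_size
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(run_work_group, kernel, g, local_size, global_size, args)
            for g in range(num_groups)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a work-group


a = list(range(10))
b = [10 * x for x in range(10)]
out = [0] * 10
run_ndrange_on_cpu(vector_add_kernel, global_size=10, local_size=4, args=(a, b, out))
print(out)
```

In CPython the threads here do not execute kernel code in parallel because of the global interpreter lock; the sketch only illustrates the mapping of work-items and work-groups onto CPU threads, not the performance techniques the paper proposes.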


Published in

PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques
September 2010
596 pages
ISBN: 9781450301787
Proceedings DOI: 10.1145/1854273

Copyright © 2010 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 121 of 471 submissions, 26%
