DOI: 10.1145/1967677.1967699
research-article

An instruction-scheduling-aware data partitioning technique for coarse-grained reconfigurable architectures

Published: 11 April 2011

ABSTRACT

In this paper, we propose a data partitioning technique for a memory subsystem that consists of a multi-ported scratchpad memory (SPM) unit and a single-ported data cache in a coarse-grained reconfigurable array (CGRA) architecture. To achieve high performance, the embedded reconfigurable processor executes programs by switching between a Non-VLIW mode and a VLIW mode, depending on the type of code region. The VLIW mode targets code regions with high instruction-level parallelism (ILP), which require high memory bandwidth, while the Non-VLIW mode targets regions with low ILP, which require low memory latency. Our technique partitions data between the SPM and the data cache based on data interference graph reduction and profiling information. Given an SPM size, it finds the optimal data partition by taking the VLIW instruction schedule into account. We evaluate the technique on the CGRA architecture with three representative multimedia applications.
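The abstract only outlines how the partitioning works. As a rough illustration, the sketch below assumes that the interference between two data objects is the profiled number of VLIW cycles in which both are accessed: keeping both in the single-ported cache would serialize those accesses, while the multi-ported SPM would not. It then fills the SPM greedily, choosing at each step the object that removes the most interference per byte. The interference metric, the greedy heuristic, and the names `DataObject`, `build_interference`, `cache_conflicts`, and `partition` are all hypothetical. Unlike the technique described in the paper, this greedy loop does not guarantee an optimal partition.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class DataObject:
    name: str
    size: int  # bytes


def build_interference(profile):
    """Build a hypothetical data interference graph from a profiled VLIW schedule.

    profile: list of (count, names) pairs, where `names` is the set of data
    objects accessed in one VLIW cycle and `count` is how often that cycle
    executed. Returns {(a, b): weight} with a < b.
    """
    edges = Counter()
    for count, names in profile:
        for a, b in combinations(sorted(names), 2):
            edges[(a, b)] += count
    return dict(edges)


def cache_conflicts(edges, in_cache):
    """Total interference weight between pairs of objects that both stay in the cache."""
    return sum(w for (a, b), w in edges.items() if a in in_cache and b in in_cache)


def partition(objects, edges, spm_size):
    """Greedily move objects to the SPM, best interference reduction per byte first.

    Returns (spm_names, cache_names). This is an illustrative heuristic,
    not an optimal algorithm.
    """
    sizes = {o.name: o.size for o in objects}
    in_cache, spm, free = set(sizes), set(), spm_size
    while True:
        best, best_gain = None, 0.0
        for name in sorted(in_cache):
            if sizes[name] > free:
                continue  # does not fit in the remaining SPM space
            # Interference removed from the cache if `name` moves to the SPM.
            removed = sum(w for (a, b), w in edges.items()
                          if name in (a, b) and a in in_cache and b in in_cache)
            gain = removed / max(sizes[name], 1)
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # nothing fits, or nothing reduces interference
            break
        in_cache.remove(best)
        spm.add(best)
        free -= sizes[best]
    return spm, in_cache


if __name__ == "__main__":
    # Toy data objects and profiled VLIW cycles (made up for illustration).
    objects = [DataObject("coef", 512), DataObject("frame", 4096),
               DataObject("tbl", 1024), DataObject("out", 2048)]
    profile = [(900, {"coef", "frame"}), (300, {"coef", "tbl"}),
               (500, {"frame", "out"}), (50, {"tbl", "out"})]
    edges = build_interference(profile)
    spm, cache = partition(objects, edges, spm_size=2048)
    print(sorted(spm), sorted(cache), cache_conflicts(edges, cache))
```

With this toy profile, a 2048-byte SPM receives `coef` and `tbl`, and the 500 units of interference between `frame` and `out` remain in the cache. The sketch ignores the latency concerns of the Non-VLIW mode, which a fuller model would also need to weigh.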


    Published in

      ACM Conferences
      LCTES '11: Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
      April 2011
      182 pages
      ISBN:9781450305556
      DOI:10.1145/1967677
      Also published in
        ACM SIGPLAN Notices, Volume 46, Issue 5
        LCTES '10
        May 2011
        170 pages
        ISSN:0362-1340
        EISSN:1558-1160
        DOI:10.1145/2016603

      Copyright © 2011 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States




      Acceptance Rates

      Overall acceptance rate: 116 of 438 submissions, 26%
