research-article

A programmable memory controller for the DDRx interfacing standards

Published: 20 December 2013

Abstract

Modern memory controllers employ sophisticated address mapping, command scheduling, and power management optimizations to alleviate the adverse effects of DRAM timing and resource constraints on system performance. A promising way of improving the versatility and efficiency of these controllers is to make them programmable—a proven technique that has seen wide use in other control tasks, ranging from DMA scheduling to NAND Flash and directory control. Unfortunately, the stringent latency and throughput requirements of modern DDRx devices have rendered such programmability largely impractical, confining DDRx controllers to fixed-function hardware.

This article presents the instruction set architecture (ISA) and hardware implementation of PARDIS, a programmable memory controller that can meet the performance requirements of a high-speed DDRx interface. The proposed controller is evaluated by mapping previously proposed DRAM scheduling, address mapping, refresh scheduling, and power management algorithms onto PARDIS. Simulation results show that the average performance of PARDIS comes within 8% of fixed-function hardware for each of these techniques; moreover, by enabling application-specific optimizations, PARDIS improves system performance by 6 to 17% and reduces DRAM energy by 9 to 22% over four existing memory controllers.


        • Published in

          ACM Transactions on Computer Systems, Volume 31, Issue 4
          December 2013
          90 pages
          ISSN: 0734-2071
          EISSN: 1557-7333
          DOI: 10.1145/2542150
          Copyright © 2013 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 20 December 2013
          • Revised: 1 June 2013
          • Accepted: 1 June 2013
          • Received: 1 December 2012

          Qualifiers

          • research-article
          • Research
          • Refereed
