Abstract
Modern memory controllers employ sophisticated address mapping, command scheduling, and power management optimizations to alleviate the adverse effects of DRAM timing and resource constraints on system performance. A promising way of improving the versatility and efficiency of these controllers is to make them programmable—a proven technique that has seen wide use in other control tasks, ranging from DMA scheduling to NAND Flash and directory control. Unfortunately, the stringent latency and throughput requirements of modern DDRx devices have rendered such programmability largely impractical, confining DDRx controllers to fixed-function hardware.
This article presents the instruction set architecture (ISA) and hardware implementation of PARDIS, a programmable memory controller that can meet the performance requirements of a high-speed DDRx interface. The proposed controller is evaluated by mapping previously proposed DRAM scheduling, address mapping, refresh scheduling, and power management algorithms onto PARDIS. Simulation results show that the average performance of PARDIS comes within 8% of fixed-function hardware for each of these techniques; moreover, by enabling application-specific optimizations, PARDIS improves system performance by 6 to 17% and reduces DRAM energy by 9 to 22% over four existing memory controllers.
Index Terms
- A programmable memory controller for the DDRx interfacing standards