Abstract
In this paper, we show how 3D stacking technology can be used to implement a simple, low-power, high-performance chip multiprocessor suited to throughput processing. Our proposed architecture, PicoServer, uses 3D technology to bond one die containing several simple, slow processing cores to multiple DRAM dies that together provide enough capacity to serve as primary memory. The 3D technology also enables wide, low-latency buses between the processors and memory. These buses remove the need for an L2 cache, allowing its area to be reallocated to additional simple cores. The additional cores allow the clock frequency to be lowered without impairing throughput. Lower clock frequency in turn reduces power, so thermal constraints, a concern with 3D stacking, are easily satisfied.

The PicoServer architecture specifically targets Tier 1 server applications, which exhibit a high degree of thread-level parallelism. An architecture designed for efficient throughput is ideal for this application domain. We find that, for a similar logic die area, a 12-CPU system with 3D stacking and no L2 cache outperforms an 8-CPU system with a large on-chip L2 cache by about 14% while consuming 55% less power. In addition, we show that PicoServer performs comparably to a Pentium 4-class machine while consuming only about 1/10 of the power, even under conservative assumptions about PicoServer's power consumption.
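As a rough check on what the reported figures imply, the sketch below converts the relative performance and power numbers from the abstract into performance-per-watt ratios. It is a back-of-the-envelope calculation, not the paper's evaluation methodology. It treats "comparable performance" as exactly 1.0x, and the hypothetical `iso_throughput_freq` helper assumes a first-order model in which aggregate throughput scales as cores times clock frequency, ignoring memory stalls and the effect of removing the L2 cache. Both function names are illustrative only.

```python
def perf_per_watt_gain(rel_perf: float, rel_power: float) -> float:
    """Performance-per-watt of a design relative to a baseline.

    rel_perf:  design throughput / baseline throughput
    rel_power: design power / baseline power
    """
    if rel_power <= 0:
        raise ValueError("relative power must be positive")
    return rel_perf / rel_power


def iso_throughput_freq(base_cores: int, base_freq: float, new_cores: int) -> float:
    """Clock frequency that keeps aggregate throughput unchanged.

    Hypothetical first-order model: throughput ~ cores * frequency,
    with identical per-core IPC and no memory stalls.
    """
    return base_cores * base_freq / new_cores


# 12-core 3D-stacked design vs. 8-core design with a large L2:
# about 14% faster (1.14x) while drawing 55% less power (0.45x).
pico_vs_l2 = perf_per_watt_gain(1.14, 1.0 - 0.55)

# PicoServer vs. a Pentium 4-class machine: "comparable" performance,
# assumed to be exactly 1.0x here, at about 1/10 of the power.
pico_vs_p4 = perf_per_watt_gain(1.0, 1.0 / 10)

# Normalized clock (8 cores at 1.0) that 12 cores would need for equal throughput.
f12 = iso_throughput_freq(8, 1.0, 12)

print(f"vs. 8-core + L2:     {pico_vs_l2:.2f}x perf/W")
print(f"vs. Pentium 4-class: {pico_vs_p4:.1f}x perf/W")
print(f"12-core iso-throughput clock: {f12:.2f} of the 8-core clock")
```

Under these assumptions, the 12-core configuration delivers roughly 2.5 times the performance per watt of the 8-core L2 design and about 10 times that of the Pentium 4-class machine. The frequency helper only illustrates the qualitative argument that extra cores permit a lower clock; the frequencies and voltages actually used in the paper are not given here.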
PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor
ASPLOS XII: Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (2006)