
PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor

Published: 20 October 2006

Abstract

In this paper, we show how 3D stacking technology can be used to implement a simple, low-power, high-performance chip multiprocessor suitable for throughput processing. Our proposed architecture, PicoServer, employs 3D technology to bond one die containing several simple, slow processing cores to multiple DRAM dies that together provide enough capacity for primary memory. The 3D technology also enables wide, low-latency buses between processors and memory. These buses remove the need for an L2 cache, allowing its area to be reallocated to additional simple cores. The additional cores allow the clock frequency to be lowered without impairing throughput. Lower clock frequency in turn reduces power and means that thermal constraints, a concern with 3D stacking, are easily satisfied.

The PicoServer architecture specifically targets Tier 1 server applications, which exhibit a high degree of thread-level parallelism. An architecture targeted at efficient throughput is ideal for this application domain. We find that, for a similar logic die area, a 12-CPU system with 3D stacking and no L2 cache outperforms an 8-CPU system with a large on-chip L2 cache by about 14% while consuming 55% less power. In addition, we show that a PicoServer performs comparably to a Pentium 4-class machine while consuming only about 1/10 of the power, even when conservative assumptions are made about the power consumption of the PicoServer.
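The headline figures in the abstract can be combined into a rough performance-per-watt comparison. The sketch below is only back-of-the-envelope arithmetic on the quoted numbers, not a calculation from the paper. The function name `perf_per_watt_gain` is hypothetical, and treating "performs comparably" as exactly equal performance is an assumption made for illustration.

```python
def perf_per_watt_gain(relative_performance: float, relative_power: float) -> float:
    """Return the throughput-per-watt ratio of a system versus its baseline.

    Both arguments are ratios relative to the baseline (baseline = 1.0).
    """
    if relative_power <= 0:
        raise ValueError("relative_power must be positive")
    return relative_performance / relative_power


# 12-CPU 3D-stacked system vs. 8-CPU system with L2 cache:
# about 14% more throughput while consuming 55% less power.
gain_vs_l2 = perf_per_watt_gain(relative_performance=1.14, relative_power=1 - 0.55)

# PicoServer vs. Pentium 4-class machine: "comparable" performance is
# ASSUMED here to mean equal (1.0), at about 1/10 of the power.
gain_vs_p4 = perf_per_watt_gain(relative_performance=1.0, relative_power=0.1)

print(f"vs. 8-CPU + L2 baseline: {gain_vs_l2:.2f}x performance per watt")
print(f"vs. Pentium 4-class:     {gain_vs_p4:.1f}x performance per watt")
```

Under these assumptions the quoted numbers imply roughly 2.5 times the performance per watt of the cached 8-CPU design, and on the order of 10 times that of the Pentium 4-class machine.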



            Published in

            ACM SIGOPS Operating Systems Review, Volume 40, Issue 5
            Proceedings of the 2006 ASPLOS Conference
            December 2006
            425 pages
            ISSN: 0163-5980
            DOI: 10.1145/1168917

            ASPLOS XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
            October 2006
            440 pages
            ISBN: 1595934510
            DOI: 10.1145/1168857

            Copyright © 2006 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

            Publisher

            Association for Computing Machinery

            New York, NY, United States
