DOI: 10.1145/2212908.2212912
research-article

Algorithmic methodologies for ultra-efficient inexact architectures for sustaining technology scaling

Published: 15 May 2012

ABSTRACT

Owing to a growing desire to reduce energy consumption and widely anticipated hurdles to the continued technology scaling promised by Moore's law, techniques and technologies such as inexact circuits and probabilistic CMOS (PCMOS) have gained prominence. These radical approaches trade accuracy at the hardware level for significant gains in energy consumption, area, and speed. While they hold great promise, their ability to influence the broader milieu of computing is limited by two shortcomings. First, they have mostly been based on ad hoc hand designs and have not considered algorithmically well-characterized automated design methodologies. In addition, existing design approaches have been limited to particular layers of abstraction, such as the physical, architectural, and algorithmic (or, more broadly, software) layers, although it is well known that significant gains can be achieved by optimizing across layers. To respond to this need, we present an algorithmically well-founded cross-layer co-design framework (CCF) for automatically designing inexact hardware in the form of datapath elements, specifically adders and multipliers, and show that significant associated gains can be achieved in terms of energy, area, and delay or speed. Our algorithms achieve these gains without adding any hardware overhead. CCF embodies a symbiotic relationship between architecture and logic-layer design through the technique of probabilistic pruning, combined with the novel confined voltage scaling technique introduced in this paper, which is applied at the physical layer. A second drawback of the state of the art in inexact design is the lack of physical evidence, established by measuring fabricated ICs, that the claimed gains and other benefits are valid. We address this shortcoming by using CCF to fabricate a prototype chip implementing inexact datapath elements: a range of 64-bit integer adders whose outputs can be erroneous. Physical measurements of our prototype chip, in which the inexact adders admit expected relative error magnitudes of 10% or less, show that cumulative gains over comparable fully accurate chips, quantified through the area-delay-energy product, can be a multiplicative factor of 15 or more. As evidence of the utility of these results, we demonstrate that, despite admitting error while achieving gains, images processed using the FFT algorithm implemented with our inexact adders are visually discernible.
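To make the "expected relative error magnitude" metric concrete, the following is a minimal sketch that estimates it by simulation for a toy inexact 64-bit adder. It is not the paper's CCF or its probabilistic pruning algorithm: the error model (the k least significant bit positions are removed and their outputs tied to zero), the uniform random inputs, and the names `pruned_add` and `mean_relative_error` are all assumptions made for illustration.

```python
import random

WIDTH = 64  # operand width in bits, matching the 64-bit adders in the abstract


def pruned_add(a: int, b: int, k: int) -> int:
    """Toy inexact adder (assumed model, not the paper's design).

    The k least significant bit positions are treated as removed hardware:
    their outputs are zero and no carry leaves them. The remaining upper
    bits are added exactly.
    """
    return ((a >> k) + (b >> k)) << k


def mean_relative_error(k: int, samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of E[|exact - inexact| / exact].

    Inputs are drawn uniformly at random, which is an assumption; a real
    workload may have a very different input distribution.
    """
    total = 0.0
    for _ in range(samples):
        a = rng.getrandbits(WIDTH)
        b = rng.getrandbits(WIDTH)
        exact = a + b
        if exact == 0:
            continue  # both operands zero: the toy adder is exact here
        total += abs(exact - pruned_add(a, b, k)) / exact
    return total / samples


if __name__ == "__main__":
    rng = random.Random(2012)
    for k in (0, 16, 32, 48, 56, 60):
        err = mean_relative_error(k, 10_000, rng)
        print(f"bits removed = {k:2d}  mean relative error = {err:.3e}")
```

In this toy model the error grows with the number of removed bit positions, so a designer could sweep `k` and keep the largest value whose estimated error stays under a chosen bound, such as the 10% figure quoted above.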

    Published in

      CF '12: Proceedings of the 9th conference on Computing Frontiers
      May 2012
      320 pages
      ISBN: 9781450312158
      DOI: 10.1145/2212908

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 240 of 680 submissions, 35%
