ABSTRACT
Owing to a growing desire to reduce energy consumption and to widely anticipated hurdles to the continued technology scaling promised by Moore's law, techniques and technologies such as inexact circuits and probabilistic CMOS (PCMOS) have gained prominence. These radical approaches trade accuracy at the hardware level for significant gains in energy consumption, area, and speed. While holding great promise, their ability to influence the broader milieu of computing is limited by two shortcomings. First, they were mostly based on ad hoc hand designs and did not consider algorithmically well-characterized automated design methodologies. Moreover, existing design approaches were limited to particular layers of abstraction, such as the physical, architectural, and algorithmic (or, more broadly, software) layers. However, it is well known that significant gains can be achieved by optimizing across the layers. To respond to this need, in this paper we present an algorithmically well-founded cross-layer co-design framework (CCF) for automatically designing inexact hardware in the form of datapath elements, specifically adders and multipliers. We show that significant associated gains can be achieved in terms of energy, area, and delay (or speed). Our algorithms achieve these gains without adding any hardware overhead. The proposed CCF embodies a symbiotic relationship between architecture and logic-layer design through the technique of probabilistic pruning. This is combined with the novel confined voltage scaling technique introduced in this paper, which is applied at the physical layer. A second drawback of the state of the art in inexact design is the lack of physical evidence, established by measuring fabricated ICs, that the claimed gains and other benefits are valid.
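The abstract does not spell out how probabilistic pruning decides what to remove. The sketch below is a rough illustration only and assumes a simplified form of the idea. Each candidate piece of logic is the carry logic feeding one bit of a ripple-carry adder. Its score is its significance (the bit weight 2**i) times its activity, meaning how often that carry is 1 on random inputs. The lowest-scoring pieces are removed greedily while the expected relative error stays within a bound. The scoring rule, the 16-bit width, the uniformly random operands, and every name in the code are assumptions made for this example. This is not the CCF implementation, and it leaves out confined voltage scaling entirely.

```python
import random

WIDTH = 16  # toy width (assumption); the fabricated chip uses 64-bit adders


def ripple_add(a: int, b: int, pruned: frozenset, width: int = WIDTH) -> int:
    """Ripple-carry addition in which the carry INTO each bit position listed in
    `pruned` is forced to 0, modelling removal of the carry logic feeding it."""
    carry, result = 0, 0
    for i in range(width):
        if i in pruned:
            carry = 0
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return result | (carry << width)  # keep carry-out so the exact result is a + b


def expected_relative_error(pruned, samples, width=WIDTH) -> float:
    """Mean of |exact - approx| / exact over the samples (a 0 + 0 pair adds zero)."""
    total = 0.0
    for a, b in samples:
        exact = a + b
        if exact:
            total += abs(exact - ripple_add(a, b, pruned, width)) / exact
    return total / len(samples)


def carry_activity(samples, width=WIDTH):
    """Fraction of samples in which the exact carry into each bit position is 1."""
    counts = [0] * width
    for a, b in samples:
        carry = 0
        for i in range(width):
            counts[i] += carry
            ai, bi = (a >> i) & 1, (b >> i) & 1
            carry = (ai & bi) | (carry & (ai ^ bi))
    return [c / len(samples) for c in counts]


def greedy_prune(error_bound=0.10, n=2000, width=WIDTH, seed=0):
    """Hypothetical greedy pruning: remove carry logic in order of increasing
    significance x activity until the next removal would exceed error_bound."""
    rng = random.Random(seed)
    samples = [(rng.getrandbits(width), rng.getrandbits(width)) for _ in range(n)]
    activity = carry_activity(samples, width)
    order = sorted(range(width), key=lambda i: (2 ** i) * activity[i])
    pruned = frozenset()
    for i in order:
        trial = pruned | {i}
        if expected_relative_error(trial, samples, width) > error_bound:
            break
        pruned = trial
    return pruned, expected_relative_error(pruned, samples, width)


if __name__ == "__main__":
    kept_out, err = greedy_prune(error_bound=0.10)
    print("pruned carry inputs:", sorted(kept_out), "expected relative error:", round(err, 4))
```

The 10% bound in the usage line mirrors the error level quoted for the prototype. The greedy order removes low-significance carries first because their errors are small relative to the sum.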
In this paper, we also address this second shortcoming by using CCF to fabricate a prototype chip implementing inexact datapath elements: a range of 64-bit integer adders whose outputs can be erroneous. We took physical measurements of our prototype chip, in which the inexact adders admit expected relative error magnitudes of 10% or less. These show that cumulative gains over comparable, fully accurate chips, quantified through the area-delay-energy product, can be a multiplicative factor of 15 or more. As evidence of the utility of these results, we demonstrate that images processed with an FFT algorithm built from our inexact adders remain visually discernible, despite the errors admitted in exchange for these gains.
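Because the reported gain is an area-delay-energy (ADE) product, savings along each axis multiply rather than add. The minimal sketch below shows that arithmetic. The `Metrics` class, the `ade_gain` function, and the placeholder values are assumptions for illustration and are not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    area: float    # any consistent unit, e.g. um^2
    delay: float   # e.g. ns
    energy: float  # e.g. pJ per operation

    def ade(self) -> float:
        """Area-delay-energy product."""
        return self.area * self.delay * self.energy


def ade_gain(exact: Metrics, inexact: Metrics) -> float:
    """Multiplicative ADE gain of the inexact design over the exact one."""
    return exact.ade() / inexact.ade()


if __name__ == "__main__":
    # Hypothetical normalized values for illustration only, NOT measured data.
    exact = Metrics(area=1.0, delay=1.0, energy=1.0)
    inexact = Metrics(area=0.5, delay=0.5, energy=0.25)
    print(ade_gain(exact, inexact))  # 16.0
```

With these placeholder values, savings of 2x in area, 2x in delay, and 4x in energy compound to a 16x ADE gain. This shows how moderate per-axis savings can yield an overall factor of the magnitude the abstract reports. The split itself is hypothetical.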
Index Terms
- Algorithmic methodologies for ultra-efficient inexact architectures for sustaining technology scaling