This book covers algorithmic and hardware implementation techniques that enable embedded deep learning. The authors describe synergetic design approaches at the application, algorithm, computer-architecture, and circuit levels that reduce the computational cost of deep learning algorithms. The impact of these techniques is demonstrated in four silicon prototypes for embedded deep learning.

Gives a broad overview of effective solutions for energy-efficient neural networks on battery-constrained wearable devices;

Discusses the optimization of neural networks for embedded deployment on all levels of the design hierarchy – applications, algorithms, hardware architectures, and circuits – supported by real silicon prototypes;

Elaborates on how to design efficient Convolutional Neural Network processors that exploit parallelism and data reuse, sparse operations, and low-precision computations;

Supports the introduced theory and design concepts with four real silicon prototypes. The implementation and achieved performance of these physical realizations are discussed in detail to illustrate and highlight the introduced cross-layer design concepts.