ABSTRACT
Deeper and larger Neural Networks (NNs) have made breakthroughs in many fields, but conventional CMOS-based computing platforms struggle to deliver the required energy efficiency. RRAM-based systems provide a promising alternative for building efficient Training-In-Memory Engines (TIME). However, the endurance of RRAM cells is limited, which is a severe issue because the weights of an NN must be updated thousands to millions of times during training. Gradient sparsification can mitigate this problem by dropping most of the smaller gradients, but it introduces unacceptable computation cost. We propose an effective framework, SGS-ARS, combining a Structured Gradient Sparsification (SGS) scheme with an Aging-aware Row Swapping (ARS) scheme, to guarantee write balance across whole RRAM crossbars and prolong the lifetime of TIME. Our experiments demonstrate that a 356× lifetime extension is achieved when TIME is programmed to train ResNet-50 on the ImageNet dataset with our SGS-ARS framework.
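The two mechanisms are easy to illustrate. Below is a minimal NumPy sketch, assuming each layer's gradient matrix maps row-for-row onto an RRAM crossbar; the function names, the row-norm selection criterion, the `row_fraction` parameter, and the write-counter bookkeeping are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sgs(grad, row_fraction=0.1):
    """Structured Gradient Sparsification, minimal sketch.

    Rather than dropping individual small gradients (which scatters
    writes across the crossbar), keep only the rows of the gradient
    matrix with the largest total magnitude and zero out the rest, so
    each update writes a few whole crossbar rows.
    """
    k = max(1, int(row_fraction * grad.shape[0]))
    row_scores = np.abs(grad).sum(axis=1)   # rank rows by gradient magnitude
    keep = np.argsort(row_scores)[-k:]      # indices of the top-k rows
    sparse = np.zeros_like(grad)
    sparse[keep] = grad[keep]               # all other rows are dropped
    return sparse, keep

def ars(row_map, write_counts):
    """Aging-aware Row Swapping, minimal sketch.

    row_map[l] is the physical crossbar row backing logical row l;
    write_counts[p] is the accumulated write count of physical row p.
    Periodically swap the most-worn mapping with the least-worn one to
    balance wear (physical data migration is elided here).
    """
    per_logical = write_counts[row_map]
    hot, cold = int(np.argmax(per_logical)), int(np.argmin(per_logical))
    row_map[hot], row_map[cold] = row_map[cold], row_map[hot]
    return row_map
```

Row-level selection is what makes the sparsification "structured": all surviving updates land in a handful of crossbar rows, so a cheap remapping table is enough for ARS to steer those writes away from aged rows.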