DOI: 10.1145/3020078.3021745
Research article

ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA

Published: 22 February 2017

ABSTRACT

Long Short-Term Memory (LSTM) is widely used in speech recognition. To achieve higher prediction accuracy, machine learning scientists have built increasingly larger models. Such a large model is both computation-intensive and memory-intensive. Deploying such a bulky model results in high power consumption and a high total cost of ownership (TCO) for a data center. To speed up prediction and make it energy-efficient, we first propose a load-balance-aware pruning method that can compress the LSTM model size by 20x (10x from pruning and 2x from quantization) with negligible loss of prediction accuracy. The pruned model is friendly to parallel processing. Next, we propose a scheduler that encodes and partitions the compressed model across multiple PEs for parallelism and schedules the complicated LSTM data flow. Finally, we design the hardware architecture, named the Efficient Speech Recognition Engine (ESE), which works directly on the sparse LSTM model.
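The idea behind load-balance-aware pruning can be illustrated with a small sketch: rather than pruning the weight matrix globally, each processing element's (PE's) row partition is pruned to the same sparsity, so no PE becomes a straggler during sparse matrix-vector multiplication. This is a hypothetical illustration of the concept; the function name, PE assignment scheme, and thresholding details are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def load_balance_prune(W, num_pes=4, sparsity=0.9):
    """Prune W so every PE's row partition keeps the same nonzero count.

    Rows are interleaved across PEs (row r goes to PE r % num_pes);
    within each PE's submatrix the smallest-magnitude weights are
    zeroed, balancing work across PEs.
    """
    W = W.copy()
    for pe in range(num_pes):
        sub = W[pe::num_pes, :]           # view of this PE's rows
        k = int(sub.size * sparsity)      # number of weights to zero
        if k == 0:
            continue
        # k-th smallest magnitude in this PE's partition
        thresh = np.partition(np.abs(sub), k - 1, axis=None)[k - 1]
        sub[np.abs(sub) <= thresh] = 0.0  # writes through the view
    return W

W = np.random.randn(8, 16)
Wp = load_balance_prune(W, num_pes=4, sparsity=0.75)
# every PE's partition retains the same number of nonzeros
counts = [np.count_nonzero(Wp[pe::4, :]) for pe in range(4)]
```

With 2 rows of 16 weights per PE and 75% sparsity, each PE keeps 8 nonzeros, so all four PEs finish a sparse matrix-vector product in the same number of cycles.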

Implemented on a Xilinx KU060 FPGA running at 200 MHz, ESE achieves a performance of 282 GOPS working directly on the sparse LSTM network, corresponding to 2.52 TOPS on the dense one, and processes a full LSTM for speech recognition with a power dissipation of 41 Watts. Evaluated on the LSTM speech recognition benchmark, ESE is 43x and 3x faster than Core i7-5930K CPU and Pascal Titan X GPU implementations, respectively. It achieves 40x and 11.5x higher energy efficiency than the CPU and GPU, respectively.
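The reported figures imply a few derived numbers worth making explicit; the back-of-envelope arithmetic below uses only values quoted above (the "equivalent dense" throughput counts the zero operations that the sparse engine skips).

```python
# Figures quoted in the abstract
sparse_gops = 282.0   # measured throughput on the sparse LSTM (GOPS)
dense_tops = 2.52     # equivalent dense throughput (TOPS)
power_w = 41.0        # reported power dissipation (Watts)

# Effective speedup from sparsity: how many dense ops each
# actually-executed sparse op stands in for (~8.9x)
compression_speedup = dense_tops * 1000 / sparse_gops

# Energy efficiency on the sparse workload (~6.9 GOPS/W)
energy_eff = sparse_gops / power_w
```

The ~8.9x ratio is consistent with the roughly 10x pruning rate claimed above, since a small fraction of scheduling overhead on the sparse model is expected.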


Published in

FPGA '17: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2017
312 pages
ISBN: 9781450343541
DOI: 10.1145/3020078

        Copyright © 2017 ACM


Publisher

Association for Computing Machinery, New York, NY, United States




Acceptance Rates

FPGA '17 paper acceptance rate: 25 of 101 submissions (25%). Overall acceptance rate: 125 of 627 submissions (20%).
