Article

Architectural optimizations for low-power, real-time speech recognition

Authors:
Rajeev Krishna, University of Michigan, Ann Arbor, MI
Scott Mahlke, University of Michigan, Ann Arbor, MI
Todd Austin, University of Michigan, Ann Arbor, MI

CASES '03: Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems, October 2003, Pages 220–231. https://doi.org/10.1145/951710.951740

Published: 30 October 2003

ABSTRACT

The proliferation of computing technology to low-power domains such as hand-held devices has led to increased interest in portable interface technologies, with particular interest in speech recognition. The computational demands of robust, large-vocabulary speech recognition systems, however, are currently prohibitive for such low-power devices. This work begins an exploration of domain-specific characteristics of speech recognition that might be exploited to achieve the requisite performance within the power constraints of such devices. We focus primarily on architectural techniques to exploit the massive amounts of potential thread-level parallelism apparent in this application domain, and consider the performance/power trade-offs of such architectures. Our results show that a simple, multi-threaded, multi-pipelined processor architecture can significantly improve the performance of the time-consuming search phase of modern speech recognition algorithms, and may reduce overall energy consumption by drastically reducing dissipation of static power. We also show that the primary hurdle to achieving these performance benefits is the data request rate into the memory system, and consider some initial solutions to this problem.
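
The static-power claim in the abstract can be read through a simple energy model. The sketch below is illustrative only: the symbols ($E_{\text{dyn}}$, $P_{\text{static}}$, $t$, $s$) are assumptions introduced here and are not notation or results from the paper. It assumes the dynamic energy for a fixed recognition task stays roughly constant, while static (leakage) energy scales with how long the task takes.

```latex
% Illustrative sketch; all symbols are assumptions, not the paper's notation.
% Requires amsmath for align and \text.
% E_dyn:     dynamic energy for a fixed recognition task
% P_static:  static (leakage) power of the baseline processor
% P'_static: static power of the wider, multi-pipelined processor
% t:         baseline time to finish the task;  s > 1: speedup
\begin{align}
E_{\text{base}} &= E_{\text{dyn}} + P_{\text{static}}\, t \\
E_{\text{par}}  &\approx E_{\text{dyn}} + P'_{\text{static}}\, \frac{t}{s}
\end{align}
% Under these assumptions energy falls whenever P'_static / s < P_static,
% i.e. when the speedup outweighs the extra leakage of the added pipelines.
```

Under this reading, a design that finishes the search phase sooner can spend less total energy even if it draws more static power while running, provided the speedup is large enough.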

Index Terms

Architectural optimizations for low-power, real-time speech recognition
1. Computer systems organization
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Speech recognition

Reviews

Reviewer: Klaus K. Obermeier

Ever wondered how much continuous speech recognition could be done with one AA battery? Eighteen thousand, 180,000, or even 1.8 million words? Given an imputed demand for speech recognition on handheld devices, the dilemma is clear: massive parallel processing algorithms face severely limited AA battery power reservoirs.

The authors argue that a simplified multi-threaded architecture that uses sublanguage information and decentralized controllers to reduce combinatorics in processing speech improves search efficiency, cuts down on the rate of data requests into the memory system, and, consequently, uses less power. The paper briefly introduces the state-of-the-art of speech processing, succinctly presents the authors’ proposal for a system architecture that could effectively be used for handheld devices, and presents a thorough, seven-page performance evaluation, before concluding with a cogent summary of related work and future research directions.

The findings are threefold. First, high-concurrency execution environments with latency tolerance improve speech recognition. Second, reduction of static power dissipation leads to less energy consumption for a given task. Third, the crux in improving performance lies in optimizing the memory system, and reducing heat dissipation during power consumption. The authors extrapolate a performance of about 95 to 100 words per minute (18,000 words) for three hours of AA battery life.

The do-ability is almost certain; the actual usability is a different story. Until speech recognizers go beyond is-this-what-you-mean confirmation prompts, and handle rudimentary dialogue without repetitive user input, battery life takes a back seat to the creature comforts of real-life spoken interaction.

Online Computing Reviews Service

Published in
CASES '03: Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
October 2003
340 pages
ISBN:1581136765
DOI:10.1145/951710
General Chairs: Jaime Moreno (IBM Research), Praveen Murthy (Fujitsu Labs of America)
Program Chairs: Tom Conte (North Carolina State University), Paolo Faraboschi (HP Labs)
Copyright © 2003 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Acceptance Rates
CASES '03 Paper Acceptance Rate: 31 of 162 submissions, 19%
Overall Acceptance Rate: 52 of 230 submissions, 23%