DOI: 10.1145/951710.951740
Article

Architectural optimizations for low-power, real-time speech recognition

Published: 30 October 2003

ABSTRACT

The proliferation of computing technology to low-power domains such as hand-held devices has led to increased interest in portable interface technologies, with particular interest in speech recognition. The computational demands of robust, large-vocabulary speech recognition systems, however, are currently prohibitive for such low-power devices. This work begins an exploration of domain-specific characteristics of speech recognition that might be exploited to achieve the requisite performance within the power constraints of such devices. We focus primarily on architectural techniques to exploit the massive amounts of potential thread-level parallelism apparent in this application domain, and consider the performance/power trade-offs of such architectures. Our results show that a simple, multi-threaded, multi-pipelined processor architecture can significantly improve the performance of the time-consuming search phase of modern speech recognition algorithms, and may reduce overall energy consumption by drastically reducing dissipation of static power. We also show that the primary hurdle to achieving these performance benefits is the data request rate into the memory system, and consider some initial solutions to this problem.



          Reviews

          Klaus K. Obermeier

          Ever wondered how much continuous speech recognition could be done with one AA battery? Eighteen thousand, 180,000, or even 1.8 million words? Given an imputed demand for speech recognition on handheld devices, the dilemma is clear: massively parallel processing algorithms face severely limited AA battery power reservoirs. The authors argue that a simplified multi-threaded architecture that uses sublanguage information and decentralized controllers to reduce combinatorics in processing speech improves search efficiency, cuts down on the rate of data requests into the memory system, and, consequently, uses less power. The paper briefly introduces the state of the art of speech processing, succinctly presents the authors’ proposal for a system architecture that could effectively be used for handheld devices, and presents a thorough, seven-page performance evaluation, before concluding with a cogent summary of related work and future research directions. The findings are threefold. First, high-concurrency execution environments with latency tolerance improve speech recognition. Second, reduction of static power dissipation leads to less energy consumption for a given task. Third, the crux in improving performance lies in optimizing the memory system, and reducing heat dissipation during power consumption. The authors extrapolate a performance of about 95 to 100 words per minute (18,000 words) for three hours of AA battery life. The do-ability is almost certain; the actual usability is a different story. Until speech recognizers go beyond is-this-what-you-mean confirmation prompts, and handle rudimentary dialogue without repetitive user input, battery life takes a back seat to the creature comforts of real-life spoken interaction.

          Online Computing Reviews Service
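As a quick check on the reviewer's figures, the extrapolated rate and battery life can be multiplied out. The sketch below assumes a constant recognition rate over the full three hours, which is an assumption for illustration and not something the review states; the function name is hypothetical.

```python
def words_recognized(words_per_minute, hours):
    """Total words processed, assuming a constant rate for the whole period."""
    return words_per_minute * hours * 60


# Rates and battery life quoted in the review: 95 to 100 words per minute, three hours.
low = words_recognized(95, 3)
high = words_recognized(100, 3)
print(low, high)  # prints: 17100 18000
```

Under that assumption, the 18,000-word figure corresponds to the upper end of the quoted rate; the lower end gives 17,100 words.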

          Published in

            CASES '03: Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
            October 2003
            340 pages
            ISBN: 1581136765
            DOI: 10.1145/951710

            Copyright © 2003 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Acceptance Rates

            CASES '03 Paper Acceptance Rate: 31 of 162 submissions, 19%
            Overall Acceptance Rate: 52 of 230 submissions, 23%
