Recent advances in the processing capabilities of handheld devices (PDAs and mobile phones) have made it possible to run speech recognition systems, and even end-to-end speech translation systems, on these devices. However,
speech-to-speech translation (as opposed to fixed-phrase translation) is a highly complex task, and a large amount of computation is required to achieve reliable translation performance. Resource limitations extend beyond CPU speed: memory and storage requirements, as well as audio input and output requirements, all tax current systems to their limits. When the resource demand exceeds the computational capability of state-of-the-art handheld devices, a common technique for mobile speech-to-speech translation systems is to use a client-server approach, in which the handheld device (a mobile phone or PDA) is treated simply as a system client. While we will briefly describe the client-server approach, we will mainly focus on the approach in which the end-to-end speech-to-speech translation system is hosted entirely on the handheld device. We will describe the challenges, and the algorithmic and code optimization solutions we developed, for the handheld MASTOR (Multilingual Automatic Speech-to-Speech Translator) systems, which translate between English and Mandarin Chinese and between English and Arabic on the embedded Linux and Windows CE operating systems. The system includes an HMM-based large-vocabulary continuous speech recognizer using statistical n-grams, a translation module, and a multi-language speech synthesis system.
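To illustrate the statistical n-gram component mentioned above, the following is a minimal sketch of bigram language-model scoring of the kind a recognizer's decoder consults. All names, the add-one smoothing choice, and the toy corpus are illustrative assumptions, not details of the MASTOR system.

```python
# Minimal sketch of statistical n-gram (bigram) scoring, as used by the
# language model of an HMM-based recognizer. The corpus and smoothing
# scheme are illustrative assumptions, not part of MASTOR.
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams from tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability of a tokenized sentence."""
    tokens = ["<s>"] + sentence + ["</s>"]
    lp = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        lp += math.log((bigrams[(prev, cur)] + 1)
                       / (unigrams[prev] + vocab_size))
    return lp

corpus = [["how", "are", "you"], ["how", "old", "are", "you"]]
uni, bi = train_bigram(corpus)
V = len(uni)

# A word order seen in training scores higher than an unseen reordering,
# which is how the model steers the decoder toward fluent hypotheses.
seen = log_prob(["how", "are", "you"], uni, bi, V)
unseen = log_prob(["you", "are", "how"], uni, bi, V)
```

On a resource-constrained device, such counts would typically be precomputed offline and compressed, with log-probabilities quantized; the sketch above only shows the scoring logic itself.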