This chapter addresses the speech coding and packet loss problems that arise in network speech recognition, where speech is transmitted (and, in most cases, coded) from a client terminal to a recognition server. The first part describes some commonly used speech coding standards and presents a packet loss model for evaluating different channel degradation conditions in a controlled fashion. The second part evaluates the influence of different speech and audio codecs on the performance of a continuous speech recognition engine. It is shown that MPEG transcoding degrades speech recognition performance at low bit rates, whereas performance remains acceptable with specialized speech coders such as G.723. The same system is also evaluated under various simulated and real packet loss conditions, and the resulting significant degradation of automatic speech recognition (ASR) performance is analyzed. The third part presents an overview of the joint effects of compression and packet loss on speech biometrics. In contrast to the ASR task, experiments show that the adverse effects of packet loss alone are negligible, whereas speech encoding, particularly at low bit rates, combined with packet loss can reduce speaker recognition accuracy considerably. The fourth part discusses these experimental observations and points to approaches for improving robustness.
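The abstract does not say which packet loss model the chapter uses. A common way to simulate bursty network losses in a controlled, reproducible fashion is a two-state Markov (Gilbert) model, sketched below as an assumption rather than as the chapter's own method. The parameters `p` (probability of moving from "received" to "lost") and `q` (probability of moving from "lost" back to "received") and all function names are hypothetical illustrations. Under this model the long-run loss rate is p / (p + q) and the mean burst length is 1 / q, so the two knobs let one sweep loss rate and burstiness independently.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def gilbert_loss_pattern(n_packets: int, p: float, q: float, seed: int = 0) -> List[bool]:
    """Return a per-packet loss pattern (True = lost) from a two-state Gilbert model.

    Assumed model (illustrative only):
      p = P(received -> lost), q = P(lost -> received).
    """
    if not (0.0 <= p <= 1.0 and 0.0 < q <= 1.0):
        raise ValueError("need 0 <= p <= 1 and 0 < q <= 1")
    rng = random.Random(seed)  # seeded for reproducible degradation conditions
    # Draw the initial state from the stationary distribution to avoid warm-up bias.
    lost = rng.random() < p / (p + q)
    pattern: List[bool] = []
    for _ in range(n_packets):
        pattern.append(lost)
        if lost:
            lost = rng.random() >= q  # remain in the loss state with probability 1 - q
        else:
            lost = rng.random() < p   # enter the loss state with probability p
    return pattern


def drop_frames(frames: Sequence[T], pattern: Sequence[bool]) -> List[Optional[T]]:
    """Replace lost frames with None (no concealment is applied here)."""
    return [None if is_lost else frame for frame, is_lost in zip(frames, pattern)]


def expected_loss_rate(p: float, q: float) -> float:
    """Stationary probability of the loss state."""
    return p / (p + q)


def mean_burst_length(q: float) -> float:
    """Expected number of consecutive lost packets per burst."""
    return 1.0 / q


if __name__ == "__main__":
    pattern = gilbert_loss_pattern(100_000, p=0.05, q=0.5, seed=1)
    # The empirical rate should be close to expected_loss_rate(0.05, 0.5).
    print(sum(pattern) / len(pattern), expected_loss_rate(0.05, 0.5))
```

In a recognition experiment, the frames returned by `drop_frames` would be passed (after whatever concealment is chosen) to the decoder, and the same seed reproduces the same channel condition across systems.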
- Speech Coding and Packet Loss Effects on Speech and Speaker Recognition
- Springer London