2015 | OriginalPaper | Book chapter

4. Recent Speech Coding Technologies and Standards

Authors: Daniel J. Sinder, Imre Varga, Venkatesh Krishnan, Vivek Rajendran, Stéphane Villette

Published in: Speech and Audio Processing for Coding, Enhancement and Recognition

Publisher: Springer New York

Abstract

This chapter presents an overview of recent developments in conversational speech coding technologies, important new algorithmic advances, and recent standardization activities in ITU-T, 3GPP, 3GPP2, MPEG and IETF that offer a significantly improved user experience during voice calls on existing and future communication systems. Because user experience is determined by speech quality, network operators are very concerned about the quality of speech coders. Operators are also concerned about capacity, so coding efficiency is another important measure. Advanced speech coding technologies can improve both coding efficiency and user experience. One way to improve quality is to extend the audio bandwidth from traditional narrowband (8 kHz sampling) to wideband (16 kHz sampling) and super-wideband (32 kHz sampling). Another is to make the coder more robust against transmission errors: error concealment algorithms substitute the missing parts of the audio signal as far as possible. In packet-switched applications (VoIP systems), jitter buffer management (JBM) algorithms include special mechanisms to maximize sound quality. It is therefore important to ensure that speech coders meeting these quality expectations are standardized and deployed. One example is the Enhanced Voice Services (EVS) project, in which 3GPP is developing its next-generation speech coder. The basic motivation for 3GPP to start the EVS project was to extend the path of codec evolution by providing a super-wideband experience at around 13 kb/s and better quality for music and mixed content in conversational applications. Optimized behavior in VoIP applications is achieved through high error robustness, jitter buffer management, source-controlled variable bit rate operation, support for various audio bandwidths, and stereo.
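To make the idea of error concealment concrete, the following is a minimal sketch of the simplest generic scheme: a lost frame (marked `None`) is replaced by the last correctly received frame, scaled by a gain that decays with each consecutive loss so that long gaps fade towards silence. This is an illustrative assumption, not the concealment algorithm of EVS or of any other codec discussed in the chapter; the function name `conceal_frames` and the `attenuation` parameter are hypothetical.

```python
from typing import List, Optional, Sequence


def conceal_frames(frames: Sequence[Optional[Sequence[float]]],
                   frame_len: int,
                   attenuation: float = 0.5) -> List[List[float]]:
    """Generic frame-repetition concealment (illustrative sketch only).

    Each lost frame (None) is replaced by the last good frame multiplied
    by a gain that is reduced by `attenuation` for every consecutive loss.
    The gain resets to 1.0 as soon as a good frame arrives.
    """
    out: List[List[float]] = []
    last_good = [0.0] * frame_len  # play silence until the first good frame
    gain = 1.0
    for frame in frames:
        if frame is not None:
            if len(frame) != frame_len:
                raise ValueError("frame has the wrong number of samples")
            last_good = list(frame)
            gain = 1.0
            out.append(list(frame))
        else:
            gain *= attenuation  # fade out over a burst of losses
            out.append([gain * s for s in last_good])
    return out


if __name__ == "__main__":
    received = [[1.0, 2.0], None, None, [3.0, 4.0], None]
    for f in conceal_frames(received, frame_len=2):
        print(f)
```

In a real wideband system with 20 ms frames at 16 kHz sampling, `frame_len` would be 320 samples. Practical codecs typically do much more than repeat waveforms, for instance extrapolating pitch and spectral envelope parameters; the sketch only shows the basic principle of substituting and gradually muting missing segments.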

Metadata
DOI: https://doi.org/10.1007/978-1-4939-1456-2_4
