Published in: International Journal of Speech Technology 3/2022

24.06.2022

Discrete cosine transform-based data hiding for speech bandwidth extension

Authors: Sunil Kumar Koduri, Kishore Kumar T


Abstract

Public switched telephone networks restrict speech to the narrow frequency band of 300–3400 Hz, which significantly reduces speech quality. To address this drawback, this paper proposes a robust transform-domain speech bandwidth extension method. The method uses a discrete cosine transform-based data hiding (DCTBDH) technique to provide a better-quality wideband speech signal. Spectral envelope parameters are extracted from the high-frequency components of the speech signal that lie above the narrowband range, spread using spreading sequences, and embedded in the DCT coefficients of the narrowband signal. At the receiver, the embedded information is extracted and used to reconstruct a better-quality wideband signal. In simulations, high-quality wideband speech was obtained from speech transmitted over a public switched telephone network. The spectral envelope parameters of the high-frequency components are embedded transparently, with a mean square error of 5.78 × 10⁻⁴. In a mean opinion score (MOS) listening test, the proposed method improved perceptual transparency over conventional methods by about 0.21 points on the MOS scale. The log spectral distortion obtained was 2.2248, indicating that the proposed technique yields better speech quality than conventional methods.
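The embedding step described in the abstract (spreading side information and hiding it in the DCT coefficients of the narrowband signal) can be illustrated with a minimal sketch. This is not the authors' DCTBDH implementation: the frame length, coefficient range (`start`, `chip_len`), embedding strength `alpha`, the seeded ±1 spreading sequences, and the host-interference cancellation used at the embedder are all assumptions chosen for illustration, and the payload `bits` is a hypothetical stand-in for quantized spectral envelope parameters. No telephone channel is modeled.

```python
import math
import random

def dct(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out

def pn_sequence(length, seed):
    """Seeded +/-1 spreading sequence (assumed; shared by both ends)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]

def embed(frame, bits, alpha=0.01, start=16, chip_len=8, seed=1):
    """Hide each bit in its own block of chip_len DCT coefficients."""
    coeffs = dct(frame)
    for j, bit in enumerate(bits):
        lo = start + j * chip_len
        pn = pn_sequence(chip_len, seed + j)
        target = alpha if bit else -alpha
        # Projection of the host coefficients onto the spreading sequence.
        proj = sum(coeffs[lo + m] * pn[m] for m in range(chip_len)) / chip_len
        # Assumption: cancel the host projection so the correlation
        # at the receiver equals chip_len * target.
        for m in range(chip_len):
            coeffs[lo + m] += (target - proj) * pn[m]
    return idct(coeffs)

def extract(frame, n_bits, start=16, chip_len=8, seed=1):
    """Recover bits from the sign of the correlation with each sequence."""
    coeffs = dct(frame)
    bits = []
    for j in range(n_bits):
        lo = start + j * chip_len
        pn = pn_sequence(chip_len, seed + j)
        corr = sum(coeffs[lo + m] * pn[m] for m in range(chip_len))
        bits.append(1 if corr > 0 else 0)
    return bits

# Hypothetical 64-sample narrowband frame at an 8 kHz sampling rate.
frame = [0.5 * math.sin(2 * math.pi * 440 * i / 8000)
         + 0.3 * math.sin(2 * math.pi * 1000 * i / 8000) for i in range(64)]
bits = [1, 0, 1, 1]
marked = embed(frame, bits)
recovered = extract(marked, len(bits))
# Embedding distortion, analogous in spirit to the MSE quoted in the abstract.
mse = sum((a - b) ** 2 for a, b in zip(frame, marked)) / len(frame)
```

In this sketch the recovered bits match the embedded ones because the transform is orthonormal and nothing disturbs the signal between embedder and receiver; a real system would need to choose `alpha` and the spreading length to survive channel noise and quantization while keeping the distortion inaudible.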


Metadata
Title
Discrete cosine transform-based data hiding for speech bandwidth extension
Authors
Sunil Kumar Koduri
Kishore Kumar T
Publication date
24.06.2022
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 3/2022
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-022-09980-x
