
Scale-Invariant Modification of COSH Distance for Measuring Speech Signal Distortions in Real-Time Mode

Radioelectronics and Communications Systems

Abstract

This study proposes a new measure of distortion in a speaker's speech sounds that is invariant to the gain of the speech signal in a communication channel. The properties of the measure are investigated in comparison with its closest analogues, and a number of its theoretical properties are proved. The new measure is shown to combine the advantage of the symmetric Itakura distance, namely noise immunity in automatic speech processing, with that of the COSH distance, namely sensitivity to speech signal distortions. An experiment was designed and conducted using proprietary software, and estimates of the dependence of the new measure on the signal-to-noise ratio (SNR) are presented. When expressed on a logarithmic scale, this dependence is shown to be close to linear. The results are intended for use in developing new systems, and upgrading existing systems and technologies, for digital signal processing and speech quality analysis under noise exposure.
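The abstract does not give the formula of the proposed measure, so the sketch below only illustrates the quantities it is compared against, under stated assumptions. It uses the classical definitions on discrete power spectra P and Q sampled at the same frequency bins: the Itakura–Saito divergence, the COSH distance (the average of the two Itakura–Saito directions, equal to the mean of cosh(ln(P/Q)) − 1), the Itakura distance (Itakura–Saito minimized over a scalar gain), and a symmetric Itakura distance taken here as the sum of both directions (some sources use the average instead). The function `gain_invariant_cosh` is a hypothetical illustration, not necessarily the authors' modification: it minimizes the COSH distance over a scalar gain applied to P, which has the closed form sqrt(mean(r)·mean(1/r)) − 1 with r = P/Q. The spectra in the demo are synthetic.

```python
import numpy as np


def itakura_saito(p: np.ndarray, q: np.ndarray) -> float:
    """Itakura-Saito divergence d_IS(p, q) between power spectra, averaged over bins."""
    r = p / q
    return float(np.mean(r - np.log(r) - 1.0))


def cosh_distance(p: np.ndarray, q: np.ndarray) -> float:
    """COSH distance: mean of cosh(ln(p/q)) - 1, i.e. the average of both IS directions."""
    r = p / q
    return float(np.mean(0.5 * (r + 1.0 / r) - 1.0))


def itakura_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Itakura distance: IS divergence minimized over a scalar gain on q,
    equal to ln(arithmetic mean of p/q) - mean of ln(p/q)."""
    r = p / q
    return float(np.log(np.mean(r)) - np.mean(np.log(r)))


def symmetric_itakura(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric Itakura distance, ASSUMED here to be the sum of both directions."""
    return itakura_distance(p, q) + itakura_distance(q, p)


def gain_invariant_cosh(p: np.ndarray, q: np.ndarray) -> float:
    """HYPOTHETICAL variant (not taken from the paper): COSH distance minimized
    over a scalar gain g on p. Setting the derivative in g to zero gives
    sqrt(mean(r) * mean(1/r)) - 1 with r = p / q."""
    r = p / q
    return float(np.sqrt(np.mean(r) * np.mean(1.0 / r)) - 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.uniform(0.5, 2.0, size=256)       # synthetic reference power spectrum
    p = q * rng.uniform(0.8, 1.25, size=256)  # synthetic mildly distorted spectrum
    for gain in (1.0, 10.0):
        # Apply a constant channel gain to the distorted spectrum and compare.
        print(f"gain={gain:5.1f}  COSH={cosh_distance(gain * p, q):.4f}  "
              f"gain-invariant COSH={gain_invariant_cosh(gain * p, q):.4f}")
```

Scaling P by a constant gain changes the plain COSH value but leaves the gain-invariant variant unchanged, because the gain cancels in the product mean(r)·mean(1/r). The closed form also shows that this variant equals exp(d/2) − 1, where d is the symmetric Itakura distance as defined in the sketch, so it is a monotone function of that distance. By the Cauchy–Schwarz inequality it is non-negative and vanishes only when P is proportional to Q.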


Fig. 1.
Fig. 2.
Fig. 3.



Acknowledgments

This work was funded by a grant from the Russian Science Foundation (Project no. 20-71-10010).

Author information

Authors: A. V. Savchenko and V. V. Savchenko

Ethics declarations

The authors declare that they have no conflicts of interest.

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Translated from Izvestiya Vysshikh Uchebnykh Zavedenii. Radioelektronika, No. 6, pp. 350–361, May 2021. The original Russian version of this paper (ISSN 2307-6011 (Online), ISSN 0021-3470 (Print)) is available at http://radio.kpi.ua/article/view/S0021347021060030, DOI: https://doi.org/10.20535/S0021347021060030.

About this article


Cite this article

Savchenko, A.V., Savchenko, V.V. Scale-Invariant Modification of COSH Distance for Measuring Speech Signal Distortions in Real-Time Mode. Radioelectron.Commun.Syst. 64, 300–309 (2021). https://doi.org/10.3103/S0735272721060030


  • DOI: https://doi.org/10.3103/S0735272721060030
