Abstract
This study considers a new measure of distortion of a speaker's speech sounds that is invariant to the gain of the speech signal in a communication channel. The properties of the measure are investigated in comparison with its closest analogues, and a number of its theoretical properties are proved. The new measure is shown to combine the advantages of the symmetric Itakura distance with respect to the noise immunity of automatic speech processing, on the one hand, and of the COSH distance with respect to sensitivity to speech signal distortions, on the other. An experiment was set up and conducted using proprietary software. Estimates of the dependence of the new measure on the signal-to-noise ratio are presented, and this dependence is shown to be close to linear when plotted on a logarithmic scale. The results are intended for use in developing new systems, and upgrading existing systems and technologies, for digital signal processing and speech quality analysis in the presence of noise.
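As a rough illustration of what invariance to gain means for a spectral distortion measure, the sketch below computes the standard discrete COSH distance between two power spectra, i.e. the mean of cosh(ln(P/Q)) - 1 over frequency bins, and shows that it changes when one spectrum is rescaled. The second function, `gain_minimized_cosh`, is a hypothetical variant introduced here only for illustration: it minimizes the COSH distance over a scalar gain, which has the closed form sqrt(mean(P/Q) * mean(Q/P)) - 1. It is an assumption of this sketch and is not claimed to be the measure proposed in the paper, whose definition is not given in the abstract. The test spectra are random placeholders, not speech data.

```python
import math
import random


def cosh_distance(p, q):
    """Discrete COSH distance: mean over bins of cosh(ln(p/q)) - 1."""
    ratios = [pi / qi for pi, qi in zip(p, q)]
    return sum(0.5 * (r + 1.0 / r) for r in ratios) / len(ratios) - 1.0


def gain_minimized_cosh(p, q):
    """Hypothetical gain-invariant variant (illustration only).

    Minimizing cosh_distance(g * p, q) over the gain g > 0 gives
    sqrt(mean(p/q) * mean(q/p)) - 1, which does not depend on the
    overall scale of either spectrum.
    """
    ratios = [pi / qi for pi, qi in zip(p, q)]
    mean_ratio = sum(ratios) / len(ratios)
    mean_inverse = sum(1.0 / r for r in ratios) / len(ratios)
    return math.sqrt(mean_ratio * mean_inverse) - 1.0


# Placeholder positive "power spectra" (random values, not real speech).
random.seed(0)
p = [random.uniform(0.5, 2.0) for _ in range(256)]
q = [random.uniform(0.5, 2.0) for _ in range(256)]
gain = 4.0  # assumed channel gain applied to the first spectrum
p_scaled = [gain * pi for pi in p]

d_plain = cosh_distance(p, q)
d_plain_scaled = cosh_distance(p_scaled, q)
d_inv = gain_minimized_cosh(p, q)
d_inv_scaled = gain_minimized_cosh(p_scaled, q)

print("COSH distance, original / rescaled:", d_plain, d_plain_scaled)
print("gain-minimized, original / rescaled:", d_inv, d_inv_scaled)
```

In this sketch the plain COSH distance changes noticeably under rescaling, while the gain-minimized variant stays the same and never exceeds the plain distance.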
Acknowledgments
This work was supported by a grant from the Russian Science Foundation (project no. 20-71-10010).
Author information
A. V. Savchenko and V. V. Savchenko
Ethics declarations
The authors declare that they have no conflicts of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.
The original Russian version of this paper was published in the journal “Izvestiya Vysshikh Uchebnykh Zavedenii. Radioelektronika,” ISSN 2307-6011 (Online), ISSN 0021-3470 (Print), and is available at http://radio.kpi.ua/article/view/S0021347021060030 with DOI: https://doi.org/10.20535/S0021347021060030
Additional information
Translated from Izvestiya Vysshikh Uchebnykh Zavedenii. Radioelektronika, No. 6, pp. 350–361, May, 2021. https://doi.org/10.20535/S0021347021060030
About this article
Cite this article
Savchenko, A.V., Savchenko, V.V. Scale-Invariant Modification of COSH Distance for Measuring Speech Signal Distortions in Real-Time Mode. Radioelectron. Commun. Syst. 64, 300–309 (2021). https://doi.org/10.3103/S0735272721060030
DOI: https://doi.org/10.3103/S0735272721060030