International Journal of Speech Technology

26.11.2021

Sparse representation and reproduction of speech signals in complex Fourier basis

Authors: Lee-Chung Kwek, Alan Wee-Chiat Tan, Heng-Siong Lim, Cheah-Heng Tan, Khaled A. Alaghbari

Published in: International Journal of Speech Technology | Issue 1/2022


Abstract

Sparse representation concerns the task of determining the most compact representation of a signal as a linear combination of basis vectors from an overcomplete dictionary. As the problem is non-convex, it is common to consider approximate suboptimal solutions, and one such method is the orthogonal matching pursuit (OMP) algorithm. OMP is an iterative greedy algorithm that, at each step, selects the basis vector most correlated with the current residual. Past work has mostly used real-valued dictionaries, because the signals of interest are themselves real-valued. From the perspective of speech representation, the use of complex dictionaries in sparse representation is intuitively appealing, as audio signals are generally assumed to be a mixture of exponentials with time-varying amplitudes and phases. However, sparse representation of speech signals based on complex dictionaries has been less investigated, mainly because the measurements are normally real-valued. In this paper, we pursue this intuition by modelling the complex dictionary on the popular discrete Fourier transform, and then introduce a new orthogonalization mechanism in the OMP for such cases. Customizing the conventional OMP algorithm to the complex setting enables high-quality compact representation of speech signals with low computational complexity. Experimental results demonstrate that the proposed approach retains high perceptual similarity between the reconstructed speech signals and the original ones.
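The greedy selection step described in the abstract can be illustrated with a minimal sketch. This is the conventional OMP with a least-squares re-fit over a complex Fourier dictionary, not the new orthogonalization mechanism proposed in the paper, whose details are not given here. The function names `dft_dictionary` and `omp`, the dictionary size, and the synthetic test signal are all assumptions made for illustration.

```python
import numpy as np

def dft_dictionary(n, k):
    """n x k dictionary of unit-norm complex exponentials (hypothetical helper).
    Choosing k > n makes the dictionary overcomplete."""
    t = np.arange(n)[:, None]
    f = np.arange(k)[None, :]
    return np.exp(2j * np.pi * t * f / k) / np.sqrt(n)

def omp(D, x, n_atoms, tol=1e-10):
    """Conventional OMP (illustrative): pick the atom most correlated with the
    residual, then re-fit all selected atoms by least squares."""
    x = np.asarray(x, dtype=complex)
    residual = x.copy()
    support = []
    coeffs = np.zeros(0, dtype=complex)
    for _ in range(n_atoms):
        corr = np.abs(D.conj().T @ residual)   # correlation with every atom
        if support:
            corr[support] = 0.0                # do not reselect an atom
        support.append(int(np.argmax(corr)))
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
        if np.linalg.norm(residual) <= tol * np.linalg.norm(x):
            break
    return support, coeffs, residual

# Synthetic real-valued signal (assumption): two sinusoids, i.e. four complex exponentials.
n, k = 64, 128
t = np.arange(n)
x = np.cos(2 * np.pi * 5 * t / n) + 0.5 * np.sin(2 * np.pi * 12 * t / n)

D = dft_dictionary(n, k)
support, coeffs, residual = omp(D, x, n_atoms=4)
x_hat = (D[:, support] @ coeffs).real          # reconstruction from 4 atoms
```

Because the signal is real-valued, each sinusoid is represented by a conjugate pair of complex exponentials, so the two sinusoids in this toy signal need four atoms in total.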


A basis vector is said to be active if the corresponding sparse coefficient is non-zero.


Title: Sparse representation and reproduction of speech signals in complex Fourier basis
Authors: Lee-Chung Kwek
Alan Wee-Chiat Tan
Heng-Siong Lim
Cheah-Heng Tan
Khaled A. Alaghbari
Publication date: 26.11.2021
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 1/2022
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-021-09941-w
