Published in: International Journal of Speech Technology, Issue 1/2022

26 November 2021

Sparse representation and reproduction of speech signals in complex Fourier basis

Authors: Lee-Chung Kwek, Alan Wee-Chiat Tan, Heng-Siong Lim, Cheah-Heng Tan, Khaled A. Alaghbari


Abstract

Sparse representation concerns the task of determining the most compact representation of a signal as a linear combination of basis vectors drawn from an overcomplete dictionary. As the problem is non-convex, it is common to consider approximate, suboptimal solutions; one such method is the orthogonal matching pursuit (OMP) algorithm. OMP is an iterative greedy algorithm in which, at each step, the basis vector most correlated with the current residual is selected. Past attention has mostly been directed towards real-valued dictionaries, since the signals of interest are themselves real-valued. From the perspective of speech representation, complex dictionaries are intuitively appealing because audio signals are generally assumed to be a mixture of exponentials with time-varying amplitudes and phases. However, sparse representation of speech signals over complex dictionaries has been less investigated, mainly because the measurements are normally real-valued. In this paper, we pursue this intuition by modelling the complex dictionary on the popular discrete Fourier transform and then introduce a new orthogonalization mechanism in OMP for such cases. Customizing the conventional OMP algorithm to the complex setting enables high-quality, compact representation of speech signals with low computational complexity. Experimental results demonstrate that the proposed approach retains high perceptual similarity between the reconstructed speech signals and the originals.
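The greedy selection step described in the abstract can be illustrated with a minimal sketch. It is a generic textbook OMP over an overcomplete complex Fourier dictionary, with Gram-Schmidt used for the orthogonalization; it is not the new orthogonalization mechanism proposed in the paper, whose details the abstract does not give. The function names (`fourier_dictionary`, `inner`, `omp`), the term "atoms" for basis vectors, and the sizes (32 samples, 64 atoms) are assumptions chosen for illustration. Only the Python standard library is used; a practical implementation would more likely rely on a numerical library.

```python
import cmath
import math


def fourier_dictionary(n_samples, n_atoms):
    """Unit-norm complex exponential atoms (overcomplete if n_atoms > n_samples)."""
    scale = 1.0 / math.sqrt(n_samples)
    return [
        [scale * cmath.exp(2j * math.pi * k * n / n_atoms) for n in range(n_samples)]
        for k in range(n_atoms)
    ]


def inner(a, b):
    """Hermitian inner product <a, b> = sum(conj(a[n]) * b[n])."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def omp(signal, atoms, n_select):
    """Generic OMP sketch; assumes n_select <= len(signal)."""
    residual = [complex(s) for s in signal]
    support = []  # indices of the selected atoms
    ortho = []    # orthonormalised copies of the selected atoms
    for _ in range(n_select):
        # Greedy step: pick the unused atom most correlated with the residual.
        candidates = [k for k in range(len(atoms)) if k not in support]
        best = max(candidates, key=lambda k: abs(inner(atoms[k], residual)))
        # Gram-Schmidt: orthogonalise the new atom against those already chosen.
        q = list(atoms[best])
        for p in ortho:
            c = inner(p, q)
            q = [q_n - c * p_n for q_n, p_n in zip(q, p)]
        norm = math.sqrt(sum(abs(q_n) ** 2 for q_n in q))
        q = [q_n / norm for q_n in q]
        # Remove the residual's component along the new orthonormal direction.
        c = inner(q, residual)
        residual = [r_n - c * q_n for r_n, q_n in zip(residual, q)]
        support.append(best)
        ortho.append(q)
    # The approximation is the part of the signal spanned by the chosen atoms.
    approx = [s - r for s, r in zip(signal, residual)]
    return support, approx, residual


# Real-valued test signal: a single cosine on the dictionary's frequency grid.
N, M = 32, 64
atoms = fourier_dictionary(N, M)
x = [math.cos(2 * math.pi * 6 * n / M) for n in range(N)]

support, approx, residual = omp(x, atoms, 2)
residual_norm = math.sqrt(sum(abs(r) ** 2 for r in residual))
print(sorted(support), residual_norm)
```

In this toy case the two selected atoms are indices 6 and 58, a complex-conjugate pair (58 is -6 modulo 64), and the residual norm is close to zero. This shows why real-valued measurements need extra care in a complex dictionary: a single real sinusoid occupies two complex atoms.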

Footnotes
1. A basis vector is said to be active if the corresponding sparse coefficient is non-zero.

Metadata
Title
Sparse representation and reproduction of speech signals in complex Fourier basis
Authors
Lee-Chung Kwek
Alan Wee-Chiat Tan
Heng-Siong Lim
Cheah-Heng Tan
Khaled A. Alaghbari
Publication date
26 November 2021
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2022
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-021-09941-w
