Published in: International Journal of Speech Technology 3/2019

15-07-2019

Blind speech dereverberation using sparse decomposition and multi-channel linear prediction

Authors: Leila Mousavi, Farbod Razzazi, Afrooz Haghbin

Abstract

In this study, a blind speech dereverberation method for a noiseless single-input multiple-output (SIMO) acoustic channel is proposed. The method is based on multichannel linear prediction (MCLP) in the short-time Fourier transform (STFT) domain and assumes that both the residual speech signal and the linear prediction coefficients are sparse. The resulting problem was solved by convex optimization using the alternating direction method of multipliers (ADMM) and CVX. The proposed model was compared with state-of-the-art methods based on ℓp-norm optimization criteria. Simulations were evaluated in different room models with various reverberation times, numbers of microphones, and parameter settings. The results show that the proposed method outperforms the compared methods in terms of speech dereverberation assessment criteria.
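
The abstract does not state the cost function, so the following is only a minimal sketch of one plausible reading: for a single STFT frequency bin, it minimizes an ℓ1 penalty on the prediction residual plus a weighted ℓ1 penalty on the prediction coefficients, using a standard two-block ADMM splitting. The function names (delayed_frames, soft, sparse_mclp_bin), the parameters (order, delay, lam, rho, n_iter, ref), and the choice of plain ℓ1 norms are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def delayed_frames(Y, order, delay):
    """Stack delayed multi-channel frames of one frequency bin.

    Y has shape (N frames, M microphones). Row n of the result holds
    Y[n - delay - k, m] for k = 0..order-1 (zeros before the first frame).
    """
    N, M = Y.shape
    X = np.zeros((N, M * order), dtype=complex)
    for k in range(order):
        shift = delay + k
        if shift < N:
            X[shift:, k * M:(k + 1) * M] = Y[:N - shift]
    return X

def soft(v, t):
    """Complex soft-thresholding: shrink magnitudes by t, keep phases."""
    mag = np.abs(v)
    return v * np.maximum(1.0 - t / np.maximum(mag, 1e-12), 0.0)

def sparse_mclp_bin(Y, order=4, delay=1, lam=0.1, rho=1.0, n_iter=200, ref=0):
    """Illustrative ADMM for  min_g ||x - X g||_1 + lam * ||g||_1.

    x is the reference-microphone signal and X the delayed-frame matrix.
    This objective is an assumed stand-in, not the paper's exact criterion.
    Returns (d, g): the prediction residual d = x - X g, used here as the
    dereverberated estimate, and the prediction coefficients g.
    """
    x = Y[:, ref]
    X = delayed_frames(Y, order, delay)
    N, K = X.shape
    g = np.zeros(K, dtype=complex)
    z1 = np.zeros(N, dtype=complex)  # split copy of the residual x - X g
    z2 = np.zeros(K, dtype=complex)  # split copy of the coefficients g
    u1 = np.zeros(N, dtype=complex)  # scaled dual variable for z1
    u2 = np.zeros(K, dtype=complex)  # scaled dual variable for z2
    H = X.conj().T @ X + np.eye(K)   # system matrix of the least-squares step
    for _ in range(n_iter):
        # g-update: quadratic subproblem, solved exactly
        g = np.linalg.solve(H, X.conj().T @ (x - z1 - u1) + z2 + u2)
        r = x - X @ g
        # z-updates: proximal operators of the two l1 terms
        z1 = soft(r - u1, 1.0 / rho)
        z2 = soft(g - u2, lam / rho)
        # dual updates on the constraint violations
        u1 += z1 - r
        u2 += z2 - g
    return x - X @ g, g
```

In a full system this routine would be applied independently to every frequency bin of the multi-microphone STFT, and the residuals would be transformed back to the time domain. The weight lam trades residual sparsity against coefficient sparsity and would need tuning per setup.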

Metadata
Title: Blind speech dereverberation using sparse decomposition and multi-channel linear prediction
Authors: Leila Mousavi, Farbod Razzazi, Afrooz Haghbin
Publication date: 15-07-2019
Publisher: Springer US
Published in: International Journal of Speech Technology, Issue 3/2019
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-019-09620-x
