Published in: International Journal of Speech Technology 1/2016

24-11-2015

Hybridization of spectral filtering with particle swarm optimization for speech signal enhancement

Authors: R. Senthamizh Selvi, G. R. Suresh

Abstract

Speech enhancement has received a significant amount of research attention over the past several decades. Enhancement is needed to improve a degraded speech signal, and the goal is to separate a single mixture into its underlying clean speech and interferer components. This is achieved by acquiring prior knowledge through learning and generating masks accordingly. This paper employs a hybridization of spectral filtering and an optimization algorithm for speech enhancement. The proposed technique uses MMSE (Minimum Mean Squared Error) and PSO (Particle Swarm Optimization) for effective enhancement. It is a three-module technique consisting of a pre-processing module, an optimization module, and a spectral filtering module. Loizou's database and the Aurora dataset are used to evaluate the proposed technique with two standard evaluation metrics, PESQ and SNR. A comparative analysis is also made against existing techniques such as MMSE and BNMF. The highest PESQ for the proposed technique is 2.75 and the highest SNR is about 32.97. The technique gave an average PESQ of 2.18 and an average SNR of 20.53, both higher than the average values for the other techniques. Hence, the proposed technique yielded better evaluation metrics than the existing methods.
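The abstract does not say how the PSO module and the spectral filter are coupled, so the following is only a generic sketch of the two ingredients it names: an SNR measure in dB and a basic particle swarm optimizer. As a stand-in for a real spectral gain, the toy problem tunes a single scalar gain applied to a noisy sine wave. The function names `snr_db` and `pso_minimize`, the swarm parameters, and the test signal are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def snr_db(clean, estimate):
    """SNR in dB: clean-signal power over the power of the estimation error."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    if noise_power == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)


def pso_minimize(cost, dim, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Textbook global-best PSO (illustrative; parameter values are assumptions)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]           # personal best positions
    best_cost = [cost(p) for p in pos]       # personal best costs
    g = min(range(n_particles), key=lambda i: best_cost[i])
    g_pos, g_cost = best_pos[g][:], best_cost[g]  # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < best_cost[i]:
                best_pos[i], best_cost[i] = pos[i][:], c
                if c < g_cost:
                    g_pos, g_cost = pos[i][:], c
    return g_pos, g_cost


# Toy data (hypothetical): a sine wave corrupted by Gaussian noise.
_rng = random.Random(1)
clean = [math.sin(2 * math.pi * 5 * n / 200) for n in range(200)]
noisy = [c + _rng.gauss(0.0, 0.5) for c in clean]


def negative_snr(params):
    """Cost for PSO: maximizing SNR equals minimizing its negative."""
    gain = params[0]
    return -snr_db(clean, [gain * x for x in noisy])


(gain,), cost_value = pso_minimize(negative_snr, dim=1, bounds=(0.0, 2.0))

# For a single scalar gain the least-squares optimum is known in closed form,
# which gives a sanity check on the swarm's answer.
closed_form = (sum(c * x for c, x in zip(clean, noisy))
               / sum(x * x for x in noisy))
print(gain, closed_form, -cost_value)
```

A single scalar gain is used here only because its optimum can be checked against the closed-form least-squares value. A real enhancement system would optimize per-frequency gains or filter parameters, and it would need a cost that does not depend on the clean signal, which is unavailable in practice.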

Metadata
Title
Hybridization of spectral filtering with particle swarm optimization for speech signal enhancement
Authors
R. Senthamizh Selvi
G. R. Suresh
Publication date
24-11-2015
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-015-9317-1
