
International Journal of Speech Technology

Published: 24 November 2015

Hybridization of spectral filtering with particle swarm optimization for speech signal enhancement

Authors: R. Senthamizh Selvi, G. R. Suresh

Published in: International Journal of Speech Technology | Issue 1/2016


Abstract

Speech enhancement has received considerable research attention over the past several decades. Enhancement is needed to improve a degraded speech signal; the goal is to separate a single mixture into its underlying clean speech and interferer components. This is achieved by acquiring prior knowledge through learning and generating masks accordingly. This paper employs a hybridization of spectral filtering and an optimization algorithm for speech enhancement. The proposed technique uses MMSE (minimum mean squared error) and PSO (particle swarm optimization) for effective enhancement. It consists of three modules: a pre-processing module, an optimization module, and a spectral filtering module. Loizou's database and the Aurora dataset are used to evaluate the proposed technique with the standard evaluation metrics PESQ and SNR. A comparative analysis is also made against existing techniques such as MMSE and BNMF. The highest PESQ for the proposed technique is 2.75 and the highest SNR is about 32.97. The technique gave an average PESQ of 2.18 and an average SNR of 20.53, both higher than the average values for the other techniques. Hence, the proposed technique yielded better evaluation metrics than the existing methods.
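The abstract does not say how the PSO module is coupled to the spectral filter, so the following is only a minimal sketch of the general idea: a textbook particle swarm searches for a filter parameter that maximizes SNR against a known clean reference. The single scalar gain, the synthetic sine-plus-noise signal, the function names `snr_db` and `pso_minimize`, and all parameter values (inertia, `c1`, `c2`, swarm size) are assumptions chosen for illustration, not the authors' implementation.

```python
import math
import random


def snr_db(clean, estimate):
    """SNR in dB, treating (clean - estimate) as the residual noise."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    if noise_power == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO; parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Clamp the new position to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Toy data (hypothetical): a sine wave stands in for clean speech.
rng = random.Random(1)
clean = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
noisy = [c + rng.gauss(0.0, 0.5) for c in clean]


def neg_snr(params):
    """Objective: negative SNR of the gain-scaled noisy signal."""
    gain = params[0]
    return -snr_db(clean, [gain * x for x in noisy])


best, best_val = pso_minimize(neg_snr, bounds=[(0.0, 1.0)])
print(f"gain={best[0]:.3f}, SNR before={snr_db(clean, noisy):.2f} dB, "
      f"after={-best_val:.2f} dB")
```

For this toy objective the optimum is also available in closed form (the least-squares gain), which makes it easy to check that the swarm converges. A real system would optimize parameters of a spectral estimator per frame or per frequency bin, and would not have the clean reference at run time.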



Title: Hybridization of spectral filtering with particle swarm optimization for speech signal enhancement
Authors: R. Senthamizh Selvi, G. R. Suresh
Publication date: 24 November 2015
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 1/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-015-9317-1
