Published in: International Journal of Speech Technology 4/2017

23.10.2017

Single-channel speech separation using combined EMD and speech-specific information

Authors: M. K. Prasanna Kumar, R. Kumaraswamy



Abstract

Multi-channel blind source separation (BSS) methods require more than one microphone, so there is a need for speech separation algorithms that operate with a single microphone. In this paper we propose a method for single-channel speech separation (SCSS) that combines empirical mode decomposition (EMD) with speech-specific information. The speech-specific information is derived in the form of source-filter features: source features are obtained from multi-pitch information, and filter information is estimated by formant analysis. To track multiple pitches in the mixed signal, we apply simplified inverse filter tracking (SIFT) and histogram-based pitch estimation to the excitation source. Formants are estimated using linear predictive (LP) analysis. Pitch and formant estimation are performed both with and without EMD, for better extraction of the individual speakers from the mixture. Combining EMD with speech-specific information yields encouraging results for single-channel speech separation.
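The source-filter analysis the abstract describes can be illustrated with a small sketch. This is not the authors' implementation: it only shows, under simplified assumptions, how LP analysis splits a frame into a filter estimate (formant candidates read off the LPC polynomial roots) and a source estimate (a SIFT-style pitch from the autocorrelation of the LP residual). All function names, parameters, and the synthetic test signal are illustrative.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC coefficients via Levinson-Durbin."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = np.array([1.0])
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient from the current prediction error
        k = -np.dot(a, r[i:0:-1][:len(a)]) / err
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        err *= 1.0 - k * k
    return a

def sift_pitch(frame, a, fs, fmin=60.0, fmax=400.0):
    """SIFT-style pitch: autocorrelate the LP residual (excitation source)."""
    resid = np.convolve(frame, a)[:len(frame)]  # inverse filtering with A(z)
    ac = np.correlate(resid, resid, mode="full")[len(resid) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    return fs / (lo + np.argmax(ac[lo:hi]))

def formants(a, fs):
    """Formant candidates from the angles of upper-half-plane LPC roots."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0.01]
    return np.sort(np.angle(roots) * fs / (2.0 * np.pi))

def resonate(x, fc, bw, fs):
    """Two-pole resonator, used here only to synthesize a toy vowel."""
    r = np.exp(-np.pi * bw / fs)
    c = 2.0 * r * np.cos(2.0 * np.pi * fc / fs)
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = (x[n] + c * (y[n - 1] if n >= 1 else 0.0)
                - r * r * (y[n - 2] if n >= 2 else 0.0))
    return y

# Toy signal: 100 Hz impulse train through resonators near 500 and 1500 Hz
fs = 8000
exc = np.zeros(2048)
exc[:: fs // 100] = 1.0
sig = resonate(resonate(exc, 500, 80, fs), 1500, 120, fs)

frame = sig * np.hamming(len(sig))
a = lpc(frame, order=8)
print(sift_pitch(frame, a, fs))  # close to 100 Hz
print(formants(a, fs))           # candidates near 500 and 1500 Hz
```

In the paper this analysis is applied per frame to the mixed signal, with and without a preceding EMD stage; the sketch covers only the single-frame source-filter estimation step.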


Metadata
Title
Single-channel speech separation using combined EMD and speech-specific information
Authors
M. K. Prasanna Kumar
R. Kumaraswamy
Publication date
23.10.2017
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2017
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-017-9468-3
