Published in: International Journal of Speech Technology 1/2017

29-11-2016

Single-channel speech separation using empirical mode decomposition and multi pitch information with estimation of number of speakers

Authors: M. K. Prasanna Kumar, R. Kumaraswamy

Abstract

Speech separation is an essential part of voice recognition systems such as speaker recognition, speech recognition, and hearing aids. Applying speech separation at the front end of such a system improves that system's performance. In this paper, we propose a system for single-channel speech separation that combines empirical mode decomposition (EMD) with multi-pitch information. The proposed method is completely unsupervised and requires no knowledge of the underlying speakers. We apply EMD to short frames of the mixed speech to better estimate the speech-specific information, which is derived through multi-pitch tracking. To track multi-pitch information in the mixed signal, we apply simple-inverse filtering tracking and histogram-based pitch estimation to the excitation source information, and we also estimate the number of speakers present in the mixed signal.
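The abstract does not give algorithmic details, so the following is only a rough sketch of what histogram-based pitch estimation with a speaker-count estimate could look like. Everything in it is an assumption made for illustration: the function names `frame_pitch_autocorr` and `count_pitch_clusters`, the 60-400 Hz search range, the 20 Hz bin width, and the `min_share` threshold are hypothetical, and a plain autocorrelation estimator stands in for the paper's inverse-filtering tracker and EMD stage.

```python
import numpy as np


def frame_pitch_autocorr(frame, fs, fmin=60.0, fmax=400.0):
    """Dominant pitch (Hz) of one frame from its autocorrelation peak.

    Stand-in estimator for illustration only: it is not the paper's
    inverse-filtering tracker and returns a single pitch per frame.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # Keep only the non-negative lags of the autocorrelation.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(ac) - 1)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / lag


def count_pitch_clusters(pitches_hz, bin_width=20.0, fmin=60.0,
                         fmax=400.0, min_share=0.2):
    """Histogram per-frame pitch values and count the prominent peaks.

    Each histogram peak holding at least `min_share` of all frames is
    treated as one speaker (an assumed heuristic, not the paper's rule).
    Returns (number_of_peaks, list_of_peak_bin_centers_in_hz).
    """
    edges = np.arange(fmin, fmax + bin_width, bin_width)
    hist, _ = np.histogram(pitches_hz, bins=edges)
    share = hist / max(hist.sum(), 1)
    # Zero-pad so that the first and last bins have both neighbours.
    padded = np.concatenate(([0.0], share, [0.0]))
    centers = []
    for i in range(len(share)):
        left, mid, right = padded[i], padded[i + 1], padded[i + 2]
        if mid >= min_share and mid > left and mid >= right:
            centers.append(float(edges[i] + bin_width / 2))
    return len(centers), centers


# Toy usage: 40 ms frames at 8 kHz alternating between two synthetic "speakers".
fs = 8000
t = np.arange(320) / fs
frames = [np.sin(2 * np.pi * f0 * t) for f0 in (100, 200, 100, 200)]
pitches = [frame_pitch_autocorr(fr, fs) for fr in frames]
n_speakers, centers = count_pitch_clusters(pitches)
print(n_speakers, centers)
```

In this toy setup each frame contains only one tone, so a single-pitch estimator is enough. Real mixed speech has overlapping voices within a frame, which is the case the paper's multi-pitch tracking is meant to handle.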


Metadata
Title
Single-channel speech separation using empirical mode decomposition and multi pitch information with estimation of number of speakers
Authors
M. K. Prasanna Kumar
R. Kumaraswamy
Publication date
29-11-2016
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2017
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-016-9392-y
