Published in: International Journal of Speech Technology 4/2018

05-09-2018

Improvement in monaural speech separation using sparse non-negative tucker decomposition

Authors: Yash Vardhan Varshney, Prashant Upadhyaya, Zia Ahmad Abbasi, Musiur Raza Abidi, Omar Farooq


Abstract

This paper introduces a monaural speech separation/enhancement technique based on non-negative Tucker decomposition (NTD). In the proposed work, the effect of a sparsity regularization factor on the separation of the mixed signal is included in the generalized cost function of NTD. With the proposed algorithm, the vector components of both the target and the mixed signal can be exploited and used to separate any monaural mixture. Experiments were carried out on monaural data generated by mixing the speech signals of two speakers and by mixing noise with speech signals, using the TIMIT and NOISEX-92 datasets. The separation results are compared with those of other existing algorithms in terms of the correlation of the separated signal with the original signal, signal-to-distortion ratio, perceptual evaluation of speech quality, and short-time objective intelligibility. Further, to obtain more conclusive information about separation ability, speech recognition was performed using the Kaldi toolkit. The recognition results are compared in terms of word error rate (WER) using MFCC-based features. The results show that the average WER improvement of the proposed algorithm over the nearest-performing algorithm is up to 2.7% for mixed speech of two speakers and 1.52% for noisy speech input.
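Since the recognition results are reported as word error rate, a minimal sketch of how WER is commonly computed may help: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. This is not the Kaldi scoring code used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only: tokenization is plain whitespace splitting,
    which is an assumption and not the paper's scoring setup.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Under this definition a lower value means better recognition, and the value can exceed 1.0 when the hypothesis contains many inserted words.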


Metadata
Title
Improvement in monaural speech separation using sparse non-negative tucker decomposition
Authors
Yash Vardhan Varshney
Prashant Upadhyaya
Zia Ahmad Abbasi
Musiur Raza Abidi
Omar Farooq
Publication date
05-09-2018
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2018
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-9550-5
