2020 | OriginalPaper | Chapter

Gen-Res-Net: A Novel Generative Model for Singing Voice Separation

Authors : Congzhou Tian, Hangyu Li, Deshun Yang, Xiaoou Chen

Published in: MultiMedia Modeling

Publisher: Springer International Publishing


Abstract

Modeling in the time-frequency domain is the most common way to approach singing voice separation, since the frequency characteristics of the sources differ. In recent years, researchers have mostly tackled the problem by applying recurrent neural networks (RNNs) to sequences of spectrogram frames. More recently, however, the success of the U-net has shifted attention to treating the spectrogram as a two-dimensional image processed by an auto-encoder, which suggests that methods from image analysis can help solve this problem. In this setting, we propose a novel spectrogram-generative model that separates the two sources in the time-frequency domain, inspired by residual blocks, squeeze-and-excitation blocks, and WaveNet. In the main path we apply residual blocks that preserve the feature-map size, combined with squeeze-and-excitation blocks, to extract features from the input spectrogram, while aggregating the block outputs through WaveNet-style skip connections. Experimental results on two datasets (MUSDB18 and CCMixter) show that the proposed network outperforms the current state-of-the-art approach operating on mixture spectrograms, the deep U-net architecture.
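To make the described architecture concrete, the sketch below shows one plausible reading of the abstract in PyTorch: residual blocks whose convolutions keep the spectrogram size unchanged, a squeeze-and-excitation recalibration of each block's channels, and a WaveNet-style summation of every block's output before a soft mask is produced. All layer names, channel counts, and the final 1x1 convolution are illustrative assumptions, not the authors' exact configuration:

# Hedged sketch of a Gen-Res-Net-style model (PyTorch).
# Only the structural ideas (size-preserving residual blocks, SE
# recalibration, WaveNet-style skip aggregation) come from the abstract;
# depths and channel counts are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweight channels by global statistics."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):
        # Squeeze: global average pool over the time-frequency plane.
        s = x.mean(dim=(2, 3))
        # Excitation: two fully connected layers give per-channel gates.
        s = torch.sigmoid(self.fc2(F.relu(self.fc1(s))))
        return x * s.view(x.size(0), -1, 1, 1)


class ResSEBlock(nn.Module):
    """Residual block that preserves the spectrogram size, plus SE."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.bn2 = nn.BatchNorm2d(channels)
        self.se = SEBlock(channels)

    def forward(self, x):
        h = F.relu(self.bn1(self.conv1(x)))
        h = self.se(self.bn2(self.conv2(h)))
        return F.relu(x + h)  # identity shortcut, no down-sampling


class GenResNetSketch(nn.Module):
    """Stack of Res-SE blocks whose outputs are summed WaveNet-style."""
    def __init__(self, channels=32, n_blocks=8):
        super().__init__()
        self.stem = nn.Conv2d(1, channels, 3, padding=1)
        self.blocks = nn.ModuleList([ResSEBlock(channels) for _ in range(n_blocks)])
        self.out = nn.Conv2d(channels, 1, 1)  # soft mask over the spectrogram

    def forward(self, spec):  # spec: (batch, 1, freq, time)
        h = F.relu(self.stem(spec))
        skips = 0
        for block in self.blocks:
            h = block(h)
            skips = skips + h  # gather every block's output (WaveNet-style)
        mask = torch.sigmoid(self.out(skips))
        return mask * spec  # masked mixture spectrogram = estimated source

For example, GenResNetSketch()(torch.rand(1, 1, 512, 128)) returns a masked magnitude spectrogram of the same shape; the actual model's depth, loss function, and post-processing are described in the full chapter.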


Literature
1.
van der Merwe, P.: Origins of the Popular Style: The Antecedents of Twentieth-Century Popular Music, p. 320. Clarendon Press, Oxford (1989). ISBN 0-19-316121-4
2.
Fujihara, H., Goto, M., Ogata, J., Komatani, K., Ogata, T., Okuno, H.G.: Automatic synchronization between lyrics and music CD recordings based on Viterbi alignment of segregated vocal signals. In: Proceedings of ISM, pp. 257–264, December 2006
3.
Yang, Y.-H., Chen, H.H.: Machine recognition of music emotion: a review. ACM Trans. Intell. Syst. Technol. 40, 1–30 (2012)
4.
Berenzweig, A., Ellis, D.P.W., Lawrence, S.: Using voice segments to improve artist classification of music. In: AES 22nd International Conference: Virtual, Synthetic, and Entertainment Audio (2002)
5.
Li, Y., Wang, D.: Separation of singing voice from music accompaniment for monaural recordings. IEEE Trans. Audio Speech Lang. Process. 15(4), 1475–1487 (2007)
6.
Rafii, Z., Pardo, B.: Repeating pattern extraction technique (REPET): a simple method for music/voice separation. IEEE Trans. Audio Speech Lang. Process. 21(1), 73–84 (2013)
8.
Huang, P., Kim, M., Hasegawa-Johnson, M., Smaragdis, P.: Joint optimization of masks and deep recurrent neural networks for monaural source separation. IEEE/ACM Trans. Audio Speech Lang. Process. 23(12), 2136–2147 (2015)
9.
Mimilakis, S.I., Drossos, K., Virtanen, T., Schuller, G.: A recurrent encoder-decoder approach with skip-filtering connections for monaural singing voice separation. In: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), Tokyo, pp. 1–6 (2017)
10.
Luo, Y., Chen, Z., Hershey, J.R., Le Roux, J., Mesgarani, N.: Deep clustering and conventional networks for music separation: stronger together. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, pp. 61–65 (2017)
11.
Jansson, A., Humphrey, E., Montecchio, N., Bittner, R., Kumar, A., Weyde, T.: Singing voice separation with deep U-net convolutional networks (2017)
13.
Grais, E.M., Ward, D., Plumbley, M.D.: Raw multi-channel audio source separation using multi-resolution convolutional auto-encoders. In: 2018 26th European Signal Processing Conference (EUSIPCO), pp. 1577–1581. IEEE (2018)
14.
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
15.
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7132–7141 (2018)
17.
Rafii, Z., Liutkus, A., Stöter, F.-R., Mimilakis, S.I., Bittner, R.: The MUSDB18 corpus for music separation (2017)
18.
Liutkus, A., Fitzgerald, D., Rafii, Z.: Scalable audio separation with light kernel additive modelling. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 76–80. IEEE (2015)
19.
Stoller, D., Ewert, S., Dixon, S.: Wave-U-Net: a multi-scale neural network for end-to-end audio source separation. arXiv preprint arXiv:1806.03185 (2018)
20.
Vincent, E., Gribonval, R., Févotte, C.: Performance measurement in blind audio source separation. IEEE Trans. Audio Speech Lang. Process. 14(4), 1462–1469 (2006)
Metadata
Title
Gen-Res-Net: A Novel Generative Model for Singing Voice Separation
Authors
Congzhou Tian
Hangyu Li
Deshun Yang
Xiaoou Chen
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-37731-1_3