2022 | OriginalPaper | Chapter

Training Scheme for Stereo Audio Generation

Author: Padmaja Mohanty

Published in: Computing, Communication and Learning

Publisher: Springer Nature Switzerland

Abstract

Voice substitution and audio generation are used increasingly often in a variety of computer listening applications. Furthermore, state-of-the-art perceptual synthesis allows richer music to be produced without the need for expensive equipment. True audio immersion occurs when listeners feel what they are hearing and become part of the story being told. True stereo audio must be generated differently from mono audio so as to make use of two channels rather than one. However, generating stereo audio has not been a popular topic in the literature, despite being an important component of a listener's experience. Some widely used tools for producing stereo audio are Sharp Beta Point and Audacity. This research focuses on developing a generative model for stereo audio generation. It also presents new forms of representation that effectively capture the stereo image of stereo audio. The results show that the proposed method improves audio quality to a significant degree.
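
The abstract does not spell out how the stereo image is represented, so the Python sketch below is only an illustrative assumption: it uses a mid/side (sum-difference) decomposition, one common way to separate the content shared by both channels from the spatial information that forms the stereo image. The helper names stereo_to_mid_side and mid_side_to_stereo are hypothetical and not taken from the chapter.

    # Illustrative sketch only; the chapter's actual representation is not
    # specified in this abstract. Mid/side decomposition of a stereo signal.
    import numpy as np

    def stereo_to_mid_side(stereo):
        """Split a (num_samples, 2) stereo array into mid and side signals."""
        left, right = stereo[:, 0], stereo[:, 1]
        mid = 0.5 * (left + right)    # content shared by both channels
        side = 0.5 * (left - right)   # spatial (stereo image) content
        return mid, side

    def mid_side_to_stereo(mid, side):
        """Reconstruct the (num_samples, 2) stereo array from mid/side."""
        return np.stack([mid + side, mid - side], axis=-1)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        x = rng.standard_normal((16_000, 2))             # fake stereo noise
        m, s = stereo_to_mid_side(x)
        assert np.allclose(mid_side_to_stereo(m, s), x)  # lossless round trip

The decomposition is lossless and invertible, which is why it is often preferred when a generative model is trained on the spatial component separately from the shared content.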

Metadata
Title: Training Scheme for Stereo Audio Generation
Author: Padmaja Mohanty
Copyright Year: 2022
DOI: https://doi.org/10.1007/978-3-031-21750-0_21
