2022 | OriginalPaper | Chapter

Training Scheme for Stereo Audio Generation

Author: Padmaja Mohanty

Published in: Computing, Communication and Learning

Publisher: Springer Nature Switzerland

Abstract

Voice substitution and audio generation are used increasingly often in a variety of computer listening applications. Furthermore, state-of-the-art perceptual synthesis allows richer music to be produced without the need for expensive equipment. True audio immersion occurs when listeners feel what they are hearing and become part of the story being told. True stereo audio must be generated differently from mono audio so as to make use of two channels rather than one. However, generating stereo audio has not been a popular topic in the literature, despite being an important component of a listener's experience. Some widely used tools for producing stereo audio are Sharp Beta Point and Audacity. This research focuses on developing a generative model for stereo audio generation. It also presents new forms of representation that effectively capture the stereo image of stereo audio. The results show that the proposed method improves audio quality to a significant degree.
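
The abstract does not spell out how the stereo image is represented, so the Python sketch below is only an illustrative assumption: it uses a mid/side (sum-difference) decomposition, one common way to separate the content shared by both channels from the spatial information that forms the stereo image. The helper names stereo_to_mid_side and mid_side_to_stereo are hypothetical and not taken from the chapter.

    # Illustrative sketch only; the chapter's actual representation is not
    # specified in this abstract. Mid/side decomposition of a stereo signal.
    import numpy as np

    def stereo_to_mid_side(stereo):
        """Split a (num_samples, 2) stereo array into mid and side signals."""
        left, right = stereo[:, 0], stereo[:, 1]
        mid = 0.5 * (left + right)    # content shared by both channels
        side = 0.5 * (left - right)   # spatial (stereo image) content
        return mid, side

    def mid_side_to_stereo(mid, side):
        """Reconstruct the (num_samples, 2) stereo array from mid/side."""
        return np.stack([mid + side, mid - side], axis=-1)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        x = rng.standard_normal((16_000, 2))             # fake stereo noise
        m, s = stereo_to_mid_side(x)
        assert np.allclose(mid_side_to_stereo(m, s), x)  # lossless round trip

The decomposition is lossless and invertible, which is why it is often preferred when a generative model is trained on the spatial component separately from the shared content.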

Metadata
Title: Training Scheme for Stereo Audio Generation
Author: Padmaja Mohanty
Copyright Year: 2022
DOI: https://doi.org/10.1007/978-3-031-21750-0_21
