
2021 | OriginalPaper | Chapter

Speech2Image: Generating Images from Speech Using Pix2Pix Model

Authors: Ankit Raj Ojha, Abhilash Gunasegaran, Aruna Maurya, Spriha Mandal

Published in: Advanced Computing

Publisher: Springer Singapore

Abstract

Generating images from speech is a fundamental problem with numerous applications, including art generation, computer-aided design, and enhancing learning capabilities in children. We present an audio-conditioned image generation model that transfers features from speech descriptions (the source domain) to the corresponding image (target) domain. To accomplish this, we use the Caltech-UCSD Birds-200-2011 (CUB-200-2011) dataset as the target images and generate high-quality speech descriptions of the birds to prepare a custom training dataset. We further verify the generated images against the bird images provided in the CUB-200-2011 dataset. The model is trained and tested on three types of speech representations, i.e., spectrograms, constant-Q transforms, and short-time Fourier transforms; the corresponding results are discussed in subsequent sections. Unlike conventional approaches to speech-to-image conversion, which rely on a different intermediary domain such as text to realize this transition, our approach relies on an intermediate image transition, effectively restricting the number of domains involved in the process.
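The three speech representations named in the abstract can be extracted directly from a waveform. The following is a minimal sketch, assuming the librosa library and illustrative parameter values (n_fft, hop_length); it is not the authors' actual preprocessing pipeline, only an indication of how spectrogram, constant-Q transform, and short-time Fourier transform features of a spoken description could be obtained before being passed to a pix2pix-style generator.

# Minimal sketch (not the authors' pipeline): extracting the three speech
# representations mentioned in the abstract using librosa.
import numpy as np
import librosa

def speech_representations(wav_path, n_fft=1024, hop_length=256):
    """Return log-mel spectrogram, constant-Q transform, and STFT magnitude
    for one spoken description. Parameter values are illustrative."""
    y, sr = librosa.load(wav_path, sr=None)  # keep the native sampling rate

    # Short-time Fourier transform, kept as log-magnitude in dB.
    stft = librosa.stft(y, n_fft=n_fft, hop_length=hop_length)
    stft_db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)

    # Constant-Q transform, also as log-magnitude in dB.
    cqt = librosa.cqt(y, sr=sr, hop_length=hop_length)
    cqt_db = librosa.amplitude_to_db(np.abs(cqt), ref=np.max)

    # Mel spectrogram (power), converted to dB.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length)
    mel_db = librosa.power_to_db(mel, ref=np.max)

    return mel_db, cqt_db, stft_db

Each output is a 2-D time-frequency array, which is consistent with the abstract's description of an intermediate image transition: the speech input can be handled as an image, keeping the whole pipeline within image-to-image translation.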

Literature
1. Brock, A., Lim, T., Ritchie, J.M., Weston, N.: Neural photo editing with introspective adversarial networks (2016)
2. Gatys, L.A., Ecker, A.S., Bethge, M.: Image style transfer using convolutional neural networks. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2414–2423 (2016)
3. Goodfellow, I.J., et al.: Generative adversarial networks (2014)
5. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium (2017)
6. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks (2016)
7. Kim, T., Cha, M., Kim, H., Lee, J.K., Kim, J.: Learning to discover cross-domain relations with generative adversarial networks (2017)
8. Luan, F., Paris, S., Shechtman, E., Bala, K.: Deep photo style transfer (2017)
9. Ma, L., Jia, X., Sun, Q., Schiele, B., Tuytelaars, T., Gool, L.V.: Pose guided person image generation (2017)
10. Mirza, M., Osindero, S.: Conditional generative adversarial nets (2014)
11. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks (2015)
12. Sagong, M.C., Shin, Y.G., Yeo, Y.J., Park, S., Ko, S.J.: cGANs with conditional convolution layer (2019)
13. Welinder, P., et al.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010)
17. Wu, H., Zheng, S., Zhang, J., Huang, K.: GP-GAN: towards realistic high-resolution image blending (2019)
18. Xian, W., et al.: TextureGAN: controlling deep image synthesis with texture patches (2017)
19. Xu, T., et al.: AttnGAN: fine-grained text to image generation with attentional generative adversarial networks (2017)
21. Yeh, R.A., Chen, C., Lim, T.Y., Schwing, A.G., Hasegawa-Johnson, M., Do, M.N.: Semantic image inpainting with deep generative models (2017)
22. Zhang, H., et al.: StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks (2016)
23. Zhang, H., et al.: StackGAN++: realistic image synthesis with stacked generative adversarial networks (2017)
24. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks (2017)
25. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks (2020)
Metadata
Title
Speech2Image: Generating Images from Speech Using Pix2Pix Model
Authors
Ankit Raj Ojha
Abhilash Gunasegaran
Aruna Maurya
Spriha Mandal
Copyright Year
2021
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-16-0401-0_24
