
2021 | OriginalPaper | Chapter

Speech2Image: Generating Images from Speech Using Pix2Pix Model

Authors: Ankit Raj Ojha, Abhilash Gunasegaran, Aruna Maurya, Spriha Mandal

Published in: Advanced Computing

Publisher: Springer Singapore

Abstract

Generating images from speech is a fundamental problem with numerous applications, including art generation, computer-aided design, and enhancing learning capabilities in children. We present an audio-conditioned image generation model that transfers features from speech descriptions (the source domain) to the corresponding image (target) domain. To accomplish this, we use the Caltech-UCSD Birds-200-2011 (CUB-200-2011) dataset as the target images and generate high-quality speech descriptions of the birds to prepare a custom training dataset. We further verify the generated images against the bird images provided in the CUB-200-2011 dataset. The model is trained and tested on three types of speech representations, i.e., spectrograms, constant-Q transforms, and short-time Fourier transforms; the corresponding results are discussed in subsequent sections. Unlike conventional approaches to speech-to-image conversion, which rely on a different intermediary domain such as text to realize this transition, our approach relies on an intermediate image transition, effectively restricting the number of domains involved in the process.
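The three speech representations named in the abstract can be extracted directly from a waveform. The following is a minimal sketch, assuming the librosa library and illustrative parameter values (n_fft, hop_length); it is not the authors' actual preprocessing pipeline, only an indication of how spectrogram, constant-Q transform, and short-time Fourier transform features of a spoken description could be obtained before being passed to a pix2pix-style generator.

# Minimal sketch (not the authors' pipeline): extracting the three speech
# representations mentioned in the abstract using librosa.
import numpy as np
import librosa

def speech_representations(wav_path, n_fft=1024, hop_length=256):
    """Return log-mel spectrogram, constant-Q transform, and STFT magnitude
    for one spoken description. Parameter values are illustrative."""
    y, sr = librosa.load(wav_path, sr=None)  # keep the native sampling rate

    # Short-time Fourier transform, kept as log-magnitude in dB.
    stft = librosa.stft(y, n_fft=n_fft, hop_length=hop_length)
    stft_db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)

    # Constant-Q transform, also as log-magnitude in dB.
    cqt = librosa.cqt(y, sr=sr, hop_length=hop_length)
    cqt_db = librosa.amplitude_to_db(np.abs(cqt), ref=np.max)

    # Mel spectrogram (power), converted to dB.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length)
    mel_db = librosa.power_to_db(mel, ref=np.max)

    return mel_db, cqt_db, stft_db

Each output is a 2-D time-frequency array, which is consistent with the abstract's description of an intermediate image transition: the speech input can be handled as an image, keeping the whole pipeline within image-to-image translation.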

Literature
1. Brock, A., Lim, T., Ritchie, J.M., Weston, N.: Neural photo editing with introspective adversarial networks (2016)
2. Gatys, L.A., Ecker, A.S., Bethge, M.: Image style transfer using convolutional neural networks. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2414–2423 (2016)
3. Goodfellow, I.J., et al.: Generative adversarial networks (2014)
5. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium (2017)
6. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks (2016)
7. Kim, T., Cha, M., Kim, H., Lee, J.K., Kim, J.: Learning to discover cross-domain relations with generative adversarial networks (2017)
8. Luan, F., Paris, S., Shechtman, E., Bala, K.: Deep photo style transfer (2017)
9. Ma, L., Jia, X., Sun, Q., Schiele, B., Tuytelaars, T., Gool, L.V.: Pose guided person image generation (2017)
10. Mirza, M., Osindero, S.: Conditional generative adversarial nets (2014)
11. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks (2015)
12. Sagong, M.C., Shin, Y.G., Yeo, Y.J., Park, S., Ko, S.J.: cGANs with conditional convolution layer (2019)
13. Welinder, P., et al.: Caltech-UCSD Birds 200. Technical Report CNS-TR-2010-001, California Institute of Technology (2010)
17. Wu, H., Zheng, S., Zhang, J., Huang, K.: GP-GAN: towards realistic high-resolution image blending (2019)
18. Xian, W., et al.: TextureGAN: controlling deep image synthesis with texture patches (2017)
19. Xu, T., et al.: AttnGAN: fine-grained text to image generation with attentional generative adversarial networks (2017)
21. Yeh, R.A., Chen, C., Lim, T.Y., Schwing, A.G., Hasegawa-Johnson, M., Do, M.N.: Semantic image inpainting with deep generative models (2017)
22. Zhang, H., et al.: StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks (2016)
23. Zhang, H., et al.: StackGAN++: realistic image synthesis with stacked generative adversarial networks (2017)
24. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks (2017)
25. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks (2020)
Metadata
Title
Speech2Image: Generating Images from Speech Using Pix2Pix Model
Authors
Ankit Raj Ojha
Abhilash Gunasegaran
Aruna Maurya
Spriha Mandal
Copyright Year
2021
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-16-0401-0_24
