Published in: Neural Computing and Applications 23/2021

19.07.2021 | Original Article

Visualization-based disentanglement of latent space

Authors: Runze Huang, Qianying Zheng, Haifang Zhou


Abstract

In recent years, the selective manipulation of data attributes by modifying the latent code of an auto-encoder has received considerable scholarly attention. However, the representation learned by an auto-encoder cannot be observed visually, and attribute values do not vary linearly or monotonically with the corresponding latent dimension. From a practical point of view, we propose a novel method that uses an encoder–decoder architecture to disentangle data into two visualizable representations, which are encoded as latent spaces. The encoded latent spaces can then be used to manipulate data attributes in a simple and intuitive way. Experiments on an image dataset and a music dataset show that the proposed approach produces fully interpretable latent spaces, which can be used to manipulate a wide range of data attributes and to generate realistic music via analogy.
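The core idea in the abstract, editing a latent code and decoding it to change a data attribute, can be illustrated with a deliberately simplified sketch. The linear encoder/decoder pair below is a stand-in assumption, not the authors' neural architecture; it only shows the mechanics of "shift one latent dimension, then decode":

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "encoder"/"decoder" pair standing in for the paper's
# encoder-decoder architecture (illustrative assumption only; the
# actual model is a trained neural auto-encoder).
W = rng.standard_normal((4, 16))   # encoder: 16-dim data -> 4-dim latent
W_dec = np.linalg.pinv(W)          # decoder: latent -> data (pseudoinverse)

def encode(x):
    return W @ x

def decode(z):
    return W_dec @ z

x = rng.standard_normal(16)        # a data point
z = encode(x)                      # its latent code

# Attribute manipulation: shift a single latent dimension and decode.
z_edit = z.copy()
z_edit[0] += 2.0
x_edit = decode(z_edit)

# In this linear toy, the edit moves the reconstruction exactly along
# the direction decoded from latent dimension 0.
delta = x_edit - decode(z)
print(np.allclose(delta, 2.0 * W_dec[:, 0]))   # True
```

In the linear case the effect of a latent edit is a fixed direction in data space; the paper's point is that for a real auto-encoder this relationship is neither linear nor monotonic, which is what motivates mapping the data into visualizable latent spaces instead.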


Appendices
Accessible only with authorization
Metadata
Title
Visualization-based disentanglement of latent space
Authors
Runze Huang
Qianying Zheng
Haifang Zhou
Publication date
19.07.2021
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 23/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-021-06223-z
