Published in: Neural Computing and Applications 23/2021

19.07.2021 | Original Article

Visualization-based disentanglement of latent space

Authors: Runze Huang, Qianying Zheng, Haifang Zhou


Abstract

In recent years, the selective manipulation of data attributes by modifying the latent code of an auto-encoder has received considerable scholarly attention. However, the representation learned by an auto-encoder cannot be observed visually, and attribute values do not vary linearly or monotonically with the corresponding latent dimension. From a practical point of view, we propose a novel method that uses an encoder–decoder architecture to disentangle data into two visualizable representations, which are encoded as latent spaces. The encoded latent spaces can then be used to manipulate data attributes in a simple and intuitive way. Experiments on an image dataset and a music dataset show that the proposed approach produces fully interpretable latent spaces, which can be used to manipulate a wide range of data attributes and to generate realistic music via analogy.
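The core idea in the abstract, editing a latent code and decoding it to change a data attribute, can be illustrated with a deliberately simplified sketch. The linear encoder/decoder pair below is a stand-in assumption, not the authors' neural architecture; it only shows the mechanics of "shift one latent dimension, then decode":

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "encoder"/"decoder" pair standing in for the paper's
# encoder-decoder architecture (illustrative assumption only; the
# actual model is a trained neural auto-encoder).
W = rng.standard_normal((4, 16))   # encoder: 16-dim data -> 4-dim latent
W_dec = np.linalg.pinv(W)          # decoder: latent -> data (pseudoinverse)

def encode(x):
    return W @ x

def decode(z):
    return W_dec @ z

x = rng.standard_normal(16)        # a data point
z = encode(x)                      # its latent code

# Attribute manipulation: shift a single latent dimension and decode.
z_edit = z.copy()
z_edit[0] += 2.0
x_edit = decode(z_edit)

# In this linear toy, the edit moves the reconstruction exactly along
# the direction decoded from latent dimension 0.
delta = x_edit - decode(z)
print(np.allclose(delta, 2.0 * W_dec[:, 0]))   # True
```

In the linear case the effect of a latent edit is a fixed direction in data space; the paper's point is that for a real auto-encoder this relationship is neither linear nor monotonic, which is what motivates mapping the data into visualizable latent spaces instead.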


Appendices
Accessible only with authorization
Metadata
Title
Visualization-based disentanglement of latent space
Authors
Runze Huang
Qianying Zheng
Haifang Zhou
Publication date
19.07.2021
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 23/2021
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-021-06223-z
