30-08-2024

A Novel Multimodal Generative Learning Model based on Basic Fuzzy Concepts

Authors: Huankun Sheng, Hongwei Mo, Tengteng Zhang

Published in: Cognitive Computation

Abstract

Multimodal models are designed to process different types of data within a single generative framework. The prevalent strategy in previous methods involves learning joint representations that are shared across different modalities. These joint representations are typically obtained by concatenating the top layers of modality-specific networks. Recently, significant advancements have been made in generating images from text and vice versa. Despite these successes, current models often overlook the role of fuzzy concepts, which are crucial given that human cognitive processes inherently involve a high degree of fuzziness. Recognizing and incorporating fuzzy concepts is therefore essential for enhancing the effectiveness of multimodal cognition models. In this paper, a novel framework, named the Fuzzy Concept Learning Model (FCLM), is proposed to process modalities based on fuzzy concepts. The high-level abstractions between different modalities in the FCLM are represented by 'fuzzy concept functions.' After training, the FCLM is capable of generating images from attribute descriptions and inferring the attributes of input images. Additionally, it can formulate fuzzy concepts at various levels of abstraction. Extensive experiments were conducted on the dSprites and 3D Chairs datasets. Both qualitative and quantitative results from these experiments demonstrate the effectiveness and efficiency of the proposed framework. The FCLM integrates the fuzzy cognitive mechanism with the statistical characteristics of the environment. This cognition-inspired framework offers a novel perspective for processing multimodal information.
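As background for the fuzzy-concept theme, the sketch below illustrates graded membership in the sense of classical fuzzy set theory, which underlies the idea of fuzzy concepts. It is not the paper's specific 'fuzzy concept functions' (whose form is defined in the full text); the Gaussian shape, the attribute name, and all parameter values are illustrative assumptions.

```python
import math

def gaussian_membership(x, center, width):
    """Degree in [0, 1] to which value x belongs to a fuzzy concept
    modeled as a Gaussian membership function peaked at `center`."""
    return math.exp(-((x - center) ** 2) / (2 * width ** 2))

# Hypothetical fuzzy concept "large" over a scale attribute normalized
# to [0, 1], as one might define for a dataset like dSprites.
def large(scale):
    return gaussian_membership(scale, center=1.0, width=0.3)

for s in (0.2, 0.6, 1.0):
    print(f"scale={s}: membership in 'large' = {large(s):.2f}")
```

Unlike a crisp predicate, membership rises smoothly as the scale approaches the prototype, which is the property that makes fuzzy concepts attractive for modeling human-like attribute judgments.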

Metadata
Title: A Novel Multimodal Generative Learning Model based on Basic Fuzzy Concepts
Authors: Huankun Sheng, Hongwei Mo, Tengteng Zhang
Publication date: 30-08-2024
Publisher: Springer US
Published in: Cognitive Computation
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI: https://doi.org/10.1007/s12559-024-10336-7