
30.08.2024

A Novel Multimodal Generative Learning Model based on Basic Fuzzy Concepts

Authors: Huankun Sheng, Hongwei Mo, Tengteng Zhang

Published in: Cognitive Computation


Abstract

Multimodal models are designed to process different types of data within a single generative framework. The prevalent strategy in previous methods is to learn joint representations shared across modalities, typically obtained by concatenating the top layers of modality-specific networks. Recently, significant advances have been made in generating images from text and vice versa. Despite these successes, current models often overlook the role of fuzzy concepts, which are crucial given that human cognitive processes inherently involve a high degree of fuzziness. Recognizing and incorporating fuzzy concepts is therefore essential for enhancing the effectiveness of multimodal cognition models. In this paper, a novel framework, named the Fuzzy Concept Learning Model (FCLM), is proposed to process modalities based on fuzzy concepts. The high-level abstractions between different modalities in the FCLM are represented by 'fuzzy concept functions.' After training, the FCLM can generate images from attribute descriptions and infer the attributes of input images. It can also formulate fuzzy concepts at various levels of abstraction. Extensive experiments were conducted on the dSprites and 3D Chairs datasets; both qualitative and quantitative results demonstrate the effectiveness and efficiency of the proposed framework. The FCLM integrates a fuzzy cognitive mechanism with the statistical characteristics of the environment. This cognition-inspired framework offers a novel perspective on processing multimodal information.
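The abstract does not specify the form of the 'fuzzy concept functions.' As a minimal illustrative sketch of the general idea behind fuzzy concepts (not the paper's actual method), a Gaussian membership function can assign each value of an attribute a graded degree of belonging to a concept such as "large"; the concept name, center, and width below are assumptions chosen for illustration:

```python
import math

def fuzzy_membership(x: float, center: float, width: float) -> float:
    """Gaussian membership: the degree in [0, 1] to which value x
    belongs to a fuzzy concept centered at `center`."""
    return math.exp(-((x - center) ** 2) / (2 * width ** 2))

# Hypothetical concept "large" over a normalized size attribute in [0, 1]
def large(size: float) -> float:
    return fuzzy_membership(size, center=1.0, width=0.3)

# Membership degrees grow smoothly toward the concept's prototype,
# rather than flipping at a hard threshold.
degrees = {s: round(large(s), 3) for s in (0.2, 0.6, 1.0)}
```

Unlike a crisp attribute label, such a function lets an intermediate size belong partially to both "small" and "large," which is the kind of gradedness the FCLM aims to exploit across modalities.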


Metadata
Title
A Novel Multimodal Generative Learning Model based on Basic Fuzzy Concepts
Authors
Huankun Sheng
Hongwei Mo
Tengteng Zhang
Publication date
30.08.2024
Publisher
Springer US
Published in
Cognitive Computation
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-024-10336-7
