
10-08-2024 | Research

Prompt Learning for Multimodal Intent Recognition with Modal Alignment Perception

Authors: Yuzhao Chen, Wenhua Zhu, Weilun Yu, Hongfei Xue, Hao Fu, Jiali Lin, Dazhi Jiang

Published in: Cognitive Computation

Abstract

Multimodal intent recognition is a crucial task for understanding user intent through speech, body movements, tone, and other modalities in real-world multimodal environments. However, because intent is hidden both within and across modalities, most existing methods remain limited in mining and integrating multimodal intent information. This paper introduces a prompt learning with modal alignment perception (PMAP) approach to address these challenges. First, to mine deep-level semantic information, intent templates are constructed for prompt learning to enhance text representations. Then, cross-modal alignment perception is leveraged to eliminate modality discrepancies while extracting consistent hidden intent information from the non-text modalities. Finally, through multimodal semantic interaction, the position of the text representation in the semantic space is fine-tuned, which effectively aggregates intent details from multiple modalities. Extensive experiments demonstrate that our method achieves significant improvements.
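The abstract describes a three-stage pipeline: a cloze-style intent template for prompt learning, cross-modal alignment of non-text modalities into the text space, and a fusion step that shifts the text representation using the aligned evidence. Since the paper's implementation is not reproduced on this page, the following is a minimal PyTorch sketch of how such a pipeline could look; the template wording, module names, feature dimensions, gated fusion, and the cosine-based alignment loss are all illustrative assumptions, not the authors' method.

```python
# Minimal PMAP-style sketch (illustrative only; not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


def build_intent_prompt(utterance: str, mask_token: str = "[MASK]") -> str:
    """Wrap an utterance in a cloze-style intent template for prompt learning.
    The template wording here is hypothetical; the paper constructs its own
    intent templates."""
    return f"{utterance} The intent of this sentence is {mask_token}."


class CrossModalAlignment(nn.Module):
    """Project a non-text modality into the text space and attend over text,
    pulling the two modalities toward a shared semantic space."""

    def __init__(self, modality_dim: int, text_dim: int, n_heads: int = 8):
        super().__init__()
        self.proj = nn.Linear(modality_dim, text_dim)
        self.attn = nn.MultiheadAttention(text_dim, n_heads, batch_first=True)

    def forward(self, modality_feats, text_feats):
        # Queries come from the projected non-text modality; keys/values
        # from text, so the output is modality evidence expressed in the
        # text representation space.
        q = self.proj(modality_feats)
        aligned, _ = self.attn(q, text_feats, text_feats)
        return aligned


def alignment_loss(text_repr, modal_repr):
    """Cosine-similarity objective, an assumed stand-in for the paper's
    modal alignment perception loss."""
    return 1.0 - F.cosine_similarity(text_repr, modal_repr, dim=-1).mean()


class PMAPSketch(nn.Module):
    """End-to-end sketch: prompt-enhanced text plus aligned video/audio,
    fused by a gated residual that fine-tunes the text representation's
    position in the semantic space."""

    def __init__(self, text_dim=768, video_dim=256, audio_dim=128, n_intents=20):
        super().__init__()
        self.video_align = CrossModalAlignment(video_dim, text_dim)
        self.audio_align = CrossModalAlignment(audio_dim, text_dim)
        self.gate = nn.Linear(text_dim * 3, text_dim)
        self.classifier = nn.Linear(text_dim, n_intents)

    def forward(self, text_feats, video_feats, audio_feats):
        v = self.video_align(video_feats, text_feats).mean(dim=1)
        a = self.audio_align(audio_feats, text_feats).mean(dim=1)
        t = text_feats.mean(dim=1)
        # Shift the text representation according to the aligned
        # non-text evidence, then classify the fused vector.
        shift = torch.tanh(self.gate(torch.cat([t, v, a], dim=-1)))
        fused = t + shift
        logits = self.classifier(fused)
        loss_align = alignment_loss(t, v) + alignment_loss(t, a)
        return logits, loss_align


if __name__ == "__main__":
    model = PMAPSketch()
    text = torch.randn(2, 30, 768)   # e.g., token embeddings of the prompted input
    video = torch.randn(2, 50, 256)  # e.g., frame-level visual features
    audio = torch.randn(2, 80, 128)  # e.g., wav2vec-style acoustic features
    logits, loss_align = model(text, video, audio)
    print(build_intent_prompt("Could you move this to the trash?"))
    print(logits.shape, loss_align.item())
```

In a real system the text features would come from a pre-trained language model applied to the prompted input, and the alignment loss would be weighted against the classification loss during training; the mean-pooling and gating choices above are placeholders for whatever interaction mechanism the paper actually uses.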

Metadata

Title: Prompt Learning for Multimodal Intent Recognition with Modal Alignment Perception
Authors: Yuzhao Chen, Wenhua Zhu, Weilun Yu, Hongfei Xue, Hao Fu, Jiali Lin, Dazhi Jiang
Publication date: 10-08-2024
Publisher: Springer US
Published in: Cognitive Computation
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI: https://doi.org/10.1007/s12559-024-10328-7
