20.02.2024 | Research

CMC-MMR: multi-modal recommendation model with cross-modal correction

Authors: YuBin Wang, HongBin Xia, Yuan Liu

Published in: Journal of Intelligent Information Systems

Abstract

Multi-modal recommendation, which uses multi-modal features (e.g., image and text features), has received significant attention and has been shown to make recommendation more effective. However, current multi-modal recommendation has the following problems: (1) Multi-modal recommendation models often process each modality's raw data directly, so noise degrades the model's effectiveness and the interconnections between modalities go unexplored; (2) Different users have different preferences, so it is impractical to treat all modalities equally, as doing so could interfere with the model's ability to make recommendations. To address these problems, this paper proposes a multi-modal recommendation model with cross-modal correction (CMC-MMR). First, to reduce the effect of noise in the raw data and to take full advantage of the relationships between modalities, we design a cross-modal correction module that denoises and corrects the modalities using a cross-modal correction mechanism. Second, the similarity between items within the same modality is used as the basis for building an item-item graph for each modality, and user-item graphs with degree-sensitive pruning strategies are also built to mine higher-order information. Finally, we design a self-supervised task to adaptively mine user preferences for each modality. We conducted comparative experiments with eleven baseline models on four real-world datasets. The experimental results show that CMC-MMR achieves average improvements of 6.202%, 4.975%, 6.054%, and 11.368% on the four datasets, respectively, demonstrating its effectiveness.
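The abstract does not specify how the per-modality item-item graphs or the degree-sensitive pruning are computed, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes cosine similarity with a top-k neighbour cut for the item-item graph, and an edge-keeping weight of 1 / sqrt(deg(user) * deg(item)) for pruning; the function names, the parameters `k` and `keep_ratio`, and the weighted-sampling scheme are all hypothetical choices made for this example.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_item_graph(features, k):
    """Build an item-item graph for ONE modality (assumed: top-k cosine neighbours).

    features: list of feature vectors, one per item, all from the same modality.
    Returns {item_index: [(neighbour_index, similarity), ...]}.
    """
    graph = {}
    for i, f_i in enumerate(features):
        sims = [(j, cosine(f_i, f_j)) for j, f_j in enumerate(features) if j != i]
        sims.sort(key=lambda pair: pair[1], reverse=True)
        graph[i] = sims[:k]
    return graph


def degree_sensitive_prune(edges, keep_ratio, rng):
    """Keep a random subset of user-item edges, favouring low-degree endpoints.

    Assumption (not taken from the abstract): each edge (u, i) gets the weight
    1 / sqrt(deg(u) * deg(i)), so edges attached to very popular users or items
    are more likely to be dropped.
    """
    user_deg, item_deg = {}, {}
    for u, i in edges:
        user_deg[u] = user_deg.get(u, 0) + 1
        item_deg[i] = item_deg.get(i, 0) + 1
    weights = [1.0 / math.sqrt(user_deg[u] * item_deg[i]) for u, i in edges]
    n_keep = max(1, int(len(edges) * keep_ratio))
    # Weighted sampling without replacement: draw key = r ** (1 / w) per edge
    # and keep the edges with the largest keys.
    keyed = [(rng.random() ** (1.0 / w), e) for w, e in zip(weights, edges)]
    keyed.sort(reverse=True)
    return [e for _, e in keyed[:n_keep]]


if __name__ == "__main__":
    image_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy data
    print(knn_item_graph(image_features, k=1))
    interactions = [(0, 0), (0, 1), (0, 2), (1, 2)]  # (user, item) pairs
    print(degree_sensitive_prune(interactions, 0.5, random.Random(0)))
```

In a setup like this, one graph would be built per modality (for example one from image features and one from text features), and the pruned user-item edges would typically be resampled during training so that no interaction is dropped permanently.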

Metadata
Title
CMC-MMR: multi-modal recommendation model with cross-modal correction
Authors
YuBin Wang
HongBin Xia
Yuan Liu
Publication date
20.02.2024
Publisher
Springer US
Published in
Journal of Intelligent Information Systems
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-024-00848-x