research-article

HyperLearn: A Distributed Approach for Representation Learning in Datasets With Many Modalities

Authors:
Devanshu Arya, University of Amsterdam, Amsterdam, Netherlands
Stevan Rudinac, University of Amsterdam, Amsterdam, Netherlands
Marcel Worring, University of Amsterdam, Amsterdam, Netherlands

MM '19: Proceedings of the 27th ACM International Conference on Multimedia, October 2019, Pages 2245–2253
https://doi.org/10.1145/3343031.3350572

Published: 15 October 2019

ABSTRACT

Multimodal datasets contain an enormous amount of relational information, which grows exponentially with the introduction of new modalities. Learning representations in such a scenario is inherently complex due to the presence of multiple heterogeneous information channels. These channels can encode both (a) inter-relations between items of different modalities and (b) intra-relations between items of the same modality. Encoding multimedia items into a continuous low-dimensional semantic space such that both types of relations are captured and preserved is extremely challenging, especially if the goal is a unified end-to-end learning framework. Two key challenges need to be addressed: 1) the framework must be able to merge complex intra- and inter-relations without losing any valuable information, and 2) the learning model should be invariant to the addition of new and potentially very different modalities. In this paper, we propose a flexible framework that can scale to data streams from many modalities. To that end, we introduce a hypergraph-based model for data representation and deploy Graph Convolutional Networks to fuse relational information within and across modalities. Our approach provides an efficient solution for distributing otherwise extremely computationally expensive or even infeasible training processes across multiple GPUs, without any sacrifice in accuracy. Moreover, adding a new modality to our model requires only an additional GPU unit while keeping the computational time unchanged, which brings representation learning to truly multimodal datasets. We demonstrate the feasibility of our approach in experiments on multimedia datasets featuring second-, third- and fourth-order relations.
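
The abstract does not spell out how the hypergraph is built or which convolution operator is used, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a toy dataset with three modalities (users, images, tags), encodes third-order relations as hyperedges in an incidence matrix, and applies one graph-convolution-style propagation step using a commonly used normalized hypergraph adjacency of the form Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}. All names (`incidence_matrix`, `normalized_hypergraph_adjacency`, `gcn_layer`), the node layout, and the random features are hypothetical.

```python
import numpy as np


def incidence_matrix(num_nodes, hyperedges):
    """Return the |V| x |E| incidence matrix H with H[v, e] = 1 if node v is in hyperedge e."""
    H = np.zeros((num_nodes, len(hyperedges)))
    for e, nodes in enumerate(hyperedges):
        H[list(nodes), e] = 1.0
    return H


def normalized_hypergraph_adjacency(H, edge_weights=None):
    """Compute Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} (an assumed, commonly used normalization)."""
    w = np.ones(H.shape[1]) if edge_weights is None else np.asarray(edge_weights, dtype=float)
    dv = H @ w           # weighted degree of every node
    de = H.sum(axis=0)   # number of nodes in every hyperedge

    # Invert only the non-zero degrees so isolated nodes or empty edges do not produce inf.
    dv_inv_sqrt = np.zeros_like(dv)
    dv_inv_sqrt[dv > 0] = dv[dv > 0] ** -0.5
    de_inv = np.zeros_like(de)
    de_inv[de > 0] = 1.0 / de[de > 0]

    left = dv_inv_sqrt[:, None] * H          # Dv^{-1/2} H
    middle = left * (w * de_inv)[None, :]    # Dv^{-1/2} H W De^{-1}
    return middle @ left.T                   # ... H^T Dv^{-1/2}


def gcn_layer(theta, X, W):
    """One propagation step: ReLU(theta @ X @ W)."""
    return np.maximum(theta @ X @ W, 0.0)


# Hypothetical toy data: nodes 0-1 are users, 2-3 are images, 4-5 are tags.
# Each hyperedge is a third-order relation (user, image, tag).
hyperedges = [(0, 2, 4), (1, 3, 5), (0, 3, 4)]
H = incidence_matrix(6, hyperedges)
theta = normalized_hypergraph_adjacency(H)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))   # assumed 8-dimensional input features per node
W = rng.normal(size=(8, 4))   # assumed weights projecting to 4 dimensions
embeddings = gcn_layer(theta, X, W)
print(embeddings.shape)
```

In this sketch each additional modality only adds nodes and widens the hyperedges, which is one way to read the abstract's claim that the model should be invariant to new modalities; how the training is actually split across GPUs is not covered here.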

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Distributed artificial intelligence
  2. Machine learning
    1. Learning paradigms
      1. Multi-task learning
    2. Machine learning approaches
      1. Learning latent representations

Published in
MM '19: Proceedings of the 27th ACM International Conference on Multimedia
October 2019
2794 pages
ISBN: 9781450368896
DOI: 10.1145/3343031
General Chairs:
Laurent Amsaleg, CNRS-IRISA, France
Benoit Huet, EURECOM, France
Martha Larson, Radboud University and TU Delft, Netherlands
Program Chairs:
Guillaume Gravier, CNRS-IRISA, France
Hayley Hung, Delft University of Technology, Netherlands
Chong-Wah Ngo, City University of Hong Kong, Hong Kong
Wei Tsang Ooi, National University of Singapore, Singapore
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
geometric deep learning
highly multimodal datasets
hypergraph
multimodal representation learning
tensor factorization
Acceptance Rates
MM '19 Paper Acceptance Rate: 252 of 936 submissions, 27%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%