DOI: 10.1145/3343031.3350572
research-article

HyperLearn: A Distributed Approach for Representation Learning in Datasets With Many Modalities

Published: 15 October 2019

ABSTRACT

Multimodal datasets contain an enormous amount of relational information, which grows exponentially with the introduction of new modalities. Learning representations in such a scenario is inherently complex due to the presence of multiple heterogeneous information channels. These channels can encode both (a) inter-relations between items of different modalities and (b) intra-relations between items of the same modality. Encoding multimedia items into a continuous low-dimensional semantic space such that both types of relations are captured and preserved is extremely challenging, especially if the goal is a unified end-to-end learning framework. Two key challenges need to be addressed: 1) the framework must be able to merge complex intra- and inter-relations without losing any valuable information, and 2) the learning model should be invariant to the addition of new and potentially very different modalities. In this paper, we propose a flexible framework that can scale to data streams from many modalities. To that end, we introduce a hypergraph-based model for data representation and deploy Graph Convolutional Networks to fuse relational information within and across modalities. Our approach provides an efficient solution for distributing otherwise extremely computationally expensive or even infeasible training processes across multiple GPUs, without sacrificing accuracy. Moreover, adding a new modality to our model requires only an additional GPU while keeping the computational time unchanged, which brings representation learning to truly multimodal datasets. We demonstrate the feasibility of our approach in experiments on multimedia datasets featuring second-, third-, and fourth-order relations.
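
To make the hypergraph idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy dataset of five items (two images, two tags, one user), encodes relations as a node-by-hyperedge incidence matrix, and applies one graph-convolution-style layer using a symmetric normalization that is common in the hypergraph learning literature. The function names, the toy hyperedges, and the feature sizes are all hypothetical, and the multi-GPU distribution described in the abstract is not modeled here.

```python
import numpy as np


def hypergraph_propagation(H, w=None):
    """Return D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} for an incidence matrix H.

    H has shape (n_nodes, n_edges); H[i, j] = 1 if node i belongs to hyperedge j.
    w holds one non-negative weight per hyperedge (defaults to all ones).
    """
    H = np.asarray(H, dtype=float)
    n_edges = H.shape[1]
    w = np.ones(n_edges) if w is None else np.asarray(w, dtype=float)

    d_v = H @ w            # weighted degree of each node
    d_e = H.sum(axis=0)    # number of nodes in each hyperedge

    # Invert degrees only where they are non-zero to avoid division by zero.
    dv_inv_sqrt = np.zeros_like(d_v)
    dv_inv_sqrt[d_v > 0] = d_v[d_v > 0] ** -0.5
    de_inv = np.zeros_like(d_e)
    de_inv[d_e > 0] = 1.0 / d_e[d_e > 0]

    left = dv_inv_sqrt[:, None] * H          # D_v^{-1/2} H
    scaled = left * (w * de_inv)[None, :]    # D_v^{-1/2} H W D_e^{-1}
    return scaled @ left.T                   # ... H^T D_v^{-1/2}


def conv_layer(theta, X, weights):
    """One propagation step followed by a linear map and a ReLU."""
    return np.maximum(theta @ X @ weights, 0.0)


# Toy example (assumed): nodes 0-1 are images, 2-3 are tags, 4 is a user.
# Hyperedges 0 and 1 are third-order inter-modal relations (image, tag, user);
# hyperedge 2 is a second-order intra-modal relation between the two images.
H = np.array([
    [1, 0, 1],
    [0, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
])

theta = hypergraph_propagation(H)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))        # hypothetical 4-dimensional input features
weights = rng.normal(size=(4, 2))  # hypothetical layer weights (untrained)
Z = conv_layer(theta, X, weights)  # 2-dimensional embedding per item
```

In this sketch a single hyperedge can join any number of items, which is how one structure can hold second-, third-, and higher-order relations at once. Adding a modality only adds rows and columns to the incidence matrix.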


        • Published in

          MM '19: Proceedings of the 27th ACM International Conference on Multimedia
          October 2019
          2794 pages
          ISBN: 9781450368896
          DOI: 10.1145/3343031

          Copyright © 2019 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          MM '19 Paper Acceptance Rate: 252 of 936 submissions, 27%
          Overall Acceptance Rate: 995 of 4,171 submissions, 24%
