Published in: Neural Computing and Applications 14/2020

15-01-2020 | Original Article

Gated multimodal networks

Authors: John Arevalo, Thamar Solorio, Manuel Montes-y-Gómez, Fabio A. González

Abstract

This paper considers the problem of leveraging multiple sources of information or data modalities (e.g., images and text) in neural networks. We define a novel model called the gated multimodal unit (GMU), designed as an internal unit in a neural network architecture whose purpose is to find an intermediate representation based on a combination of data from different modalities. The GMU uses multiplicative gates to learn how much each modality influences the activation of the unit. It can be used as a building block for different kinds of neural networks and can be seen as a form of intermediate fusion. The model was evaluated on two multimodal learning tasks in conjunction with fully connected and convolutional neural networks. We compare the GMU with other early- and late-fusion methods; it achieves better classification scores than these baselines on two benchmark datasets: MM-IMDb and DeepScene.
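The gating idea in the abstract can be illustrated with a small forward pass. The sketch below is a minimal illustration, not the authors' implementation: it assumes a bimodal unit in which each modality is projected through a tanh layer and a sigmoid gate computed from both inputs mixes the two projections element-wise. The function names (`gmu_forward`, `matvec`, `random_matrix`), the dimensions, and the random weights are all hypothetical and chosen only for this example.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gmu_forward(x_v, x_t, W_v, W_t, W_z):
    """Bimodal gated unit (illustrative sketch, assumed form).

    h_v = tanh(W_v x_v)            # candidate features from the visual input
    h_t = tanh(W_t x_t)            # candidate features from the textual input
    z   = sigmoid(W_z [x_v; x_t])  # gate computed from both inputs
    h   = z * h_v + (1 - z) * h_t  # element-wise mix controlled by the gate
    """
    h_v = [math.tanh(a) for a in matvec(W_v, x_v)]
    h_t = [math.tanh(a) for a in matvec(W_t, x_t)]
    z = [sigmoid(a) for a in matvec(W_z, x_v + x_t)]  # x_v + x_t concatenates lists
    h = [zi * hv + (1.0 - zi) * ht for zi, hv, ht in zip(z, h_v, h_t)]
    return h, z


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights for the demo."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim_v, dim_t, dim_h = 4, 3, 2  # hypothetical feature and hidden sizes
x_v = [rng.uniform(-1, 1) for _ in range(dim_v)]
x_t = [rng.uniform(-1, 1) for _ in range(dim_t)]
W_v = random_matrix(dim_h, dim_v, rng)
W_t = random_matrix(dim_h, dim_t, rng)
W_z = random_matrix(dim_h, dim_v + dim_t, rng)

h, z = gmu_forward(x_v, x_t, W_v, W_t, W_z)
print("gate z:", z)
print("fused h:", h)
```

Because each gate value lies strictly between 0 and 1, every hidden unit is a convex combination of the two modality projections: a gate near 1 lets the visual projection dominate, and a gate near 0 lets the textual one dominate. In a trained network the three weight matrices would be learned jointly with the rest of the model rather than sampled at random.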

Metadata
Title
Gated multimodal networks
Authors
John Arevalo
Thamar Solorio
Manuel Montes-y-Gómez
Fabio A. González
Publication date
15-01-2020
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 14/2020
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-019-04559-1
