2023 | OriginalPaper | Chapter

SAViR-T: Spatially Attentive Visual Reasoning with Transformers

Authors: Pritish Sahu, Kalliopi Basioti, Vladimir Pavlovic

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer Nature Switzerland

Abstract

We present a novel computational model, SAViR-T, for the family of visual reasoning problems embodied in Raven’s Progressive Matrices (RPM). Our model considers the explicit spatial semantics of visual elements within each image in the puzzle, encoded as spatio-visual tokens, and learns the intra-image as well as the inter-image token dependencies that are highly relevant for the visual reasoning task. Token-wise relationships, modeled through a transformer-based SAViR-T architecture, extract group (row or column) driven representations by leveraging the group-rule coherence, and use this as the inductive bias to extract the underlying rule representations in the top two rows (or columns) per token in the RPM. We use these relation representations to locate the correct choice image that completes the last row or column of the RPM. Extensive experiments across both synthetic RPM benchmarks, including RAVEN, I-RAVEN, RAVEN-FAIR, and PGM, and the natural image-based “V-PROM” demonstrate that SAViR-T sets a new state of the art for visual reasoning, exceeding prior models’ performance by a considerable margin.
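The selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the learned transformer-based rule extractor is replaced here by a hypothetical stand-in, `row_rule`, that simply concatenates successive differences between the three images of a row, and the names `pick_choice`, `sq_dist`, `Token`, and `Image` are assumptions introduced for illustration. The sketch only shows the overall flow: a per-token rule representation is computed for the top two rows, combined into a shared rule, and each candidate is scored by how well the completed third row agrees with that shared rule.

```python
from typing import List, Sequence

Token = List[float]   # feature vector for one spatial location (assumed layout)
Image = List[Token]   # one image = a list of spatial tokens


def row_rule(a: Token, b: Token, c: Token) -> List[float]:
    """Hypothetical stand-in for a learned rule extractor.

    Concatenates the element-wise differences (b - a) and (c - b).
    """
    return [y - x for x, y in zip(a, b)] + [z - y for y, z in zip(b, c)]


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def pick_choice(context: Sequence[Image], choices: Sequence[Image]) -> int:
    """Return the index of the choice whose completed third row best
    matches the rule shared by the first two rows, token by token."""
    assert len(context) == 8, "an RPM has eight context images"
    num_tokens = len(context[0])

    # Shared rule per token: average of the rule vectors of rows 1 and 2.
    shared = []
    for k in range(num_tokens):
        r1 = row_rule(context[0][k], context[1][k], context[2][k])
        r2 = row_rule(context[3][k], context[4][k], context[5][k])
        shared.append([(x + y) / 2 for x, y in zip(r1, r2)])

    # Score each choice by negative distance between its row-3 rule and the shared rule.
    scores = []
    for choice in choices:
        total = 0.0
        for k in range(num_tokens):
            r3 = row_rule(context[6][k], context[7][k], choice[k])
            total -= sq_dist(r3, shared[k])
        scores.append(total)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy puzzle: one token per image, one feature, rows 1-2-3 / 4-5-6 / 7-8-?
context = [[[float(v)]] for v in (1, 2, 3, 4, 5, 6, 7, 8)]
choices = [[[float(v)]] for v in (5, 9, 7, 12, 0, 8, 10, 3)]
best = pick_choice(context, choices)  # index of the candidate with value 9
```

In the toy puzzle every row increases by one, so the candidate holding 9 reproduces the shared rule exactly and is selected. In the model described above, the hand-written difference rule would be replaced by learned representations over spatio-visual tokens.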

\(16 = 8 + 8\) for the eight Context and eight Choice images of an RPM.

Title: SAViR-T: Spatially Attentive Visual Reasoning with Transformers
Authors: Pritish Sahu, Kalliopi Basioti, Vladimir Pavlovic
Publisher: Springer Nature Switzerland
Book: Machine Learning and Knowledge Discovery in Databases
Print ISBN: 978-3-031-26408-5

Electronic ISBN: 978-3-031-26409-2

Copyright Year: 2023
DOI: https://doi.org/10.1007/978-3-031-26409-2_28
