Published in: World Wide Web 6/2023

11-09-2023

Graph convolutional network for difficulty-controllable visual question generation

Authors: Feng Chen, Jiayuan Xie, Yi Cai, Zehang Lin, Qing Li, Tao Wang

Abstract

In this article, we address the problem of difficulty-controllable visual question generation: generating questions that satisfy a given difficulty level based on an image and a target answer. The existing approach tends to generate questions by following templates: for easy questions, the model presents the candidate answers, turning the question into a multiple-choice one, while for hard questions the answer set is omitted. In fact, question difficulty should be reflected by the objects and their relationships mentioned in the question. To this end, we propose a graph-based model with three concrete modules, a Difficulty-controllable Graph Convolutional Network (DGCN) module, a fusion module, and a difficulty-controllable decoder, to generate questions at a controllable level of difficulty. We first define a difficulty label, based on the difficulty index from the education domain, to represent the difficulty of a question. Next, the DGCN module learns image representations that capture relations between objects in an image, conditioned on the given difficulty label. Then, the fusion module jointly attends to the image and answer representations to capture answer-related image features. Finally, the difficulty-controllable decoder incorporates the difficulty information into both the decoder initialization and the input at each time step to control the difficulty of generated questions. Experimental results demonstrate that our framework not only achieves significant improvements on several automatic evaluation metrics, but also generates questions with controllable difficulty.
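The abstract mentions two concrete mechanisms: a difficulty label derived from the education-domain difficulty (facility) index, and a graph convolution over object features conditioned on that label. A minimal sketch of both ideas is given below; the function names `difficulty_label` and `dgcn_layer`, the 0.5 cut-off, and the additive difficulty-embedding form are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def difficulty_label(n_correct: int, n_total: int, threshold: float = 0.5) -> str:
    """Difficulty (facility) index from educational measurement: the
    fraction of answerers who got the question right. The 0.5 cut-off
    between 'easy' and 'hard' is an assumed value for illustration."""
    p = n_correct / n_total
    return "easy" if p >= threshold else "hard"

def dgcn_layer(H: np.ndarray, A: np.ndarray, W: np.ndarray,
               d_embed: np.ndarray) -> np.ndarray:
    """One hypothetical difficulty-conditioned graph-convolution step:
    aggregate neighbouring object features via adjacency A, project with
    weight matrix W, add a learned difficulty embedding (broadcast over
    all object nodes), and apply a ReLU nonlinearity."""
    Z = A @ H @ W + d_embed
    return np.maximum(Z, 0.0)

# Toy example: 3 detected objects with 4-dim features, self-loop-only graph.
H = np.ones((3, 4))          # object features
A = np.eye(3)                # adjacency (self-loops only)
W = np.eye(4)                # projection weights
d_embed = np.zeros(4)        # difficulty embedding for the chosen label
out = dgcn_layer(H, A, W, d_embed)
```

With identity weights and a zero difficulty embedding the layer reduces to a pass-through, which makes the toy example easy to check by hand; in the actual model, A, W, and the difficulty embedding would be learned.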


Metadata
Title
Graph convolutional network for difficulty-controllable visual question generation
Authors
Feng Chen
Jiayuan Xie
Yi Cai
Zehang Lin
Qing Li
Tao Wang
Publication date
11-09-2023
Publisher
Springer US
Published in
World Wide Web / Issue 6/2023
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-023-01202-x
