Published in: World Wide Web 6/2023

11-09-2023

Graph convolutional network for difficulty-controllable visual question generation

Authors: Feng Chen, Jiayuan Xie, Yi Cai, Zehang Lin, Qing Li, Tao Wang

Abstract

In this article, we address the problem of difficulty-controllable visual question generation: generating questions that satisfy a given difficulty level based on an image and a target answer. The existing approach tends to generate questions by following templates: for easy questions, the model presents the candidate answers, turning the question into a multiple-choice one, while for hard questions the answer set is omitted. In fact, question difficulty should be reflected by the objects and their relationships mentioned in the question. To this end, we propose a graph-based model with three concrete modules, a Difficulty-controllable Graph Convolutional Network (DGCN) module, a fusion module, and a difficulty-controllable decoder, to generate questions at a controllable level of difficulty. We first define a difficulty label, based on the difficulty index from the education domain, to represent the difficulty of a question. Next, the DGCN module learns image representations that capture relations between objects in an image, conditioned on the given difficulty label. Then, the fusion module jointly attends to the image and answer representations to capture answer-related image features. Finally, the difficulty-controllable decoder incorporates the difficulty information into both the decoder initialization and the input at each time step to control the difficulty of generated questions. Experimental results demonstrate that our framework not only achieves significant improvements on several automatic evaluation metrics, but also generates questions with controllable difficulty.
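The abstract mentions two concrete mechanisms: a difficulty label derived from the education-domain difficulty (facility) index, and a graph convolution over object features conditioned on that label. A minimal sketch of both ideas is given below; the function names `difficulty_label` and `dgcn_layer`, the 0.5 cut-off, and the additive difficulty-embedding form are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def difficulty_label(n_correct: int, n_total: int, threshold: float = 0.5) -> str:
    """Difficulty (facility) index from educational measurement: the
    fraction of answerers who got the question right. The 0.5 cut-off
    between 'easy' and 'hard' is an assumed value for illustration."""
    p = n_correct / n_total
    return "easy" if p >= threshold else "hard"

def dgcn_layer(H: np.ndarray, A: np.ndarray, W: np.ndarray,
               d_embed: np.ndarray) -> np.ndarray:
    """One hypothetical difficulty-conditioned graph-convolution step:
    aggregate neighbouring object features via adjacency A, project with
    weight matrix W, add a learned difficulty embedding (broadcast over
    all object nodes), and apply a ReLU nonlinearity."""
    Z = A @ H @ W + d_embed
    return np.maximum(Z, 0.0)

# Toy example: 3 detected objects with 4-dim features, self-loop-only graph.
H = np.ones((3, 4))          # object features
A = np.eye(3)                # adjacency (self-loops only)
W = np.eye(4)                # projection weights
d_embed = np.zeros(4)        # difficulty embedding for the chosen label
out = dgcn_layer(H, A, W, d_embed)
```

With identity weights and a zero difficulty embedding the layer reduces to a pass-through, which makes the toy example easy to check by hand; in the actual model, A, W, and the difficulty embedding would be learned.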


Metadata
Title
Graph convolutional network for difficulty-controllable visual question generation
Authors
Feng Chen
Jiayuan Xie
Yi Cai
Zehang Lin
Qing Li
Tao Wang
Publication date
11-09-2023
Publisher
Springer US
Published in
World Wide Web / Issue 6/2023
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-023-01202-x
