2024 | OriginalPaper | Chapter

MbAbI: A Benchmark Dataset for Malayalam Text Understanding and Reasoning

Authors: K. Reji Rahmath, P. C. Reghu Raj, P. C. Rafeeque

Published in: Proceedings of Third International Conference on Computing and Communication Networks

Publisher: Springer Nature Singapore

Abstract

This paper proposes a Malayalam dataset of 40,000 instances, intended to assist researchers working on question answering (QA) systems. This dataset is the first of its kind. The Malayalam bAbI (MbAbI) dataset was created by machine-translating the Facebook bAbI dataset from English to Malayalam. It comprises 20 tasks of varying complexity, ranging from simple supporting-fact tasks to complex path-finding tasks. We tested baseline QA models on the translated tasks and obtained state-of-the-art results on the MbAbI dataset using deep learning and transformer models. To benefit the low-resource Malayalam research community, we have made the dataset publicly available.
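As an illustration of how such task files are typically consumed, the following is a minimal sketch of a reader for the plain-text layout used by the original English bAbI release: numbered sentences, with question lines carrying a tab-separated answer and supporting-fact IDs, and the numbering restarting at 1 for each new story. It is an assumption that MbAbI keeps this layout; the function name `parse_babi` and the English sample lines are hypothetical stand-ins, not taken from the dataset.

```python
from typing import Iterable, List, Tuple

# (story sentences so far, question, answer, supporting-fact line IDs)
Example = Tuple[List[str], str, str, List[int]]


def parse_babi(lines: Iterable[str]) -> List[Example]:
    """Parse lines in the original bAbI text layout (assumed, not confirmed, for MbAbI)."""
    examples: List[Example] = []
    story = {}  # line ID -> statement sentence of the current story
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        idx_str, text = raw.split(" ", 1)
        idx = int(idx_str)
        if idx == 1:
            story = {}  # numbering restarts, so a new story begins
        if "\t" in text:
            # Question line: question <TAB> answer <TAB> supporting IDs
            question, answer, support = text.split("\t")
            support_ids = [int(s) for s in support.split()]
            context = [story[i] for i in sorted(story)]
            examples.append((context, question.strip(), answer, support_ids))
        else:
            story[idx] = text
    return examples


# Hypothetical English sample in the assumed layout; real MbAbI files
# would contain Malayalam text.
SAMPLE = [
    "1 Mary moved to the bathroom.",
    "2 John went to the hallway.",
    "3 Where is Mary? \tbathroom\t1",
    "4 Daniel went back to the hallway.",
    "5 Sandra moved to the garden.",
    "6 Where is Daniel? \thallway\t4",
]

examples = parse_babi(SAMPLE)
for context, question, answer, support_ids in examples:
    print(len(context), question, answer, support_ids)
```

Because the reader splits only on spaces and tabs, it makes no assumption about the script of the sentences themselves, so the same logic should apply to Malayalam text if the file layout is preserved.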

Metadata
Title
MbAbI: A Benchmark Dataset for Malayalam Text Understanding and Reasoning
Authors
K. Reji Rahmath
P. C. Reghu Raj
P. C. Rafeeque
Copyright Year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-0892-5_54