
2024 | Original Paper | Book Chapter

MbAbI: A Benchmark Dataset for Malayalam Text Understanding and Reasoning

Authors: K. Reji Rahmath, P. C. Reghu Raj, P. C. Rafeeque

Published in: Proceedings of Third International Conference on Computing and Communication Networks

Publisher: Springer Nature Singapore


Abstract

This paper presents a Malayalam dataset of 40,000 instances, intended to assist researchers working on question answering (QA) systems. The dataset is the first of its kind. It was created by translating the Facebook bAbI dataset and is called the Malayalam bAbI (MbAbI) dataset. It comprises 20 tasks of varying complexity, ranging from simple supporting-fact tasks to complex path-finding tasks. We machine-translated these tasks from English to Malayalam and tested baseline QA models on the resulting data. We obtained state-of-the-art results on the MbAbI dataset using deep learning and transformer models. To benefit the low-resource Malayalam research community, we have made the dataset publicly available.
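
For readers unfamiliar with the bAbI layout, the following is a minimal sketch of how one task file might be read. It assumes that MbAbI keeps the plain-text format of the original English bAbI release (numbered lines, with question lines carrying tab-separated answer and supporting-fact IDs), which the abstract does not confirm. The names `QAExample` and `parse_babi` are hypothetical, and the sample lines are English stand-ins in the style of the original bAbI tasks, not actual MbAbI data.

```python
from dataclasses import dataclass, field


@dataclass
class QAExample:
    story: list          # sentences of the current story seen before the question
    question: str
    answer: str
    supporting: list = field(default_factory=list)  # line IDs of supporting facts


def parse_babi(lines):
    """Parse bAbI-style lines into QA examples (format assumed, see lead-in).

    Every line starts with an integer ID; an ID of 1 starts a new story.
    Question lines hold tab-separated fields: question, answer, supporting IDs.
    """
    examples, story = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        idx, text = line.split(" ", 1)
        if int(idx) == 1:
            story = []  # a new story begins, so earlier context is discarded
        if "\t" in text:
            parts = text.split("\t")
            question, answer = parts[0].strip(), parts[1].strip()
            supporting = [int(s) for s in parts[2].split()] if len(parts) > 2 else []
            examples.append(QAExample(list(story), question, answer, supporting))
        else:
            story.append(text)
    return examples


# English stand-in lines; a real MbAbI file would contain Malayalam text.
sample = [
    "1 Mary moved to the bathroom.",
    "2 John went to the hallway.",
    "3 Where is Mary? \tbathroom\t1",
    "4 Daniel went back to the hallway.",
    "5 Where is Daniel? \thallway\t4",
    "1 Sandra travelled to the kitchen.",
    "2 Where is Sandra? \tkitchen\t1",
]
examples = parse_babi(sample)
```

Each parsed example pairs a question with the story sentences that precede it, so the same structure could feed either a memory-style model or a transformer that reads the story and question as one sequence.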

Metadata
Title
MbAbI: A Benchmark Dataset for Malayalam Text Understanding and Reasoning
Authors
K. Reji Rahmath
P. C. Reghu Raj
P. C. Rafeeque
Copyright Year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-0892-5_54