An improvement of Bengali factoid question answering system using unsupervised statistical methods

Sādhanā

Abstract

Virtual Assistants (VA) and chatbots have accelerated research on Question Answering (QA) systems. A QA system is expected to answer a question by processing a backend repository, where both the questions and the repository text are in natural language. A substantial number of QA systems have been built for high-resource languages, whereas progress for low-resource languages is still at an early stage. In this work, we design, develop and evaluate a factoid QA system for a low-resource language, Bengali. The system takes a question from a human user and retrieves all prospective answers from a multi-domain repository. The answers are ranked on six parameters and returned. The performance of the system is then evaluated and compared with earlier systems using standard metrics. The algorithm is tested on two repositories. The first is the TDIL corpus, a large collection of famous Bengali literature developed in the Technology Development for Indian Languages (TDIL) project. The second is the translated SQuAD, the Bengali translation of the Stanford Question Answering Dataset. The system ranks the correct answer first in 88.23% of cases. Based on confusion-matrix evaluation, accuracy and F1 score are 97.64% and 98.5%, respectively, for the TDIL corpus and 97.16% and 98.51% for the translated SQuAD.
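The abstract reports three kinds of figures: the share of questions whose correct answer is ranked first, and accuracy and F1 score derived from a confusion matrix. The following is a minimal Python sketch of how such figures are conventionally computed. It is not the authors' evaluation code; the function names `confusion_metrics` and `top1_rate` and all counts in the usage lines are hypothetical and chosen only for illustration.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts.

    tp, fp, fn, tn are the true-positive, false-positive, false-negative
    and true-negative counts (hypothetical inputs, not the paper's data).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def top1_rate(ranks):
    """Fraction of questions whose correct answer is ranked first.

    ranks holds the 1-based rank of the correct answer for each question,
    or None when the correct answer was not retrieved at all.
    """
    if not ranks:
        return 0.0
    return sum(1 for r in ranks if r == 1) / len(ranks)


# Illustrative usage with made-up counts (assumption, not from the paper).
metrics = confusion_metrics(tp=8, fp=2, fn=0, tn=10)
print({name: round(value, 4) for name, value in metrics.items()})
print(top1_rate([1, 2, 1, None]))
```

Accuracy counts every correct decision, while F1 ignores true negatives, so the two can differ noticeably when the classes are imbalanced; this is one reason both are usually reported together.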

Author information

Corresponding author

Correspondence to Arijit Das.

Cite this article

Das, A., Mandal, J., Danial, Z. et al. An improvement of Bengali factoid question answering system using unsupervised statistical methods. Sādhanā 47, 2 (2022). https://doi.org/10.1007/s12046-021-01765-3

  • DOI: https://doi.org/10.1007/s12046-021-01765-3

Keywords

Navigation