Abstract
Virtual assistants (VAs) and chatbots have accelerated research on question answering (QA) systems. A QA system is expected to answer a question by processing a backend repository, where both the questions and the repository text are in natural language. A substantial number of QA systems have been built for high-resource languages, whereas progress for low-resource languages is still at an early stage. In this work, we design, develop, and evaluate a factoid QA system for a low-resource language, Bengali. The system takes a question from a human user and retrieves all prospective answers from a multi-domain repository. The answers are then ranked on six parameters and returned. The performance of the system is evaluated and compared with earlier systems using standard metrics. The algorithm is tested on two repositories. The first is the TDIL corpus, a large collection of famous Bengali literature developed in the Technology Development for Indian Languages (TDIL) project. The second is the translated SQuAD, the Bengali translation of the Stanford Question Answering Dataset. The system ranks the accurate answer first in 88.23% of cases. Based on a confusion-matrix evaluation, accuracy and F1 score are 97.64% and 98.5%, respectively, for the TDIL corpus and 97.16% and 98.51% for the translated SQuAD.
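The abstract reports accuracy and F1 score derived from a confusion matrix. As a reminder of how these two metrics follow from the four confusion-matrix cells, here is a minimal sketch. The class name `ConfusionCounts`, the function names, and the example counts are all hypothetical and chosen only for illustration; they are not the paper's data or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical container)."""
    tp: int  # correct answers returned
    fp: int  # wrong answers returned
    fn: int  # correct answers missed
    tn: int  # wrong answers correctly rejected


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all decisions that were correct."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: ConfusionCounts) -> float:
    """Fraction of returned answers that were correct."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c: ConfusionCounts) -> float:
    """Fraction of correct answers that were returned."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


# Illustrative counts only; these are NOT taken from the paper.
counts = ConfusionCounts(tp=90, fp=5, fn=3, tn=2)
print(f"accuracy = {accuracy(counts):.4f}")
print(f"F1 score = {f1_score(counts):.4f}")
```

Because F1 ignores true negatives while accuracy counts them, the two metrics can differ in either direction on the same matrix, which is consistent with the abstract reporting an F1 score slightly above the accuracy.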
Cite this article
Das, A., Mandal, J., Danial, Z. et al. An improvement of Bengali factoid question answering system using unsupervised statistical methods. Sādhanā 47, 2 (2022). https://doi.org/10.1007/s12046-021-01765-3