
Published: 01-06-2023

FarsNewsQA: a deep learning-based question answering system for the Persian news articles

Authors: Arefeh Kazemi, Zahra Zojaji, Mahdi Malverdi, Jamshid Mozafari, Fatemeh Ebrahimi, Negin Abadani, Mohammad Reza Varasteh, Mohammad Ali Nematbakhsh

Published in: Discover Computing | Issue 1/2023


Abstract

Nowadays, news agencies worldwide produce a considerable volume of news articles every day. Because so much news is available on the web, finding exact answers to users' questions is not a straightforward task. Developing Question Answering (QA) systems for news articles can tackle this challenge. Given the lack of studies on Persian QA systems and the importance and wide applications of QA systems in the news domain, this research aims to design and implement a QA system for Persian news articles. To the best of our knowledge, this is the first attempt to develop a Persian QA system in the news domain. We first create FarsQuAD, a Persian QA dataset for the news domain. We analyze the type and complexity of users' questions about Persian news. The results show that What and Who questions occur most often, and Why and Which questions least often, in the Persian news domain. The results also indicate that users usually raise complex questions about Persian news. We then develop FarsNewsQA, a QA system for answering questions about Persian news, and build three versions of it using BERT, ParsBERT, and ALBERT. The best version of FarsNewsQA achieves an F1 score of 75.61%, which is comparable with that of QA systems on the English SQuAD dataset created at Stanford University, and shows that new BERT-based technologies work well for Persian news QA systems.
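The abstract reports an F1 score of 75.61% and compares it with SQuAD results, but does not define the metric. Assuming FarsNewsQA is scored with the token-overlap F1 used by the SQuAD evaluation script (an assumption not confirmed by this abstract), the computation can be sketched in Python as below. The function names `normalize`, `f1_score` and `corpus_f1` are hypothetical, the English article-stripping step of the SQuAD script is omitted because it does not apply to Persian, and the set of Persian punctuation marks is likewise an assumption.

```python
from collections import Counter
import string

# Assumption: ASCII punctuation plus a few common Persian marks
# (Arabic comma, Arabic semicolon, Arabic question mark, guillemets).
PUNCTUATION = set(string.punctuation) | {"\u060c", "\u061b", "\u061f", "\u00ab", "\u00bb"}


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation characters, and split on whitespace.

    A simplified SQuAD-style normalizer: no article removal and no
    Persian-specific character unification (hypothetical choices).
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    return text.split()


def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-overlap F1 between one predicted span and one gold answer."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(ground_truth)
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def corpus_f1(predictions: list[str], references: list[list[str]]) -> float:
    """Mean over questions of the best F1 against any gold answer, in percent."""
    total = 0.0
    for pred, golds in zip(predictions, references):
        total += max(f1_score(pred, gold) for gold in golds)
    return 100.0 * total / len(predictions)


# Usage: the extra token "the" lowers precision to 2/3, so F1 is 0.8.
print(round(f1_score("the Stanford University", "Stanford University"), 3))  # 0.8
# The trailing Persian comma is stripped, so the answers match exactly.
print(f1_score("دانشگاه تهران", "دانشگاه تهران،"))  # 1.0
```

Taking the maximum over several gold answers mirrors the SQuAD script, which accepts any annotated answer; whether FarsQuAD provides more than one gold answer per question is not stated in the abstract.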




Publisher: Springer Netherlands
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI: https://doi.org/10.1007/s10791-023-09417-2
