Published in: Information Retrieval Journal 1/2023

01-06-2023

FarsNewsQA: a deep learning-based question answering system for the Persian news articles

Authors: Arefeh Kazemi, Zahra Zojaji, Mahdi Malverdi, Jamshid Mozafari, Fatemeh Ebrahimi, Negin Abadani, Mohammad Reza Varasteh, Mohammad Ali Nematbakhsh

Published in: Discover Computing | Issue 1/2023

Abstract

News agencies worldwide produce a considerable volume of news articles every day. Given how much news is available on the web, finding exact answers to users’ questions is not straightforward. Question Answering (QA) systems for news articles can address this challenge. Because few studies exist on Persian QA systems, and because QA systems are important and widely applicable in the news domain, this research aims to design and implement a QA system for Persian news articles. To the best of our knowledge, this is the first attempt to develop a Persian QA system in the news domain. We first create FarsQuAD, a Persian QA dataset for the news domain, and analyze the type and complexity of users’ questions about Persian news. The results show that What and Who questions occur most often and Why and Which questions least often in the Persian news domain. They also indicate that users usually raise complex questions about Persian news. We then develop FarsNewsQA, a QA system for answering questions about Persian news, in three versions based on BERT, ParsBERT, and ALBERT. The best version of FarsNewsQA achieves an F1 score of 75.61%, which is comparable to that of QA systems on the English SQuAD dataset created at Stanford University, and shows that the new BERT-based technologies work well for Persian news QA systems.
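
To make the reported F1 score concrete, the following is a minimal sketch of a SQuAD-style token-overlap F1. It assumes the score is computed per question from the tokens shared by the predicted and reference answer spans and then averaged; the abstract does not specify the evaluation script. The function names `token_f1` and `corpus_f1` are hypothetical, and plain whitespace tokenization stands in for whatever Persian normalization the authors applied.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer span."""
    # Assumption: whitespace tokenization, no text normalization.
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    # If either side is empty, the score is 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: each shared token counts min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def corpus_f1(predictions, references):
    """Mean per-question F1, expressed as a percentage."""
    scores = [token_f1(p, r) for p, r in zip(predictions, references)]
    return 100.0 * sum(scores) / len(scores)
```

Under this definition a prediction that contains the full reference answer plus extra tokens is penalized on precision only, which is why F1 is usually reported alongside exact match for span-extraction QA.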

Footnotes
1
Usage statistics of content languages for websites, https://w3techs.com/technologies/overview/content_language, retrieved October 2021.
 
4
A phrase in Persian that means: Dataset Collection Platform
 
Metadata
Title
FarsNewsQA: a deep learning-based question answering system for the Persian news articles
Authors
Arefeh Kazemi
Zahra Zojaji
Mahdi Malverdi
Jamshid Mozafari
Fatemeh Ebrahimi
Negin Abadani
Mohammad Reza Varasteh
Mohammad Ali Nematbakhsh
Publication date
01-06-2023
Publisher
Springer Netherlands
Published in
Discover Computing / Issue 1/2023
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-023-09417-2