Published in: Arabian Journal for Science and Engineering 4/2021

24.02.2021 | Research Article-Computer Engineering and Computer Science

BLSTM-API: Bi-LSTM Recurrent Neural Network-Based Approach for Arabic Paraphrase Identification

Authors: Adnen Mahmoud, Mounir Zrigui

Abstract

Advances in communication technologies have made it easier for people to share content. As a result, a growing amount of data is disseminated and published on the internet, which has encouraged the practice of paraphrasing. Paraphrasing conceals an original sentence behind alternative expressions with the same meaning. Detecting it consists in measuring the degree of semantic similarity between the original and the paraphrased sentence, which is one of the more complex tasks in natural language processing and artificial intelligence. Although Arabic is spoken by a large population around the world, its rich grammar and semantics make sentence modeling and similarity computation difficult. This paper proposes an extrinsic paraphrase identification method for Arabic. It is based on a Siamese recurrent neural network architecture, chosen for its performance in processing text sequences of variable length. First, relevant features are extracted using global word vectors (GloVe), which are built from a global co-occurrence matrix computed over a local context window. Then, a bidirectional long short-term memory (Bi-LSTM) network is applied to capture long-term dependencies and meaningful contextual semantics between words. For paraphrase identification, the cosine measure is used as the merge function to compute the semantic similarity between the resulting source and suspect vectors. To address the lack of free, publicly available Arabic paraphrase datasets, the word2vec algorithm and part-of-speech tagging are combined to generate suspect sentences. To validate this generated data, its quality is compared with the SemEval benchmark. Experiments demonstrated the effectiveness of the proposed method.
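The merge step described in the abstract, where one shared encoder produces a source vector and a suspect vector that are then compared with the cosine measure, can be illustrated with a minimal sketch. This is not the authors' implementation: the `encode` function below is a hypothetical stand-in that simply averages word vectors in place of the Bi-LSTM encoder, and the tiny `toy_embeddings` table is invented for illustration rather than taken from GloVe.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def encode(tokens: Sequence[str], embeddings: Dict[str, List[float]]) -> List[float]:
    """Hypothetical placeholder encoder: average of known word vectors.

    In a Siamese setup the SAME encoder (same weights) is applied to both
    sentences; here mean pooling stands in for the Bi-LSTM.
    """
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Invented 3-dimensional vectors, for illustration only (not real GloVe values).
toy_embeddings = {
    "big": [0.9, 0.1, 0.0],
    "large": [0.8, 0.2, 0.0],
    "house": [0.1, 0.9, 0.1],
    "home": [0.2, 0.8, 0.1],
    "runs": [0.0, 0.1, 0.9],
}

source_vec = encode(["big", "house"], toy_embeddings)
suspect_vec = encode(["large", "home"], toy_embeddings)
unrelated_vec = encode(["runs"], toy_embeddings)

paraphrase_score = cosine_similarity(source_vec, suspect_vec)
unrelated_score = cosine_similarity(source_vec, unrelated_vec)
```

With these toy vectors, `paraphrase_score` comes out higher than `unrelated_score`, which is the behaviour the merge function relies on. How the score is turned into a paraphrase decision (for example, a threshold) is not stated in the abstract and is left open here.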

Metadata
Title
BLSTM-API: Bi-LSTM Recurrent Neural Network-Based Approach for Arabic Paraphrase Identification
Authors
Adnen Mahmoud
Mounir Zrigui
Publication date
24.02.2021
Publisher
Springer Berlin Heidelberg
Published in
Arabian Journal for Science and Engineering / Issue 4/2021
Print ISSN: 2193-567X
Electronic ISSN: 2191-4281
DOI
https://doi.org/10.1007/s13369-020-05320-w
