Published in: International Journal of Speech Technology 1/2023

03.03.2022

ArSphere: Arabic word vectors embedded in a polar sphere

Written by: Sandra Rizkallah, Amir F. Atiya, Samir Shaheen, Hossam ElDin Mahgoub

Abstract

Word embeddings map words to vectors in an N-dimensional space. ArSphere is an approach for designing word embeddings for the Arabic language. It addresses a known shortcoming of word embeddings (one that affects English as well): their inability to handle opposites and to differentiate them from unrelated word pairs. To achieve this, the vectors are embedded onto the unit sphere rather than the entire space. The sphere is a suitable choice because polarity can be represented by placing opposite words at opposite poles. The proposed approach has several advantages. First, it utilizes the extensive resources developed by linguistic experts, including classic dictionaries, in contrast to the prevailing approach of designing word embeddings from word co-occurrence. Second, it successfully distinguishes between synonyms, antonyms, and unrelated word pairs. The embedding is designed with a simple relaxation algorithm derived for this purpose. Because the algorithm is fast, the word vector collection can easily be updated when new words or synonyms are added. The vectors are tested against a number of other published models, and the results show that ArSphere outperforms them.
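
To make the geometric idea in the abstract concrete, the following is a minimal sketch of a relaxation on the unit sphere. It is not the ArSphere algorithm, whose details the abstract does not give. The update rule, the function name `relax_on_sphere`, the parameters `dim`, `steps` and `rate`, and the English toy vocabulary are all assumptions chosen for illustration. Each vector is nudged toward its synonyms and toward the negation of its antonyms, then projected back onto the sphere.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit length so it lies on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))

def relax_on_sphere(words, synonyms, antonyms, dim=50, steps=300, rate=0.5, seed=0):
    """Toy relaxation (an assumption, not the published method)."""
    # Signed neighbour lists: +1 for a synonym, -1 for an antonym.
    relations = {w: [] for w in words}
    for a, b in synonyms:
        relations[a].append((b, 1.0))
        relations[b].append((a, 1.0))
    for a, b in antonyms:
        relations[a].append((b, -1.0))
        relations[b].append((a, -1.0))

    # Start from random points on the sphere.
    rng = random.Random(seed)
    vecs = {w: normalize([rng.gauss(0.0, 1.0) for _ in range(dim)]) for w in words}

    for _ in range(steps):
        updated = {}
        for w in words:
            moved = list(vecs[w])
            for other, sign in relations[w]:
                # Step toward synonyms (+) and toward the negation of antonyms (-).
                for k in range(dim):
                    moved[k] += rate * sign * vecs[other][k]
            # Project back onto the unit sphere after every step.
            updated[w] = normalize(moved)
        vecs = updated
    return vecs

# Hypothetical toy data; a real system would read pairs from dictionaries.
words = ["good", "fine", "bad", "poor", "hot", "warm", "cold"]
synonyms = [("good", "fine"), ("bad", "poor"), ("hot", "warm")]
antonyms = [("good", "bad"), ("hot", "cold")]
vectors = relax_on_sphere(words, synonyms, antonyms)

print("synonyms:", cosine(vectors["good"], vectors["fine"]))
print("antonyms:", cosine(vectors["good"], vectors["bad"]))
print("unrelated:", cosine(vectors["good"], vectors["hot"]))
```

In this sketch the cosine similarity of synonym pairs approaches +1 and that of antonym pairs approaches -1, which is the polar arrangement the abstract describes. Unrelated pairs are left unconstrained here, so their similarity depends only on the random starting points; how ArSphere itself treats unrelated pairs is not specified in the abstract.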


Metadata
Title
ArSphere: Arabic word vectors embedded in a polar sphere
Authors
Sandra Rizkallah
Amir F. Atiya
Samir Shaheen
Hossam ElDin Mahgoub
Publication date
03.03.2022
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2023
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-022-09966-9
