2018 | Original Paper | Book Chapter

Combining Lexical Features and a Supervised Learning Approach for Arabic Sentiment Analysis

Authors: Samhaa R. El-Beltagy, Talaat Khalil, Amal Halaby, Muhammad Hammad

Published in: Computational Linguistics and Intelligent Text Processing

Publisher: Springer International Publishing

Abstract

The importance of building sentiment analysis tools for Arabic social media has been recognized during the past couple of years, especially given the rapid increase in the number of Arabic social media users. One of the main difficulties in tackling this problem is that social media text is mostly colloquial, with many dialects in use across platforms. In this paper, we present a set of features that were integrated into a machine-learning-based sentiment analysis model and applied to Egyptian, Saudi, Levantine, and MSA (Modern Standard Arabic) social media datasets. Many of the proposed features were derived using an Arabic sentiment lexicon. The model also uses emoticon-based features, as well as features of the input text itself, such as the number of segments within the text, the length of the text, and whether the text ends with a question mark. We show that the presented features increased accuracy on six of the seven benchmark datasets we experimented with. Since the developed model outperforms all existing Arabic sentiment analysis systems that have publicly available datasets, we can state that this model represents the state of the art in Arabic sentiment analysis.
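
The abstract names the feature families but not how they are computed. The following is a minimal sketch of what such a feature extractor could look like, not the authors' implementation. The tiny lexicon, the emoticon lists, the rule that segments are spans separated by punctuation, and the name `extract_features` are all assumptions made for illustration.

```python
import re

# Hypothetical toy resources: the chapter uses a full Arabic sentiment
# lexicon and its own emoticon lists, neither of which is given here.
POSITIVE_TERMS = {"رائع", "جميل"}
NEGATIVE_TERMS = {"سيء", "ممل"}
POSITIVE_EMOTICONS = (":)", ":D")
NEGATIVE_EMOTICONS = (":(",)


def extract_features(text):
    """Return simple lexical and surface features for one post."""
    # Word tokens: runs of Unicode word characters (covers Arabic letters).
    words = re.findall(r"\w+", text)
    # Assumption: a segment is a non-empty span between punctuation marks
    # (Latin and Arabic) or line breaks.
    segments = [s for s in re.split(r"[.!?؟،,\n]+", text) if s.strip()]
    return {
        "positive_terms": sum(w in POSITIVE_TERMS for w in words),
        "negative_terms": sum(w in NEGATIVE_TERMS for w in words),
        "positive_emoticons": sum(text.count(e) for e in POSITIVE_EMOTICONS),
        "negative_emoticons": sum(text.count(e) for e in NEGATIVE_EMOTICONS),
        "num_segments": len(segments),
        "length_in_words": len(words),
        # Check both the Latin and the Arabic question mark.
        "ends_with_question": text.rstrip().endswith(("?", "؟")),
    }
```

In a setup like the one the abstract describes, a dictionary of this kind would be appended to the usual bag-of-words representation before training a classifier; how the chapter actually combines them is not stated in the abstract.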

Footnotes
1. The system was the best performer in SemEval 2013 and SemEval 2014 with respect to the message-level polarity detection task [9, 10].
2. The version provided to us by the authors had 4820 tweets.

References
1. Abdulla, N.A., Ahmed, N.A., Shehab, M.A., Al-Ayyoub, M.: Arabic sentiment analysis: lexicon-based and corpus-based. In: 2013 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT), pp. 1–6. IEEE, Amman (2013)
2. Mierswa, I., Wurst, M., Klinkenberg, R., Scholz, M., Euler, T.: YALE: rapid prototyping for complex data mining tasks. In: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 935–940 (2006)
3. Shoukry, A., Rafea, A.: Preprocessing Egyptian dialect tweets for sentiment mining. In: Proceedings of 4th Workshop on Computational Approaches to Arabic Script-Based Languages, San Diego, California, USA, pp. 47–56 (2012)
4. Witten, I.H., Frank, E., Trigg, L., Hall, M., Holmes, G., Cunningham, S.J.: Weka: practical machine learning tools and techniques with Java implementations. In: Seminar, vol. 99, pp. 192–196 (1999)
5. El-Beltagy, S.R., Rafea, A.: An accuracy enhanced light stemmer for Arabic text. ACM Trans. Speech Lang. Process. 7, 2–23 (2011)
6. Salamah, J.B., Elkhlifi, A.: Microblogging opinion mining approach for Kuwaiti dialect. In: International Conference on Computing Technology and Information Management (ICCTIM 2014), pp. 388–396 (2014)
7. Duwairi, R., Marji, R., Sha’ban, N., Rushaidat, S.: Sentiment analysis in Arabic tweets. In: 2014 5th International Conference Information and Communication Systems (ICICS), pp. 1–6. IEEE, Irbid (2014)
8. Salameh, M., Mohammad, S., Kiritchenko, S.: Sentiment after translation: a case-study on Arabic social media posts. In: Proceedings of 2015 Conference of the North American Chapter of Association for Computational Linguistics: Human Language Technologies, pp. 767–777. Association for Computational Linguistics, Denver (2015)
9. Mohammad, S.M., Kiritchenko, S., Zhu, X.: NRC-Canada: building the state-of-the-art in sentiment analysis of tweets. In: Proceedings of 7th International Workshop on Semantic Evaluation Exercises (SemEval-2013), Atlanta, Georgia, USA (2013)
10. Kiritchenko, S., Zhu, X., Mohammad, S.: Sentiment analysis of short informal texts. J. Artif. Intell. Res. 50, 723–762 (2014)
11. Mourad, A., Darwish, K.: Subjectivity and sentiment analysis of modern standard Arabic and Arabic microblogs. In: Proceedings of 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 55–64 (2013)
12. Refaee, E., Rieser, V.: Subjectivity and sentiment analysis of Arabic Twitter feeds with limited resources. In: Proceedings of Workshop on Free/Open-Source Arabic corpora and corpora processing tools, Reykjavik, Iceland, pp. 16–21 (2014)
13. Shoukry, A., Rafea, A.: A hybrid approach for sentiment classification of Egyptian dialect tweets. In: 1st International Conference on Arabic Computational Linguistics (ACLing), Cairo, Egypt, pp. 78–85 (2015)
14. Khalil, T., Halaby, A., Hammad, M.H., El-Beltagy, S.R.: Which configuration works best? An experimental study on supervised Arabic Twitter sentiment analysis. In: Proceedings of 1st Conference on Arabic Computational Linguistics (ACLing 2015), Co-located with CICLing 2015, Cairo, Egypt, pp. 86–93 (2015)
15. Rennie, J.D.M., Shih, L., Teevan, J., Karger, D.R.: Tackling the poor assumptions of Naive Bayes text classifiers. In: Proceedings of 20th International Conference on Machine Learning (ICML 2003), USA, vol. 20, pp. 616–623 (2003)
18. El-Beltagy, S.R.: NileULex: a phrase and word level sentiment lexicon for Egyptian and modern standard Arabic. In: Proceedings of LREC 2016, Portorož, Slovenia (2016, to appear)
19. El-Beltagy, S.R., Ali, A.: Open issues in the sentiment analysis of Arabic social media: a case study. In: Proceedings of 9th International Conference on Innovations and Information Technology (IIT 2013), Al Ain, UAE (2013)
20. Zayed, O., El-Beltagy, S.R.: Named entity recognition of persons’ names in Arabic tweets. In: Proceedings of Recent Advances in Natural Language Processing (RANLP 2015), Hissar, Bulgaria (2015)
21. El-Beltagy, S.R., Rafea, A.: LemaLight: a dictionary based Arabic lemmatizer and stemmer (2016)
23. Refaee, E., Rieser, V.: An Arabic Twitter Corpus for subjectivity and sentiment analysis. In: Proceedings of 9th Edition of Language Resources and Evaluation Conference (LREC 2014), Iceland (2014)
24. McCallum, A., Nigam, K.: A comparison of event models for Naive Bayes text classification. In: AAAI/ICML-1998 Workshop on Learning for Text Categorization, pp. 41–48 (1998)
25. Chang, C., Lin, C.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 1–39 (2011)
26. El-Beltagy, S.R.: NileTMRG: deriving prior polarities for Arabic sentiment terms. In: Proceedings of SemEval 2016, San Diego, California (2014, submitted)

Metadata
Title
Combining Lexical Features and a Supervised Learning Approach for Arabic Sentiment Analysis
Authors
Samhaa R. El-Beltagy
Talaat Khalil
Amal Halaby
Muhammad Hammad
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-75487-1_24
