Published in: Cognitive Computation 1/2019

24.09.2018

A Study of Arabic Social Media Users—Posting Behavior and Author’s Gender Prediction

Authors: Abdulrahman I. Al-Ghadir, Aqil M. Azmi

Abstract

Social media opens up numerous possibilities to study human interaction and collective behavior on an unprecedented scale. It has opened a whole new avenue for research under the name “social computing”. Researchers are interested in profiling individuals (e.g., gender, age group), groups, communities, and networks. We are interested in studying the collective behavior of Arabic social media users. Most studies covering Arabic social media have focused on sentiment analysis of, say, tweets. This study, however, looks into who interacts with Arabic social media and when. Specifically, this work has two objectives. The first is to study the demographic posting behavior of social media users from two perspectives: gender and educational level. The second is author profiling: identifying the gender of the author of a social media post. We use Saudi Arabia, a very prolific country when it comes to social media in general, as a backdrop for this study. The results are based on mining a huge amount of metadata from a popular local social media forum covering the period 2011–14 inclusive. The features extracted from the posts (a normalized list of the k highest scoring words, and likewise for stems) were used to train classifiers to identify the author’s gender. We used two different classifiers, a Support Vector Machine (SVM) with a linear kernel and 1-NN (1-nearest neighbor), and experimented with different sizes for the list of features. When the number of features (the size of the feature vector) is small (≤ 50), both classifiers perform equally well in identifying the author’s gender, but we risk overfitting the data. The classifiers achieved their best results when using 100 features. At that size, the 1-NN classifier delivered the better performance, achieving a balanced accuracy of 93.16% vs 87.33% for the SVM in predicting the author’s gender. For a larger set of features, SVM delivered better performance and more stable behavior than 1-NN, but still nowhere close to its own best performance. We used a t-test to confirm our assessment that the difference between the performance of the two classifiers is statistically significant. Based on that, we recommend using 100 features, which gives our best result: a balanced accuracy of 93.16% using 1-NN.
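To make the evaluation metric concrete, the following is a minimal sketch of a 1-nearest-neighbor classifier scored by balanced accuracy (the mean of per-class recall). It is an illustration only, not the authors' implementation: the two-dimensional feature vectors, the labels "F" and "M", the Euclidean distance, and the function names `predict_1nn` and `balanced_accuracy` are all assumptions, since the abstract does not specify the distance measure or give any data.

```python
from math import dist  # Euclidean distance, Python 3.8+


def balanced_accuracy(y_true, y_pred, labels=("F", "M")):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for label in labels:
        idx = [i for i, y in enumerate(y_true) if y == label]
        if not idx:
            continue  # skip classes that do not occur in y_true
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def predict_1nn(train_X, train_y, x):
    """Return the label of the training vector closest to x."""
    nearest = min(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return train_y[nearest]


# Toy, made-up feature vectors (NOT data from the study).
train_X = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.7)]
train_y = ["F", "F", "M", "M"]
test_X = [(0.85, 0.15), (0.15, 0.8), (0.6, 0.5), (0.3, 0.6)]
test_y = ["F", "M", "M", "M"]

pred = [predict_1nn(train_X, train_y, x) for x in test_X]
score = balanced_accuracy(test_y, pred)
plain = sum(p == t for p, t in zip(pred, test_y)) / len(test_y)
print(f"balanced accuracy = {score:.4f}, plain accuracy = {plain:.4f}")
```

On this toy set the single "F" test item is classified correctly and two of the three "M" items are, so balanced accuracy averages the recalls 1 and 2/3, while plain accuracy is 3/4. The gap between the two shows why balanced accuracy is the more informative figure when the classes are unevenly represented.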

Footnotes
1
For those having difficulty reading the Arabic script, we follow the Buckwalter transliteration scheme, http://www.qamus.org/transliteration.htm [last accessed Sep 28, 2017].
 
2
Muslims pray five times a day: Fajr (dawn), Dhuhr (noon), Asr (afternoon), Maghrib (sunset), and Isha (night). The prayer times are not fixed; they depend on the Sun's movement and therefore vary between seasons.
 
3
The ninth month in the Islamic lunar calendar. The daily fast typically starts at dawn and ends at sunset.
 
4
We could not find the exact date when Twitter was launched in the kingdom. However, by using the service https://discover.twitter.com/first-tweet and going through the first tweets of some popular local celebrities, we were able to confirm that Twitter was introduced around Q3/2010.
 
5
In the US educational system, these are roughly equivalent to elementary, middle, and high school, respectively.
 
Metadata
Title
A Study of Arabic Social Media Users—Posting Behavior and Author’s Gender Prediction
Authors
Abdulrahman I. Al-Ghadir
Aqil M. Azmi
Publication date
24.09.2018
Publisher
Springer US
Published in
Cognitive Computation / Issue 1/2019
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-018-9592-7
