
2016 | Original Paper | Book Chapter

I, Me, Mine: The Role of Personal Phrases in Author Profiling

Authors: Rosa María Ortega-Mendoza, Anilú Franco-Arcega, Adrián Pastor López-Monroy, Manuel Montes-y-Gómez

Published in: Experimental IR Meets Multilinguality, Multimodality, and Interaction

Publisher: Springer International Publishing


Abstract

The Author Profiling (AP) task aims to distinguish between groups of authors who share a demographic characteristic, such as gender or age, by studying their language usage. In this work we study the role of personal phrases (i.e., sentences containing first-person pronouns) in the AP task. We support the idea that people better expose their personal interests and writing style when they talk about themselves and, consequently, that words near a personal pronoun reveal valuable information for the classification of authors. An evaluation on different social media data showed that phrases containing singular first-person pronouns are highly valuable for predicting the age and gender of users. Using only these phrases, we reduced the information in the user documents by up to 60% while obtaining classification performance comparable to that obtained using all available data. In addition, the results obtained with personal phrases considerably outperformed those obtained with non-personal sentences, indicating their greater suitability for the AP task. We believe these findings could be further applied to the design of strategies for constructing AP corpora, to novel feature selection methods, and to new feature and instance weighting schemes.
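The filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the pronoun list, the naive sentence splitter, and the function names (`split_sentences`, `is_personal`, `personal_phrases`, `reduction`) are all assumptions made here for illustration, and the abstract does not specify how sentences were segmented or which pronoun forms were used.

```python
import re

# Assumed English pronoun list; the abstract does not give the exact set.
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def is_personal(sentence):
    """Return True if the sentence contains a first-person singular pronoun."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return any(tok in FIRST_PERSON_SINGULAR for tok in tokens)


def personal_phrases(document):
    """Keep only the sentences that contain a first-person singular pronoun."""
    return [s for s in split_sentences(document) if is_personal(s)]


def reduction(document):
    """Fraction of whitespace-separated words discarded by the filter."""
    total = len(document.split())
    kept = sum(len(s.split()) for s in personal_phrases(document))
    return 1 - kept / total if total else 0.0


if __name__ == "__main__":
    doc = "I love my dog. The weather is nice today. It made me smile!"
    print(personal_phrases(doc))
    print(round(reduction(doc), 3))
```

In a setup like the one the abstract describes, the retained sentences would replace the full user document as classifier input, and the value returned by `reduction` would correspond to the reported decrease in document size.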


Footnotes
1
In this context, documents are commonly referred to as user profiles or user histories, and they correspond to all textual information generated by a user, for example, all posts from her blog or the set of tweets from her account.

4
POS tags were obtained using the Stanford tagger: http://nlp.stanford.edu/software/tagger.shtml.

Metadata
Title
I, Me, Mine: The Role of Personal Phrases in Author Profiling
Authors
Rosa María Ortega-Mendoza
Anilú Franco-Arcega
Adrián Pastor López-Monroy
Manuel Montes-y-Gómez
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-44564-9_9
