
2015 | Original Paper | Book chapter

Leveraging Interactive Knowledge and Unlabeled Data in Gender Classification with Co-training

Authors: Jingjing Wang, Yunxia Xue, Shoushan Li, Guodong Zhou

Published in: Database Systems for Advanced Applications

Publisher: Springer International Publishing


Abstract

Conventional approaches to gender classification rely heavily on large amounts of labeled data, which are usually difficult and expensive to obtain. In this paper, we propose a co-training approach to address this problem. Specifically, we use both non-interactive and interactive texts, i.e., message and comment texts, as two different views in our co-training approach so that unlabeled data can be incorporated effectively. Experimental results on a large micro-blog data set demonstrate that leveraging interactive knowledge is appropriate for gender classification and that the proposed co-training approach is effective.
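The two-view idea in the abstract can be illustrated with a minimal co-training sketch. This is not the authors' implementation: the Naive Bayes classifier, the confidence-based selection rule, the `rounds` and `per_round` parameters, the function names, and the toy data with placeholder tokens are all assumptions made for illustration only. Each round, a classifier trained on one view (message or comment text) labels the unlabeled example it is most confident about and adds it to the shared labeled set, so that the other view can learn from it.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial Naive Bayes over whitespace tokens (illustrative only)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(text.split())
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        return self

    def predict_proba(self, text):
        n = sum(self.prior.values())
        v = len(self.vocab) + 1  # add-one smoothing, one slot for unseen words
        log_scores = {}
        for c in self.classes:
            total = sum(self.counts[c].values())
            score = math.log(self.prior[c] / n)
            for w in text.split():
                score += math.log((self.counts[c][w] + 1) / (total + v))
            log_scores[c] = score
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}


def train_view(labeled, view):
    """Train a classifier on one view: 0 = message text, 1 = comment text."""
    return NaiveBayes().fit([x[view] for x in labeled], [x[2] for x in labeled])


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: (message, comment, label) tuples; unlabeled: (message, comment)."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            clf = train_view(labeled, view)
            scored = []
            for item in pool:
                proba = clf.predict_proba(item[view])
                label = max(proba, key=proba.get)
                scored.append((proba[label], label, item))
            # Move the most confidently labeled examples into the labeled set.
            scored.sort(key=lambda t: t[0], reverse=True)
            for _conf, label, item in scored[:per_round]:
                labeled.append((item[0], item[1], label))
                pool.remove(item)
    return train_view(labeled, 0), train_view(labeled, 1), labeled


def predict(msg_clf, cmt_clf, message, comment):
    """Combine both views by multiplying their class probabilities."""
    pm = msg_clf.predict_proba(message)
    pc = cmt_clf.predict_proba(comment)
    return max(pm, key=lambda c: pm[c] * pc[c])


# Fabricated toy data with placeholder tokens; not taken from the paper.
seed = [("alpha beta", "red green", "F"), ("gamma delta", "blue black", "M")]
pool = [("alpha alpha", "red red"), ("delta gamma", "black blue"),
        ("beta alpha", "green red"), ("gamma gamma", "blue blue")]
msg_clf, cmt_clf, grown = co_train(seed, pool, rounds=2, per_round=1)
print(len(grown), predict(msg_clf, cmt_clf, "alpha", "red"))
```

In this sketch the two seed examples grow into a labeled set that covers the whole pool after two rounds. A real setup would need held-out evaluation and a confidence threshold to limit the label noise that self-labeled examples can introduce.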


Metadata
Title
Leveraging Interactive Knowledge and Unlabeled Data in Gender Classification with Co-training
Authors
Jingjing Wang
Yunxia Xue
Shoushan Li
Guodong Zhou
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-22324-7_23