
2015 | OriginalPaper | Book chapter

Predicting Emotion Labels for Chinese Microblog Texts

Authors: Zheng Yuan, Matthew Purver

Published in: Advances in Social Media Analysis

Publisher: Springer International Publishing


Abstract

We describe an experiment in detecting emotions in texts on the Chinese microblog service Sina Weibo (www.weibo.com), using distant supervision from author-supplied emotion labels (emoticons and smilies). Existing word segmentation tools proved unreliable, and character-based features gave better accuracy. Higher-order n-grams were also useful features. Accuracy varied with both label type and emotion: smilies are used more often, but emoticons are more reliable labels. Happiness is the most accurately predicted emotion, with accuracies of around 90% against both distant and gold-standard labels. The approach achieves high accuracy for happiness and anger but is less effective for sadness, surprise, disgust and fear, emotions that human annotators also find difficult to detect.
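The abstract combines two ingredients: distant labels taken from author-supplied emoticons and smilies, and character-level n-gram features used in place of word segmentation. The Python sketch below illustrates both steps under stated assumptions. The label strings in `LABEL_TO_EMOTION`, the function names, and the rule of discarding posts whose labels point to more than one emotion are hypothetical choices for illustration, not the chapter's actual label set or pipeline. Training a classifier on the resulting (emotion, feature-count) pairs is omitted.

```python
from collections import Counter

# Hypothetical mapping from author-supplied labels (smilies and emoticons)
# to emotion classes. These strings are illustrative assumptions only.
LABEL_TO_EMOTION = {
    "[哈哈]": "happiness",
    "^_^": "happiness",
    "[怒]": "anger",
    "[泪]": "sadness",
}


def distant_label(post: str):
    """Return (emotion, cleaned_text) if the post's labels indicate exactly
    one emotion class, else None. Labels are stripped from the text so a
    classifier cannot simply learn the label itself."""
    found = {emo for lab, emo in LABEL_TO_EMOTION.items() if lab in post}
    if len(found) != 1:
        return None  # no label, or conflicting labels: discard the post
    text = post
    for lab in LABEL_TO_EMOTION:
        text = text.replace(lab, "")
    return found.pop(), text.strip()


def char_ngrams(text: str, max_n: int = 3) -> Counter:
    """Count character n-grams for n = 1..max_n; no word segmentation needed."""
    chars = [c for c in text if not c.isspace()]
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(chars) - n + 1):
            feats["".join(chars[i:i + n])] += 1
    return feats


if __name__ == "__main__":
    labelled = distant_label("今天天气真好[哈哈]")
    print(labelled)  # ('happiness', '今天天气真好')
    emotion, text = labelled
    print(emotion, char_ngrams(text, max_n=2))
```

Character n-grams avoid depending on a segmenter that may fail on informal microblog text, at the cost of a larger and sparser feature space; raising `max_n` adds the higher-order n-grams that the abstract reports as useful.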

Metadata
Copyright year: 2015
DOI: https://doi.org/10.1007/978-3-319-18458-6_7