Published in: Information Systems Frontiers 6/2015

1 December 2015

Sentiment analysis for Chinese reviews of movies in multi-genre based on morpheme-based features and collocations

Written by: Heng-Li Yang, August F. Y. Chao


Abstract

Sentiment analysis, also known as opinion mining, is more difficult to apply in Chinese than in Indo-European languages because of the compounding nature of Chinese words and phrases and the relative lack of reliable Chinese-language resources. This study used Chinese morphemes, which are monosyllabic characters that can function as individual words or be combined to create Chinese words and phrases, as seed words to classify movie reviews found on Yahoo! Taiwan. We used collocations with higher Pointwise Mutual Information (PMI), which consist of selected morpheme-level compounded features, to build classifiers. The contributions of this study include the following: (1) proposing a method of generating domain-dependent Chinese morphemes directly from a large data set without any predefined sentiment resources; (2) building morpheme-based classifiers that are applicable across various movie genres and are shown to produce better results than other classifiers based on keywords (NTUSD and HowNet) or feature selection (TFIDF); (3) identifying compounds that have different semantic polarities depending on context.
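
As a rough illustration of the PMI measure mentioned in the abstract, the sketch below scores pairs of tokens by how much more often they co-occur than independence would predict, using PMI(a, b) = log2(P(a, b) / (P(a) P(b))). It is not the authors' pipeline: the function name `pmi_scores`, the document-level co-occurrence window, and the toy single-character tokens are all assumptions made for illustration.

```python
import math
from collections import Counter


def pmi_scores(documents):
    """Return PMI for every pair of distinct tokens that co-occur in a document.

    `documents` is a list of token lists. Probabilities are estimated as the
    fraction of documents containing a token (or both tokens of a pair).
    Pair keys are tuples in sorted order. This windowing choice is an
    assumption of the sketch, not something taken from the paper.
    """
    n_docs = len(documents)
    token_counts = Counter()
    pair_counts = Counter()
    for tokens in documents:
        unique = sorted(set(tokens))  # count each token once per document
        token_counts.update(unique)
        for i, a in enumerate(unique):
            for b in unique[i + 1:]:
                pair_counts[(a, b)] += 1

    scores = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n_docs
        p_a = token_counts[a] / n_docs
        p_b = token_counts[b] / n_docs
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


if __name__ == "__main__":
    # Toy reviews split into single characters (hypothetical data).
    reviews = [["好", "看"], ["好", "看"], ["難", "看"], ["好", "笑"]]
    for pair, score in sorted(pmi_scores(reviews).items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

Pairs with a higher score co-occur more often than their individual frequencies would suggest, which is the property that makes PMI useful for selecting collocations as features.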

Footnotes
1
A part-of-speech (POS) tagger is software that reads text and labels each word (or other token) with its part of speech, such as noun, verb, or adjective. The part-of-speech tools from SINICA CKIP are available at http://ckipsvr.iis.sinica.edu.tw/
 
2
Collected Taiwan Yahoo! Movies corpus with POS tags from CKIP: https://github.com/fychao/ChineseMovieReviews
 
3
Simplified/traditional Chinese conversion tables, which include parallel translations of common words and phrases used in Taiwan, China, Hong Kong, and Singapore, can be retrieved from the following link: http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/includes/ZhConversion.php
 
5
Natural Language Toolkit 2.0: https://github.com/nltk
 
6
Ten-fold cross-validation is a process that splits the training dataset into 10 equal-sized subsets and then uses each subset in turn for testing while the other nine are used for training. The validation process therefore involves 10 iterations of training and testing.
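
The ten-fold procedure described in this footnote can be sketched as follows. This is a minimal illustration in plain Python, not the code used in the study; the function name `k_fold_splits` is hypothetical, and the sketch assumes the samples have already been shuffled.

```python
def k_fold_splits(n_samples, k=10):
    """Yield (train_indices, test_indices) once for each of k folds.

    Folds are contiguous index ranges. When n_samples is not divisible by k,
    the first n_samples % k folds receive one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    # Each of the 10 iterations trains on nine folds and tests on the tenth.
    for fold, (train, test) in enumerate(k_fold_splits(100, k=10), start=1):
        print(f"fold {fold}: {len(train)} training samples, {len(test)} test samples")
```

Every sample appears in exactly one test fold, so averaging the 10 test scores uses each labelled review for evaluation exactly once.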
 
References
Bird, S. (2006). NLTK: The natural language toolkit. In Proceedings of the COLING/ACL on Interactive presentation sessions (pp. 69–72). Sydney, Australia.
Bradley, M. M., & Lang, P. J. (1999). Affective norms for English words (ANEW): Instruction manual and affective ratings, Technical Report C-1, The Center for Research in Psychophysiology, University of Florida.
Church, K. W., & Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational linguistics, 16(1), 22–29.
Das, S., & Chen, M. (2001). Yahoo! for Amazon: extracting market sentiment from stock message boards. Management Science, 53(9), 1375–1388.
Dong, Z., & Dong, Q. (2006). HowNet and the Computation of Meaning. World Scientific.
Esuli, A., & Sebastiani, F. (2006). Sentiwordnet: A publicly available lexical resource for opinion mining. In Proceedings of LREC (Vol. 6, pp. 417–422). Genoa, Italy.
Feng, S., Wang, L., Xu, W., Wang, D., & Yu, G. (2012). Unsupervised learning Chinese sentiment lexicon from massive microblog data. Advanced Data Mining and Applications, 7713, 27–38.
Ku, L. W., Liang, Y. T., & Chen, H. H. (2006). Opinion extraction, summarization and tracking in news and blog Corpora. Proceedings of AAAI-2006 Spring Symposium on Computational Approaches to Analyzing Weblogs, AAAI Technical Report, 100–107. CA, USA.
Ku, L. W., Liu, I. C., Lee, C. Y., Chen, K. H., & Chen, H. H. (2008). Sentence-level opinion analysis by COPEOPI in NTCIR-7. In Proceeding of NTCIR-7 Workshop (pp. 260–267). Tokyo, Japan.
Ku, L. W., Huang, T. H., & Chen, H. H. (2009). Using morphological and syntactic structures for Chinese opinion analysis. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (Vol. 3, no. 3, pp. 1260–1269). Singapore.
Li, N., & Wu, D. D. (2010). Using text mining and sentiment analysis for online forums hotspot detection and forecast. Decision Support Systems, 48(2), 354–368.
Li, L., & Yao, T. (2007, August). Kernel-based sentiment classification for Chinese sentence. In Advanced Language Processing and Web Information Technology, ALPIT 2007. Sixth International Conference (pp. 27–32). Henan, China.
Li, D., Ma, Y. T., & Guo, J. L. (2009). Words semantic orientation classification based on HowNet. The Journal of China Universities of Posts and Telecommunications, 16(1), 106–110.
Liu, B. (2010). Sentiment analysis and subjectivity. Handbook of natural language processing, 2nd edition.
Miller, G. A. (1995). WordNet: a lexical database for English. Communications of the ACM, 38(11), 39–41.
Nasukawa, T., & Yi, J. (2003). Sentiment analysis: Capturing favorability using natural language processing. In Proceedings of the 2nd international conference on Knowledge capture (pp. 70–77). NY, USA.
Pang, B., & Lee, L. (2005). Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. Annual Meeting-Association for computational linguistics, 43(1). Jeju, Korea.
Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and trends in information retrieval, 2(1–2), 1–135.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Duchesnay, É., et al. (2011). Scikit-learn: machine learning in Python. The Journal of Machine Learning Research, 12, 2825–2830.
Sun, Y. T., Chen, C. L., Liu, C. C., Liu, C. L., & Soo, V. W. (2010). Sentiment classification of short Chinese sentences. Proceedings of the 22nd Conference on Computational Linguistics and Speech Processing (ROCLING 2010) (pp. 184–198). San Jose de Buan, Philippines.
Tan, S., & Zhang, J. (2008). An empirical study of sentiment analysis for Chinese documents. Expert Systems with Applications, 34(4), 2622–2629.
Turney, P. D. (2001, September). Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the 12th European Conference on Machine Learning (pp. 491–502).
Turney, P. D. (2002). Thumbs up or thumbs down?: Semantic orientation applied to unsupervised classification of reviews. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics (pp. 417–424). Freiburg, Germany.
Van Rijsbergen, C. J. (1979). Information Retrieval (2nd ed.). London: Butterworth.
Vapnik, V. (1995). The Nature of Statistical Learning Theory. New York: Springer.
Wan, X. J. (2009). Co-training for cross-lingual sentiment classification. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (Vol. 1, pp. 235–243). Singapore.
Wang, X., Zhao, Y. Q., & Fu, G. H. (2011). A Morpheme-based Method to Chinese Sentence-Level Sentiment Classification. International Journal of Asian Language Processing, 21(3), 95–106. Penang, Malaysia.
Wu, Z., & Tseng, G. (1993). Chinese text segmentation for text retrieval: achievements and problems. Journal of the American Society for Information Science, 44(9), 532–542.
Wu, Z., & Tseng, G. (1999). ACTS: an automatic Chinese text segmentation system for full text retrieval. Journal of the American Society for Information Science, 46(2), 83–96.
Wu, Y., & Wen, M. (2010, August). Disambiguating dynamic sentiment ambiguous adjectives. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010) (pp. 1191–1199). Beijing, China.
Xu, H., Zhao, K., Qiu, L., & Hu, C. (2011). Expanding Chinese sentiment dictionaries from large scale unlabeled corpus. Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation, 3, 53–57. Sendai, Japan.
Ye, Q., Shi, W., & Li, Y. (2006). Sentiment classification for movie reviews in Chinese by improved semantic oriented approach. Proceedings of the 39th Hawaii International Conference on System Sciences, HICSS’06, 3. Hawaii, USA.
Yuen, R. W., Chan, T. Y., Lai, T. B., Kwong, O. Y., & T’sou, B. K. (2004). Morpheme-based derivation of bipolar semantic orientation of Chinese words. In Proceedings of the 20th international conference on Computational Linguistics (pp. 1008–1014). PA, USA.
Zhang, W. H., Hua, X., & Wei, W. (2012). Weakness Finder: find product weakness from Chinese reviews by using aspects based sentiment analysis. Expert Systems with Applications, 39(11), 10283–10291.
Zhou, X., Marslen-Wilson, W., Taft, M., & Shu, H. (1999). Morphology, orthography, and phonology reading Chinese compound words. Language and cognitive processes, 14(5–6), 525–565.
Metadata
Title
Sentiment analysis for Chinese reviews of movies in multi-genre based on morpheme-based features and collocations
Written by
Heng-Li Yang
August F. Y. Chao
Publication date
1 December 2015
Publisher
Springer US
Published in
Information Systems Frontiers / Issue 6/2015
Print ISSN: 1387-3326
Electronic ISSN: 1572-9419
DOI
https://doi.org/10.1007/s10796-014-9498-1
