Published in: Discover Computing 5/2009

1 October 2009

A machine learning approach to sentiment analysis in multilingual Web texts

Written by: Erik Boiy, Marie-Francine Moens

Abstract

Sentiment analysis, also called opinion mining, is a form of information extraction from text that is of growing research and commercial interest. In this paper we present our machine learning experiments on sentiment analysis of blog, review and forum texts found on the World Wide Web and written in English, Dutch and French. We train on a set of example sentences or statements that are manually annotated as positive, negative or neutral with regard to a certain entity. We are interested in the feelings that people express about certain consumer products. We learn and evaluate several classification models that can be configured in a cascaded pipeline. We have to deal with several problems: the noisy character of the input texts, the attribution of the sentiment to a particular entity, and the small size of the training set. We succeed in identifying positive, negative and neutral feelings towards the entity under consideration with ca. 83% accuracy for English texts, based on unigram features augmented with linguistic features. The accuracy for the Dutch and French texts is ca. 70% and 68% respectively, because these texts show a larger variety of linguistic expressions that more often diverge from standard language and thus demand more training patterns. In addition, our experiments give us insights into the portability of the learned models across domains and languages. A substantial part of the article investigates the role of active learning techniques in reducing the number of examples that must be manually annotated.
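
As a rough illustration of the kind of unigram-based three-class classifier and example selection that the abstract refers to, the following minimal sketch trains a multinomial naive Bayes model on a handful of invented sentences and then picks the unlabeled sentence it is least sure about. The toy data, the whitespace tokenizer, the add-one smoothing and the names `UnigramNB` and `most_uncertain` are assumptions made for illustration. Uncertainty sampling is only one possible active learning strategy and is not claimed to be the exact method of the article.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for real preprocessing."""
    return text.lower().split()


class UnigramNB:
    """Multinomial naive Bayes over unigram counts with add-one smoothing."""

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict_proba(self, sentence):
        """Return a dict mapping each class label to its posterior probability."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n_label in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_label / total)  # class prior
            for word in tokenize(sentence):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[word] + 1) / denom)
            log_scores[label] = score
        # Normalise in a numerically stable way (log-sum-exp).
        peak = max(log_scores.values())
        exp_scores = {l: math.exp(s - peak) for l, s in log_scores.items()}
        z = sum(exp_scores.values())
        return {l: v / z for l, v in exp_scores.items()}


def most_uncertain(model, pool, k=1):
    """Return the k pool sentences whose highest class posterior is lowest."""
    return sorted(pool, key=lambda s: max(model.predict_proba(s).values()))[:k]


# Invented toy training set (hypothetical, not from the article's corpora).
train = [
    ("i love this car", "positive"),
    ("great engine and great design", "positive"),
    ("i hate this car", "negative"),
    ("terrible engine and awful design", "negative"),
    ("the car has four doors", "neutral"),
    ("the engine is in the front", "neutral"),
]
model = UnigramNB().fit([s for s, _ in train], [l for _, l in train])

probs = model.predict_proba("great design")
pool = ["great design", "this engine"]
to_annotate = most_uncertain(model, pool)  # candidates for manual labeling
```

In an active learning loop, the sentences returned by `most_uncertain` would be annotated by hand, added to the training set, and the model retrained, so that annotation effort goes to the examples the current model finds hardest.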

Footnotes
3
Despite the strong resemblance, our approach should not be confused with pure transfer learning where you learn a new classification model taking advantage of a model learned for a different, but related task. Transfer learning typically requires further labeled data for the new task (Raina et al. 2007).
 
4
We also considered other features such as n-grams of characters; bigrams and trigrams of words (cf. Chambers et al. 2006); lemmas; punctuation patterns; word pairs that are not necessarily subsequent; words belonging to a certain part-of-speech (POS) (cf. Wiebe 2000); and the number of verbs in a sentence where a large number of verbs could signal epistemic modality and thus the possibility of a sentiment (Rubin et al. 2006). Tests showed that these features had little value when classifying the sentiment of our data.
 
8
No standard annotated corpora exist for entity-based sentiment analysis (including the neutral class) on sentences. Our corpora are currently proprietary: we have asked the company Attentio to publicly release the data.
 
10
All confidence levels are obtained with a two-paired t-test.
 
12
We used an error tolerance of 0.05 for all experiments.
 
13
Note that the ME classifier does not allow parse feature weights within the binary feature functions.
 
14
On a validation set we tested different uncertainty values as thresholds for percolation to the third layer, yielding a threshold of 75% for the MNB classifier used for English, 33.4% for the ME classifier used for French, and a hyperplane distance of 0.11 for the SVM classifier used for Dutch.
 
15
Note that no bagging (see Question 6.3.3) was used in these experiments.
 
19
Confidence levels with respect to accuracy: English ≥ 80%, Dutch ≥ 90% and French < 75%.
 
20
In the first selection step, taking into account the distribution of our dataset, the required examples are obtained (confidence level ≥99.9%) when we have seen 88 examples.
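
The thresholded percolation described in footnote 14 can be pictured with a small sketch: an earlier layer decides on its own when it is confident enough, and otherwise hands the sentence to a later layer. The two toy layers, the word lists and the function names below are hypothetical stand-ins, not the classifiers of the article. Reading the 75% value as a minimum confidence for the earlier layer is likewise an assumption.

```python
from typing import Callable, Tuple

# A layer maps a sentence to (label, confidence in [0, 1]).
Layer = Callable[[str], Tuple[str, float]]

POSITIVE = {"good", "great", "love"}   # hypothetical toy lexicon
NEGATIVE = {"bad", "awful", "hate"}    # hypothetical toy lexicon


def unigram_layer(sentence: str) -> Tuple[str, float]:
    """Toy earlier layer: vote over sentiment words; confidence is the winning share."""
    words = sentence.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return "neutral", 1.0
    label = "positive" if pos >= neg else "negative"
    return label, max(pos, neg) / (pos + neg)


def contrast_layer(sentence: str) -> Tuple[str, float]:
    """Toy later layer: only trust the clause after the last 'but'."""
    tail = sentence.lower().split(" but ")[-1]
    return unigram_layer(tail)


def cascade(sentence: str, earlier: Layer, later: Layer,
            threshold: float = 0.75) -> Tuple[str, str]:
    """Return (label, name of the layer that made the decision)."""
    label, confidence = earlier(sentence)
    if confidence >= threshold:
        return label, "earlier"
    # Not confident enough: percolate the sentence to the next layer.
    label, _ = later(sentence)
    return label, "later"


clear_case = cascade("i love this great car", unigram_layer, contrast_layer)
mixed_case = cascade("good engine but awful seats", unigram_layer, contrast_layer)
```

The threshold trades cost against accuracy: a higher value sends more sentences to the later, typically more expensive layer, which is why such a value would be tuned on a validation set as the footnote describes.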
 
References
Aman, S., & Szpakowicz, S. (2008). Using Roget’s thesaurus for fine-grained emotion recognition. In Proceedings of the International Joint Conference on NLP (IJCNLP) (pp. 296–302).
Aue, A., & Gamon, M. (2005). Customizing sentiment classifiers to new domains: A case study. Technical report, Microsoft Research.
Bai, X., Padman, R., & Airoldi, E. (2005). On learning parsimonious models for extracting consumer opinions. In Proceedings of HICSS-05, 38th Annual Hawaii International Conference on System Sciences (pp. 75–82). Washington, DC: IEEE Computer Society.
Baram, Y., El-Yaniv, R., & Luz, K. (2004). Online choice of active learning algorithms. Journal of Machine Learning Research, 5, 255–291.
Berger, A. L., Pietra, S. D., & Pietra, V. J. D. (1996). A maximum entropy approach to natural language processing. Computational Linguistics, 22, 39–71.
Bondu, A., Lemaire, V., & Poulain, B. (2007). Active learning strategies: A case study for detection of emotions in speech. In Industrial Conference on Data Mining, volume 4597 of Lecture Notes in Computer Science (pp. 228–241). Springer.
Brinker, K. (2003). Incorporating diversity in active learning with support vector machines. In Proceedings of ICML-03, 20th International Conference on Machine Learning (pp. 59–66). Washington, DC: AAAI Press.
Budanitsky, A., & Hirst, G. (2004). Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 1, 1–49.
Chambers, N., Tetreault, J., & Allen, J. (2006). Certainty identification in texts: Categorization model and manual tagging results. In J. Shanahan, Y. Qu, & J. Wiebe (Eds.), Computing attitude and affect in text: Theory and applications (pp. 143–158). Springer.
Chesley, P., Vincent, B., Xu, L., & Srihari, R. (2006). Using verbs and adjectives to automatically classify blog sentiment. In Proceedings of AAAI-CAAW-06, the Spring Symposia on Computational Approaches to Analyzing Weblogs. Stanford, CA.
Conrad, J. G., & Schilder, F. (2007). Opinion mining in legal blogs. In Proceedings of ICAIL’07, 11th International Conference on Artificial Intelligence and Law (pp. 231–236). New York: ACM.
Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines and other kernel-based learning methods. Cambridge, UK: Cambridge University Press.
Croft, W., & Lafferty, B. (2003). Language modeling for information retrieval. Boston, MA: Kluwer Academic Publishers.
Dagan, I., & Engelson, S. P. (1995). Committee-based sampling for training probabilistic classifiers. In Proceedings of ICML-95, 12th International Conference on Machine Learning (pp. 150–157).
Dave, K., Lawrence, S., & Pennock, D. M. (2003). Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In Proceedings of WWW-03, 12th International Conference on the World Wide Web (pp. 519–528). New York: ACM Press.
De Smet, W., & Moens, M. F. (2007). Generating a topic hierarchy from dialect texts. In DEXA Workshops (pp. 249–253). IEEE Computer Society.
Finn, A., & Kushmerick, N. (2003). Learning to classify documents according to genre. Journal of the American Society for Information Science, 57, 1506–1518. Special issue on Computational Analysis of Style.
Freund, Y., Seung, H. S., Shamir, E., & Tishby, N. (1997). Selective sampling using the query by committee algorithm. Machine Learning, 28, 133–168.
Galley, M., & McKeown, K. (2007). Lexicalized Markov grammars for sentence compression. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference (pp. 180–187). Rochester, New York: Association for Computational Linguistics.
Gamon, M. (2004). Sentiment classification on customer feedback data: Noisy data, large feature vectors, and the role of linguistic analysis. In Proceedings of COLING-04, the 20th International Conference on Computational Linguistics (pp. 841–847). Geneva, CH.
Hatzivassiloglou, V., & McKeown, K. R. (1997). Predicting the semantic orientation of adjectives. In Proceedings of ACL-97, 35th Annual Meeting of the Association for Computational Linguistics (pp. 174–181). Madrid, Spain: Association for Computational Linguistics.
Hatzivassiloglou, V., & Wiebe, J. M. (2000). Effects of adjective orientation and gradability on sentence subjectivity. In Proceedings of COLING-00, 18th International Conference on Computational Linguistics (pp. 299–305). San Francisco, CA: Morgan Kaufmann.
Hearst, M. A. (1992). Direction-based text interpretation as an information access refinement. In P. Jacobs (Ed.), Text-based intelligent systems: Current research and practice in information extraction and retrieval (pp. 257–274). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Hu, M., & Liu, B. (2004). Mining opinion features in customer reviews. In Proceedings of AAAI-04, 19th National Conference on Artificial Intelligence (pp. 755–760). San Jose, USA.
Huang, T., Dagli, C., Rajaram, S., Chang, E., Mandel, M., Poliner, G., & Ellis, D. (2008). Active learning for interactive multimedia retrieval. Proceedings of the IEEE, 96, 648–667.
Huber, R., Batliner, A., Buckow, J., Nöth, E., Warnke, V., & Niemann, H. (2000). Recognition of emotion in a realistic dialogue scenario. In Proceedings of the International Conference on Spoken Language Processing (Vol. 1, pp. 665–668). Beijing, China.
Iyengar, V. S., Apte, C., & Zhang, T. (2000). Active learning using adaptive resampling. In Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 92–98). New York: ACM.
Kamps, J., & Marx, M. (2002). Words with attitude. In Proceedings of the 1st International Conference on Global WordNet (pp. 332–341). Mysore, India.
Kessler, B., Nunberg, G., & Schütze, H. (1997). Automatic detection of text genre. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics (pp. 32–38). Somerset, NJ: Association for Computational Linguistics.
Knight, K., & Marcu, D. (2000). Statistics-based summarization - step one: Sentence compression. In Proceedings of AAAI/IAAI-00, 12th Conference on Innovative Applications of AI (pp. 703–710). San Francisco, CA: AAAI Press.
Kobayashi, N., Inui, K., & Matsumoto, Y. (2007). Extracting aspect-evaluation and aspect-of relations in opinion mining. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) (pp. 1065–1074).
Leshed, G., & Kaye, J. (2006). Understanding how bloggers feel: Recognizing affect in blog posts. In CHI’06: Extended Abstracts on Human Factors in Computing Systems (pp. 1019–1024). New York: ACM.
Lewis, D. D., & Catlett, J. (1994). Heterogeneous uncertainty sampling for supervised learning. In Proceedings of ICML-94, 11th International Conference on Machine Learning (pp. 148–156). New Brunswick, USA. San Francisco, CA: Morgan Kaufmann Publishers.
Lewis, D. D., & Gale, W. A. (1994). A sequential algorithm for training text classifiers. In Proceedings of SIGIR’94, 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 3–12). New York: Springer-Verlag.
Liere, R., & Tadepalli, P. (1997). Active learning with committees for text categorization. In Proceedings of AAAI-97, 14th Conference of the American Association for Artificial Intelligence (pp. 591–596). Menlo Park, CA: AAAI Press.
Liu, H., Lieberman, H., & Selker, T. (2003). A model of textual affect sensing using real-world knowledge. In Proceedings of IUI-03, 8th International Conference on Intelligent User Interfaces (pp. 125–132). New York: ACM.
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval, chapter 13. Cambridge University Press.
McCallum, A. K., & Nigam, K. (1998). Employing EM in pool-based active learning for text classification. In Proceedings of ICML-98, 15th International Conference on Machine Learning (pp. 350–358). San Francisco, CA: Morgan Kaufmann Publishers.
Mishne, G. (2005). Experiments with mood classification in blog posts. In Style2005, 1st Workshop on Stylistic Analysis of Text for Information Access at SIGIR 2005.
Mishne, G., & de Rijke, M. (2006). A study of blog search. In European Conference on Information Retrieval (pp. 289–301). Berlin, Germany: Springer.
Mulder, M., Nijholt, A., den Uyl, M., & Terpstra, P. (2004). A lexical grammatical implementation of affect. In Proceedings of TSD-04, 7th International Conference on Text, Speech and Dialogue, volume 3206 of Lecture Notes in Computer Science (pp. 171–178). Berlin, Germany: Springer.
Mullen, T., & Collier, N. (2004). Sentiment analysis using support vector machines with diverse information sources. In Proceedings of EMNLP-04, 9th Conference on Empirical Methods in Natural Language Processing (pp. 412–418). Barcelona, Spain.
Mullen, T., & Malouf, R. (2006). A preliminary investigation into sentiment analysis of informal political discourse. In AAAI 2006 Spring Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW 2006) (pp. 125–126).
Nguyen, H. T., & Smeulders, A. (2004). Active learning using pre-clustering. In Proceedings of ICML-04, 21st International Conference on Machine Learning (p. 79). New York: ACM Press.
Nijholt, A. (2003). Humor and embodied conversational agents. CTIT Technical Report series No. 03-03, University of Twente.
Osugi, T. (2005). Exploration-Based Active Machine Learning. Master’s thesis, University of Nebraska.
Pang, B., & Lee, L. (2004). A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of ACL-04, 42nd Meeting of the Association for Computational Linguistics (pp. 271–278). East Stroudsburg, PA: Association for Computational Linguistics.
Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of EMNLP-02, the Conference on Empirical Methods in Natural Language Processing (pp. 79–86). Philadelphia, PA: Association for Computational Linguistics.
Pedersen, T. (2001). A decision tree of bigrams is an accurate predictor of word sense. In Proceedings of the Second Annual Meeting of the North American Chapter of the Association for Computational Linguistics (pp. 79–86).
Polanyi, L., & Zaenen, A. (2006). Contextual valence shifters. In J. Shanahan, Y. Qu, & J. Wiebe (Eds.), Computing attitude and affect in text: Theory and applications (pp. 1–10). Springer.
Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14, 130–137.
Raina, R., Battle, A., Lee, H., Packer, B., & Ng, A. Y. (2007). Self-taught learning: transfer learning from unlabeled data. In Proceedings of ICML-07, 24th International Conference on Machine Learning (pp. 759–766). New York: ACM.
Riloff, E., Wiebe, J., & Wilson, T. (2003). Learning subjective nouns using extraction pattern bootstrapping. In Proceedings of CoNLL-03, 7th Conference on Natural Language Learning (pp. 25–32). Edmonton, CA.
Roy, N., & McCallum, A. (2001). Toward optimal active learning through sampling estimation of error reduction. In Proceedings of ICML-01, 18th International Conference on Machine Learning (pp. 441–448). San Francisco, CA: Morgan Kaufmann.
Rubin, V. L., Liddy, E. D., & Kando, N. (2006). Certainty identification in texts: Categorization model and manual tagging results. In J. Shanahan, Y. Qu, & J. Wiebe (Eds.), Computing attitude and affect in text: Theory and applications (pp. 61–76). Berlin, Germany: Springer.
Saar-Tsechansky, M., & Provost, F. (2004). Active sampling for class probability estimation and ranking. Machine Learning, 54, 153–178.
Salvetti, F., Lewis, S., & Reichenbach, C. (2004). Impact of lexical filtering on overall opinion polarity identification. In Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications. Stanford, CA.
Seung, H. S., Opper, M., & Sompolinsky, H. (1992). Query by committee. In Computational learning theory (pp. 287–294).
Tong, R., & Yager, R. (2006). Characterizing buzz and sentiment in internet sources: Linguistic summaries and predictive behaviors. In J. Shanahan, Y. Qu, & J. Wiebe (Eds.), Computing attitude and affect in text: Theory and applications (pp. 281–296). Springer.
Tong, S., & Koller, D. (2002). Support vector machine active learning with applications to text classification. Journal of Machine Learning Research, 2, 45–66.
Turney, P. (2002). Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of ACL-02, 40th Annual Meeting of the Association for Computational Linguistics (pp. 417–424). Philadelphia, PA: Association for Computational Linguistics.
Viola, P., & Jones, M. (2001). Robust real-time object detection. Technical report, Cambridge Research Lab, Compaq.
Wang, B., & Wang, H. (2007). Bootstrapping both product properties and opinion words from Chinese reviews with cross-training. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (pp. 259–262). Washington, DC, USA: IEEE Computer Society.
Whitelaw, C., Garg, N., & Argamon, S. (2005). Using appraisal taxonomies for sentiment analysis. In Proceedings of MCLC-05, 2nd Midwest Computational Linguistic Colloquium. Columbus, OH.
Wiebe, J. (2000). Learning subjective adjectives from corpora. In Proceedings of AAAI-00, 17th Conference of the American Association for Artificial Intelligence (pp. 735–740). Austin, TX: AAAI Press/The MIT Press.
Xu, Z., Yu, K., Tresp, V., Xu, X., & Wang, J. (2003). Representative sampling for text classification using support vector machines. In European Conference on Information Retrieval (pp. 393–407). Berlin, Germany: Springer.
Zagibalov, T., & Carroll, J. (2008). Unsupervised classification of sentiment and objectivity in Chinese text. In Proceedings of the International Joint Conference on NLP (IJCNLP).
Zhang, E., & Zhang, Y. (2006). UCSC on TREC 2006 blog opinion mining. Technical report, University of California Santa Cruz, CA.
Zhu, J., Wang, H., & Hovy, E. H. (2008). Learning a stopping criterion for active learning for word sense disambiguation and text classification. In Proceedings of the International Joint Conference on NLP (IJCNLP).
Metadata
Title: A machine learning approach to sentiment analysis in multilingual Web texts
Authors: Erik Boiy, Marie-Francine Moens
Publication date: 1 October 2009
Publisher: Springer Netherlands
Published in: Discover Computing, Issue 5/2009
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI: https://doi.org/10.1007/s10791-008-9070-z
