Elsevier

Decision Support Systems

Volume 66, October 2014, Pages 170-179

Tweet sentiment analysis with classifier ensembles

https://doi.org/10.1016/j.dss.2014.07.003

Highlights

  • We show that classifier ensembles are promising for tweet sentiment analysis.

  • We compare bag-of-words and feature hashing for the representation of tweets.

  • Classifier ensembles obtained from bag-of-words and feature hashing are discussed.

Abstract

Twitter is a microblogging site in which users can post updates (tweets) to friends (followers). It has become an immense dataset of so-called sentiments. In this paper, we introduce an approach that automatically classifies the sentiment of tweets by using classifier ensembles and lexicons. Tweets are classified as either positive or negative with respect to a query term. This approach is useful for consumers who can use sentiment analysis to search for products, for companies that aim to monitor the public sentiment of their brands, and for many other applications. Sentiment classification in microblogging services (e.g., Twitter) through classifier ensembles and lexicons has not been well explored in the literature. Our experiments on a variety of public tweet sentiment datasets show that classifier ensembles formed by Multinomial Naive Bayes, SVM, Random Forest, and Logistic Regression can improve classification accuracy.

Introduction

Twitter is a popular microblogging service in which users post status messages, called “tweets”, of no more than 140 characters. In most cases, users write messages far shorter than this limit. Twitter represents one of the largest and most dynamic datasets of user-generated content: approximately 200 million users post 400 million tweets per day [1]. Tweets express opinions on many different topics and have been used, for instance, to gauge consumers' opinions of brands and products for marketing campaigns [2], to detect outbreaks of bullying [3] and events that generate insecurity [4], to predict polarity in political and sports discussions [5], and to assess the acceptance or rejection of politicians [6], all as a form of electronic word of mouth. Automatic tools can help decision makers find efficient solutions to the problems raised. From this perspective, our work focuses on the sentiment analysis of tweets.

Sentiment analysis aims to determine opinions, emotions, and attitudes reported in source materials such as documents, short texts, sentences from reviews [7], [8], [9], blogs [10], [11], and news [12], among other sources. In such application domains, one usually deals with large text corpora written in relatively “formal language”. At least two specific issues should be addressed in any type of computer-based tweet analysis. First, misspellings and slang are much more frequent in tweets than in other domains, as users post messages from many different electronic devices, such as cell phones and tablets, and develop their own vocabulary in this type of environment. Second, Twitter users post messages on a wide variety of topics, unlike blogs, news, and other sites, which tend to be tailored to specific topics.

We treat sentiment analysis as a classification problem. As with longer documents, the sentiment of a tweet can be expressed in different ways, and tweets can be classified according to the presence of sentiment: if a message expresses sentiment, it is considered polar (categorized as positive or negative); otherwise, it is considered neutral. Other authors instead treat the six “universal” emotions [13] (anger, disgust, fear, happiness, sadness, and surprise) as sentiments. In this paper, we adopt the view that sentiments can be either positive or negative, as in [14], [15], [16], [17], [18].

Hassan et al. [19] point out three major challenges in tweet sentiment analysis: (i) neutral tweets are far more common than positive and negative ones, unlike in other sentiment analysis domains (e.g., product reviews), which tend to be predominantly positive or negative; (ii) there are linguistic representation challenges, such as those that arise from feature engineering; and (iii) tweets are very short and often provide limited sentiment cues.

Many researchers have focused on traditional classifiers, such as Naive Bayes, Maximum Entropy, and Support Vector Machines, to solve such problems. In this paper, we show that ensembles of multiple base classifiers, combined with scores obtained from lexicons, can improve the accuracy of tweet sentiment classification. Moreover, we investigate different representations of tweets based on bag-of-words and feature hashing [20].
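To make the contrast between the two representations concrete, the following Python sketch builds bag-of-words count vectors, which require an explicit vocabulary, and feature-hashed vectors in the spirit of [20], which map each token directly to one of a fixed number of buckets. The whitespace tokenizer, the use of SHA-256 as the hash function, and the bucket count of 16 are illustrative assumptions, not the configuration used in this paper.

```python
import hashlib


def tokenize(text):
    # Naive lowercase/whitespace tokenizer (an assumption; tweet tokenizers
    # usually also handle punctuation, URLs, and @mentions).
    return text.lower().split()


def bag_of_words(docs):
    """Build an explicit vocabulary, then one count vector per document."""
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))  # assign the next free column
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for tok in tokenize(doc):
            vec[vocab[tok]] += 1
        vectors.append(vec)
    return vocab, vectors


def hashed_features(doc, n_features=16):
    """Feature hashing: no vocabulary is stored; the length is always n_features."""
    vec = [0] * n_features
    for tok in tokenize(doc):
        digest = hashlib.sha256(tok.encode("utf-8")).digest()
        # The first 4 bytes of the digest choose the bucket.
        index = int.from_bytes(digest[:4], "little") % n_features
        # The parity of the next byte chooses the sign (+1 or -1).
        sign = 1 if digest[4] % 2 == 0 else -1
        vec[index] += sign
    return vec


tweets = ["I love this phone", "I hate this phone so much"]
vocab, bow = bag_of_words(tweets)
print(len(vocab), bow)
print(hashed_features(tweets[1]))
```

With hashing, the vector length stays fixed no matter how many distinct words appear, and no vocabulary has to be kept in memory, at the cost of occasional collisions between unrelated tokens. The signed variant makes colliding counts cancel on average instead of always accumulating. `hashlib` is used rather than Python's built-in `hash()`, which is salted per process for strings and would place tokens in different buckets across runs.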

The combination of multiple classifiers into a single classifier has been an active area of research over the last two decades [21], [22], [23]. For example, an analytical framework that quantifies the improvements in classification results due to the combination of multiple models is addressed in [24]. More recently, a survey on traditional ensemble techniques, together with their applications to many difficult real-world problems such as remote sensing, person recognition, and medicine, is presented in [24]. Studies on ensemble learning for sentiment analysis of large text corpora, such as movie and product reviews, web forum datasets, and question answering, are reported in [25], [26], [27], [28], [29], [30], [31]. In summary, the literature has shown that an ensemble built from independent, diversified classifiers is usually more accurate than its individual components. Related work on classifier ensembles for tweet sentiment analysis is rather limited [32], [33], [34], [19], but the initial results are promising.
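Why independent components help can be checked with a short calculation. Under the idealized assumption that n binary classifiers err independently and each is correct with probability p, a majority vote is correct whenever more than half of them are, which is a binomial tail probability. The sketch below computes it; the independence assumption is ours and rarely holds exactly in practice, which is precisely why diversity among the components matters.

```python
from math import comb


def majority_vote_accuracy(p, n):
    """Probability that a majority of n independent binary classifiers,
    each correct with probability p, casts the correct vote."""
    if n % 2 == 0:
        raise ValueError("use an odd n so that a binary vote cannot tie")
    needed = n // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


for n in (1, 3, 5, 7):
    print(n, round(majority_vote_accuracy(0.7, n), 4))
```

For p = 0.7, three such classifiers reach 0.784 and five reach about 0.837. If p drops below 0.5, the same formula shows the vote getting worse as n grows, and correlated errors shrink the gain in either case.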

Our main contributions can be summarized as follows: (i) we show that classifier ensembles formed by diversified components are promising for tweet sentiment analysis; (ii) we compare bag-of-words and feature hashing-based strategies for the representation of tweets and show their advantages and drawbacks; and (iii) we study and discuss classifier ensembles obtained from the combination of lexicons, bag-of-words, emoticons, and feature hashing.
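One simple way to expose several information sources to a single classifier, not necessarily the construction adopted in this paper, is to merge them into one sparse feature map. The Python sketch below combines bag-of-words counts, a lexicon score, and emoticon counts. `LEXICON` and the emoticon sets are hypothetical toy resources; a real system would load a published sentiment lexicon and a larger emoticon list.

```python
from collections import Counter

# Hypothetical toy resources, for illustration only.
LEXICON = {"love": 1.0, "great": 1.0, "hate": -1.0, "awful": -1.0}
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":-("}
EMOTICONS = POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS


def tweet_features(text):
    """Merge bag-of-words, lexicon, and emoticon information into one sparse map."""
    tokens = text.split()
    # Emoticons are matched case-sensitively (":D" is not ":d"); words are lowercased.
    words = [t.lower() for t in tokens if t not in EMOTICONS]
    features = Counter(f"w={w}" for w in words)  # bag-of-words counts
    # Lexicon part: sum of the prior polarities of known words.
    features["lexicon_score"] = sum(LEXICON.get(w, 0.0) for w in words)
    # Emoticon part: how many positive and negative emoticons occur.
    features["pos_emoticons"] = sum(t in POSITIVE_EMOTICONS for t in tokens)
    features["neg_emoticons"] = sum(t in NEGATIVE_EMOTICONS for t in tokens)
    return dict(features)


print(tweet_features("I love this phone :) :D"))
```

Keeping every source in one named, sparse map lets any linear or tree-based base classifier weigh the lexicon and emoticon signals against individual words; the word part could equally be replaced by hashed buckets.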

The remainder of the paper is organized as follows: Section 2 addresses the related work. Section 3 describes our approach, for which experimental results are provided in Section 4. Section 5 concludes the paper and discusses directions for future work.


Related work

Several studies on the use of stand-alone classifiers for tweet sentiment analysis are available in the literature, as summarized in Table 1. Some of them use emoticons and hashtags to build the training set; for instance, Go et al. [35] and Davidov et al. [36] identified tweet polarity by using emoticons as class labels. Others exploit characteristics of the social network by treating tweets as networked data, as in Hu et al. [37]. According to the authors, emotional contagion theories
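Using emoticons as class labels amounts to a form of distant supervision: tweets are labeled automatically by the emoticons they contain, yielding a large but noisy training set. The Python sketch below illustrates the idea under assumed, hypothetical emoticon lists; it discards tweets with no emoticon or with contradictory ones, and strips the emoticons from the text so that a classifier cannot learn the label from the label source itself.

```python
# Hypothetical emoticon lists used as noisy class labels.
POSITIVE = {":)", ":-)", ":D", "=)"}
NEGATIVE = {":(", ":-(", ":'("}
EMOTICONS = POSITIVE | NEGATIVE


def emoticon_label(tweet):
    """Return (text without emoticons, label), or None if no clear label exists."""
    tokens = tweet.split()
    has_pos = any(t in POSITIVE for t in tokens)
    has_neg = any(t in NEGATIVE for t in tokens)
    if has_pos == has_neg:  # no emoticon at all, or contradictory ones
        return None
    # Remove the emoticons so the classifier must rely on the remaining words.
    text = " ".join(t for t in tokens if t not in EMOTICONS)
    return (text, "positive" if has_pos else "negative")


def build_training_set(tweets):
    """Keep only tweets whose emoticons give an unambiguous label."""
    labeled = (emoticon_label(t) for t in tweets)
    return [pair for pair in labeled if pair is not None]


raw = ["great game today :)", "missed the bus :(", "no emoticon here"]
print(build_training_set(raw))
```

The resulting pairs can feed any of the base classifiers; the labels are only as reliable as the assumption that an emoticon reflects the polarity of the whole tweet.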

Classifier ensembles for tweet sentiment analysis

Ensemble methods train multiple learners to solve the same problem [22]. In contrast to classic learning approaches, which construct a single learner from the training data, ensemble methods construct a set of learners and combine them. Dietterich [56] lists three reasons for using an ensemble-based system:

  • Statistical

    Assume that we have a number of different classifiers, all of which provide good accuracy on the training set. If a single classifier is chosen from the available ones, it may not yield
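Once a set of learners is trained, their outputs must be combined. The Python sketch below shows two common combination rules, hard (majority) voting over predicted labels and soft voting over averaged probabilities, with soft voting used to break ties, which can occur with an even number of voters such as the four classifiers named in the abstract. The function names and the example outputs are hypothetical, and the sketch assumes each base classifier can emit a probability-like score for the positive class (an SVM typically needs calibration for this).

```python
from collections import Counter


def soft_vote(prob_positive, threshold=0.5):
    """Average each base classifier's estimate of P(positive) and threshold it."""
    mean = sum(prob_positive) / len(prob_positive)
    return "positive" if mean >= threshold else "negative"


def hard_vote(labels, prob_positive=None):
    """Majority vote over predicted labels; a tie falls back to soft voting."""
    counts = Counter(labels)
    pos, neg = counts["positive"], counts["negative"]
    if pos != neg:
        return "positive" if pos > neg else "negative"
    if prob_positive is None:
        raise ValueError("tied vote and no probabilities to break the tie")
    return soft_vote(prob_positive)


# Hypothetical outputs of four base classifiers (e.g., MNB, SVM, RF, LR) for one tweet.
labels = ["positive", "negative", "positive", "negative"]
probs = [0.8, 0.4, 0.6, 0.45]
print(hard_vote(labels, probs))
```

Here the 2 to 2 tie is resolved by the mean probability of 0.5625, so the ensemble outputs "positive". Soft voting alone could also be used throughout; hard voting has the advantage of not requiring well-calibrated scores when the vote is not tied.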

Datasets

Our experiments were performed on representative datasets of tweets on different subjects [66]:

Concluding remarks

The use of classifier ensembles for tweet sentiment analysis has been underexplored in the literature. We have demonstrated that classifier ensembles formed by diversified components, especially if these come from different information sources such as textual data, emoticons, and lexicons, can provide state-of-the-art results for this particular domain. We also compared promising strategies for the representation of tweets (i.e., bag-of-words and feature hashing) and showed their advantages

Acknowledgments

The authors would like to acknowledge Research Agencies CAPES (DS-7253238/D), FAPESP (2013/07787-6 and 2013/07375-0), and CNPq (303348/2013-5) for their financial support. They are also grateful to Marko Grobelnik for pointing out some related work on tweet analysis.

References (70)

  • S.S. Standifird, Reputation and e-commerce: eBay auctions and the asymmetrical impact of positive and negative ratings, Journal of Management (2001).
  • K. Tumer et al., Analysis of decision boundaries in linearly combined neural classifiers, Pattern Recognition (1996).
  • G. Brown et al., Diversity creation methods: a survey and categorisation, Information Fusion (2005).
  • A. Ritter et al., Named entity recognition in tweets: an experimental study.
  • B.J. Jansen et al., Twitter power: Tweets as electronic word of mouth, Journal of the American Society for Information Science and Technology (2009).
  • J.-M. Xu et al., Learning from bullying traces in social media, HLT-NAACL (2012).
  • M. Cheong et al., A microblogging-based approach to terrorism informatics: exploration and chronicling civilian sentiment and response to terrorism events via Twitter, Information Systems Frontiers (2011).
  • P.H.C. Guerra et al., From bias to opinion: a transfer-learning approach to real-time sentiment analysis.
  • N.A. Diakopoulos et al., Characterizing debate performance via aggregated Twitter sentiment.
  • P.D. Turney, Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews.
  • B. Pang et al., Thumbs up? Sentiment classification using machine learning techniques.
  • M. Hu et al., Mining and summarizing customer reviews.
  • B. He et al., An effective statistical approach to blog post opinion retrieval.
  • P. Melville et al., Sentiment analysis of blogs by combining lexical knowledge with text classification, Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'09 (2009).
  • A. Balahur, R. Steinberger, M. Kabadjov, V. Zavarella, E. van der Goot, M. Halkia, B. Pouliquen, J. Belyaeva, Sentiment...
  • P. Ekman, Emotion in the Human Face (1982).
  • B. Liu, Sentiment analysis and opinion mining, Synthesis Lectures on Human Language Technologies (2012).
  • B. Liu, Web data mining: exploring hyperlinks, contents, and usage data, Data-Centric Systems and Applications (2006).
  • B. Liu, Sentiment Analysis and Subjectivity (2010).
  • A. Agarwal et al., Sentiment analysis: a new approach for effective use of linguistic knowledge and exploiting similarities in a set of documents to be classified.
  • A. Hassan et al., Twitter Sentiment Analysis: A Bootstrap Ensemble Framework (2013).
  • K.Q. Weinberger et al., Feature hashing for large scale multitask learning.
  • L.I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms (2004).
  • Z. Zhou, Ensemble methods: foundations and algorithms, Chapman & Hall/CRC Data Mining and Knowledge Discovery Series (2012).
  • J.A. Benediktsson et al., Multiple classifier systems, Vol. 5519 of Lecture Notes in Computer Science (2009).
  • G. Wang et al., Sentiment classification: the contribution of ensemble learning, Decision Support Systems (2013).
  • A. Abbasi et al., Affect analysis of web forums and blogs using correlation ensembles, IEEE Transactions on Knowledge and Data Engineering (2008).
  • Heterogeneous ensemble learning for Chinese sentiment classification, Journal of Information and Computational Science (2012).
  • B. Lu et al., Combining a large sentiment lexicon and machine learning for subjectivity classification.
  • Y. Su et al., Ensemble learning for sentiment classification.
  • M. Whitehead et al., Sentiment mining using ensemble classification models.
  • J. Lin et al., Large-scale machine learning at Twitter.
  • S. Clark et al., SwatCS: combining simple classifiers with estimated accuracy.
  • C. Rodriguez Penagos et al., FBM: combining lexicon-based ML and heuristics for social media polarities.
