Tweet sentiment analysis with classifier ensembles
Introduction
Twitter is a popular microblogging service in which users post status messages, called “tweets”, with no more than 140 characters. In most cases, users write messages well below that limit. Twitter represents one of the largest and most dynamic datasets of user-generated content: approximately 200 million users post 400 million tweets per day [1]. Tweets express opinions on many different topics and can thus help to direct marketing campaigns by revealing consumers' opinions concerning brands and products [2], to detect outbreaks of bullying [3], events that generate insecurity [4], polarity in political and sports discussions [6], and acceptance or rejection of politicians [5], all in an electronic word-of-mouth fashion. Automatic tools can help decision makers devise efficient solutions to the problems raised. From this perspective, the focus of our work is on the sentiment analysis of tweets.
Sentiment analysis aims at determining the opinions, emotions, and attitudes reported in source materials such as documents, short texts, and sentences from reviews [7], [8], [9], blogs [10], [11], and news [12], among other sources. In those application domains, one deals with large text corpora written mostly in “formal” language. At least two specific issues must be addressed in any computer-based tweet analysis. First, misspellings and slang are much more frequent in tweets than in other domains, as users usually post messages from many different electronic devices, such as cell phones and tablets, and develop their own specific vocabulary in this type of environment. Second, Twitter users post messages on a wide variety of topics, unlike blogs, news, and other sites, which are tailored to specific topics.
We consider sentiment analysis a classification problem. Just like in large documents, sentiments of tweets can be expressed in different ways and classified according to the existence of sentiment, i.e., if there is sentiment in the message, then it is considered polar (categorized as positive or negative), otherwise it is considered neutral. Some authors, on the other hand, consider the six “universal” emotions [13]: anger, disgust, fear, happiness, sadness, and surprise as sentiments. In this paper, we adopt the view that sentiments can be either positive or negative, as in [14], [15], [16], [17], [18].
Tweet sentiment analysis poses major challenges (Hassan et al. [19]): (i) neutral tweets are far more common than positive and negative ones, unlike other sentiment analysis domains (e.g., product reviews), which tend to be predominantly positive or negative; (ii) there are linguistic representation challenges, such as those that arise from feature engineering; and (iii) tweets are very short and often carry limited sentiment cues.
Many researchers have focused on the use of traditional classifiers, like Naive Bayes, Maximum Entropy, and Support Vector Machines to solve such problems. In this paper, we show that the use of ensembles of multiple base classifiers, combined with scores obtained from lexicons, can improve the accuracy of tweet sentiment classification. Moreover, we investigate different representations of tweets that take bag-of-words and feature hashing into account [20].
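To make the two representations concrete, the following is a minimal, dependency-free sketch; the tokenization, hash function, and dimensionality are illustrative assumptions, not the paper's exact setup.

```python
import hashlib

def tokenize(tweet):
    return tweet.lower().split()

def bag_of_words(tweets):
    # Explicit vocabulary: one dimension per token observed in the corpus,
    # so dimensionality grows with the data.
    vocab = sorted({tok for t in tweets for tok in tokenize(t)})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for t in tweets:
        v = [0] * len(vocab)
        for tok in tokenize(t):
            v[index[tok]] += 1
        vectors.append(v)
    return vocab, vectors

def feature_hashing(tweets, n_features=16):
    # Fixed-size feature space: tokens are hashed into buckets, so no
    # vocabulary needs to be stored, at the cost of irreversible collisions.
    vectors = []
    for t in tweets:
        v = [0] * n_features
        for tok in tokenize(t):
            h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
            v[h % n_features] += 1
        vectors.append(v)
    return vectors

tweets = ["i love this phone :)", "worst battery ever :("]
vocab, bow = bag_of_words(tweets)
hashed = feature_hashing(tweets)
print(len(vocab), len(hashed[0]))  # vocabulary grows with data; hash space stays fixed
```

The trade-off illustrated here is the one examined in the paper: bag-of-words keeps an interpretable vocabulary in memory, whereas feature hashing bounds memory usage but loses the token-to-dimension mapping.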
The combination of multiple classifiers into a single classifier has been an active area of research over the last two decades [21], [22], [23]. For example, an analytical framework that quantifies the improvement in classification results due to combining multiple models is presented in [24]. More recently, a survey on traditional ensemble techniques, together with their applications to many difficult real-world problems such as remote sensing, person recognition, and medicine, is presented in [24]. Studies on ensemble learning for sentiment analysis of large text corpora, like those found in movie and product reviews, web forum datasets, and question answering, are reported in [25], [26], [27], [28], [29], [30], [31]. In summary, the literature shows that an ensemble built from independent, diversified classifiers is usually more accurate than its individual components. Related work on tweet sentiment analysis is rather limited [32], [33], [34], [19], but the initial results are promising.
Our main contributions can be summarized as follows: (i) we show that classifier ensembles formed by diversified components are promising for tweet sentiment analysis; (ii) we compare bag-of-words and feature hashing-based strategies for the representation of tweets and show their advantages and drawbacks; and (iii) classifier ensembles obtained from the combination of lexicons, bag-of-words, emoticons, and feature hashing are studied and discussed.
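Contribution (iii) relies on lexicon scores as an additional information source alongside the textual representations. A toy illustration with a hypothetical hand-made lexicon (the actual lexicons used in the experiments are a different, much larger resource):

```python
# Hypothetical polarity lexicon: word -> score in [-1, 1].
LEXICON = {"love": 1.0, "great": 0.8, "worst": -1.0, "bad": -0.7}

def lexicon_score(tweet):
    # Sum the polarity of known words; unknown words contribute 0.
    # The aggregate can be appended as an extra feature for the classifiers.
    return sum(LEXICON.get(tok, 0.0) for tok in tweet.lower().split())

print(lexicon_score("love this phone"))  # 1.0
print(lexicon_score("worst battery"))    # -1.0
```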
The remainder of the paper is organized as follows: Section 2 addresses the related work. Section 3 describes our approach, for which experimental results are provided in Section 4. Section 5 concludes the paper and discusses directions for future work.
Related work
Several studies on the use of stand-alone classifiers for tweet sentiment analysis are available in the literature, as summarized in Table 1. Some of them propose the use of emoticons and hashtags for building the training set, such as Go et al. [35] and Davidov et al. [36], who identified tweet polarity by using emoticons as class labels. Others exploit the characteristics of the social network as networked data, as in Hu et al. [37]. According to the authors, emotional contagion theories
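The emoticon-as-label idea attributed to Go et al. [35] and Davidov et al. [36] can be sketched as follows; the emoticon sets and the cleaning step are illustrative assumptions, not those authors' exact lists.

```python
# Distant supervision: emoticons act as noisy class labels and are then
# stripped from the text so a classifier cannot trivially memorize them.
POSITIVE = {":)", ":-)", ":D", "=)"}
NEGATIVE = {":(", ":-(", "=("}

def label_by_emoticon(tweet):
    tokens = tweet.split()
    if any(tok in POSITIVE for tok in tokens):
        label = "positive"
    elif any(tok in NEGATIVE for tok in tokens):
        label = "negative"
    else:
        return None  # no emoticon cue: leave the tweet unlabeled
    cleaned = " ".join(tok for tok in tokens if tok not in POSITIVE | NEGATIVE)
    return cleaned, label

print(label_by_emoticon("great game tonight :)"))  # ('great game tonight', 'positive')
```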
Classifier ensembles for tweet sentiment analysis
Ensemble methods train multiple learners to solve the same problem [22]. In contrast to classic learning approaches, which construct a single learner from the training data, ensemble methods construct a set of learners and combine them. Dietterich [56] lists three reasons for using an ensemble-based system:
- Statistical
Assume that we have a number of different classifiers, all of which achieve good accuracy on the training set. If a single classifier is chosen from the available ones, it may not yield
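The combination step itself can be as simple as majority voting over the base classifiers' outputs. A minimal sketch (the label strings stand in for the predictions of the trained base classifiers, e.g., Naive Bayes, Maximum Entropy, and SVM):

```python
from collections import Counter

def majority_vote(predictions):
    # One predicted label per base classifier; the most frequent label wins
    # (ties resolve to the first-seen label in CPython 3.7+).
    return Counter(predictions).most_common(1)[0][0]

votes = ["positive", "negative", "positive"]
print(majority_vote(votes))  # positive
```

More elaborate combination rules (e.g., weighting each vote by the classifier's confidence) follow the same pattern; majority voting is simply the baseline scheme.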
Datasets
Our experiments were performed in representative datasets obtained from tweets on different subjects [66]:
Concluding remarks
The use of classifier ensembles for tweet sentiment analysis has been underexplored in the literature. We have demonstrated that classifier ensembles formed by diversified components, especially when these come from different information sources such as textual data, emoticons, and lexicons, can provide state-of-the-art results for this particular domain. We also compared promising strategies for the representation of tweets (i.e., bag-of-words and feature hashing) and showed their advantages
Acknowledgments
The authors would like to acknowledge Research Agencies CAPES (DS-7253238/D), FAPESP (2013/07787-6 and 2013/07375-0), and CNPq (303348/2013-5) for their financial support. They are also grateful to Marko Grobelnik for pointing out some related work on tweet analysis.
References (70)
Reputation and e-commerce: eBay auctions and the asymmetrical impact of positive and negative ratings, Journal of Management (2001)

et al., Analysis of decision boundaries in linearly combined neural classifiers, Pattern Recognition (1996)

et al., Diversity creation methods: a survey and categorisation, Information Fusion (2005)

et al., Named entity recognition in tweets: an experimental study

et al., Twitter power: Tweets as electronic word of mouth, Journal of the American Society for Information Science and Technology (2009)

et al., Learning from bullying traces in social media, HLT-NAACL (2012)

et al., A microblogging-based approach to terrorism informatics: exploration and chronicling civilian sentiment and response to terrorism events via Twitter, Information Systems Frontiers (2011)

et al., From bias to opinion: a transfer-learning approach to real-time sentiment analysis

et al., Characterizing debate performance via aggregated Twitter sentiment

Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews