Joint multi-grain topic sentiment: modeling semantic aspects for online reviews

doi:10.1016/j.ins.2016.01.013

Information Sciences

Volume 339, 20 April 2016, Pages 206-223

https://doi.org/10.1016/j.ins.2016.01.013

Abstract

The availability of electronic word-of-mouth, in the form of online consumer reviews, is increasing rapidly. Users frequently look for important aspects of a product or service in the reviews. They are typically interested in sentiment-oriented ratable aspects (i.e., semantic aspects). However, extracting semantic aspects across domains is challenging. We propose a domain-independent topic sentiment model called Joint Multi-grain Topic Sentiment (JMTS) to extract semantic aspects. JMTS automatically extracts high-quality semantic aspects, thereby eliminating the need for manual probing. We conduct both qualitative and quantitative comparisons to evaluate JMTS. The experimental results confirm that JMTS generates semantic aspects with correlated top words and outperforms state-of-the-art models on several performance metrics.

Introduction

With the availability of ubiquitous internet access, increasing numbers of people are conducting online research prior to buying a product. People are eager to know how other consumers feel about a product and what their perspectives are. This is known as word-of-mouth. The availability of electronic word-of-mouth, in the form of online consumer reviews, is growing rapidly and has a significant influence on the purchasing behavior of consumers. This is because consumer reviews contain user perspectives from different usage scenarios and are frequently considered more credible and trustworthy than vendor product descriptions [23]. Although consumer reviews are helpful for product purchasing and online opinion tracking, manually analyzing reviews to gain insight into user opinions, such as consumer sentiment about important aspects of a product, is tedious. Current user interface (UI) tools (e.g., tagging keywords or numerical ratings) are inadequate for digesting the details of user opinions. Therefore, there has recently been considerable interest in developing automated tools for opinion mining and sentiment analysis.

A major challenge in opinion mining is aspect-based sentiment analysis [6]. Some online reviews provide overall ratings for an object. However, users are typically interested in the detailed aspects in addition to the overall ratings. The detailed aspects, along with the sentiments, are embedded in textual content, which has a significant economic influence [3]. Individual preference levels differ considerably by aspect; thus, an object can be described and rated differently for different aspects. For example, one reviewer may rate a restaurant highly based on the taste of the food, whereas another reviewer may rate the same restaurant poorly because of the service or ambience. Aspect-based sentiment analysis is therefore valuable for making an informed decision.

Domain-independent aspect-based sentiment analysis is challenging because, in many cases, the sentiment polarity of a word is domain-dependent [6]. For instance, "unpredictable plot" expresses a positive sentiment in the movie domain, whereas "unpredictable touch screen" expresses a negative sentiment in the electronics domain. Given the extensive variety of products and services across countless domains, it is costly to construct labeled data for each product or service. Therefore, domain-independent models with minimal or no supervision are required for aspect-based sentiment analysis systems.

A typical aspect-based sentiment analysis system functions in two phases. First, it extracts aspects. Then, it determines the sentiment of the aspects. In many systems, one of the two phases relies on some form of supervision. For example, predefined aspects are required in [17] for aspect-based sentiment classification. Conversely, aspects are extracted automatically in [30]; however, aspect-based user numerical ratings are required for aspect-based sentiment summarization.

Recently, domain-independent topic-sentiment models (i.e., ASUM [10], JST [14], [15], and HASM [11]) have been proposed for addressing these two problems simultaneously with joint models of topic and sentiment. These models can be applied to any domain because they do not require predefined aspects or a domain-dependent sentiment lexicon. However, the topic-sentiment models fail to automatically identify ratable aspects from many redundant or uncorrelated topics. Furthermore, the optimal number of topics required to model online reviews is either prohibitively large or too small (e.g., 100 or 2), whereas in our analysis the number of ratable aspects of a product is approximately 10. Consequently, it is difficult to conceptualize or browse the sentiment-oriented ratable aspects, and manual effort is required to identify aspects from topics. We have experimentally observed that many topics do not correspond to ratable aspects and contain redundant or uncorrelated top words, even when these models use approximately 10 topics. Although MG-LDA [31] detects ratable aspects, it cannot identify their sentiment orientation.

These limitations of previous work motivate our research. Although there have been numerous attempts to model both topics and sentiments, no research has examined the effectiveness of multi-grain topic sentiment for aspect-based sentiment classification. Integrating sentiment with multi-grain topics is not trivial because the topics are derived from regions, defined as windows, of a document. We have experimented with many design choices and developed the Joint Multi-grain Topic Sentiment (JMTS) model. JMTS extends MG-LDA by constructing an additional sentiment layer on the presumption that sentiment-oriented ratable aspects are generated from regional distributions of topics and sentiment. One of our key technical contributions is that JMTS relates sentiment to windows and words, whereas ASUM and JST relate sentiment to sentences and words. Modeling the relation between sentiment and windows is effective, as verified by the experiments in Section 4.

We extend our preliminary work [2] in three areas. First, we use asymmetric priors while incorporating prior sentiment information into JMTS. Second, we compute the aspect sentiment distribution of the sentences as well as the reviews. Third, we demonstrate the efficacy of JMTS compared to existing models in aspect classification and pointwise mutual information (PMI). The contributions of this paper are as follows:

• We propose a novel JMTS model for online reviews. JMTS effectively extracts quality sentiment-oriented ratable aspects automatically, eliminating the requirement for a manual probe.
• We verify the efficacy of JMTS qualitatively by demonstrating that JMTS generates correlated top words with low contamination for sentiment-oriented ratable aspects.
• We confirm that JMTS outperforms existing models (HASM, ASUM, JST, and MG-LDA) with quantitative comparisons.

The remainder of this paper is organized as follows. Section 2 reviews related work. Sections 3 and 4 describe the novel JMTS model and the experimental results, respectively. Section 5 concludes this paper.

Section snippets

Related work

Sentiment analysis is a well-studied problem [23]. Some of the work addresses the economic influence and helpfulness of reviews [38], emotion mining [27], stock movements [12], cross-lingual sentiment analysis [16], and review snippet aggregation [28]. The most common sentiment analysis problem is classifying a text into either positive or negative polarity [23]. Some work [7], [34] classifies sentiment into multiple rather than two classes. The majority of the work emphasizes sentiment

Generative model

Our goal is to extract sentiment-oriented ratable aspects of online reviews by extending topic models (e.g., LDA [4]).

LDA generates a document in three steps. First, it draws a topic distribution for the document. Then, for each word position, it selects a topic from that topic distribution. Finally, it draws a word from the selected topic. The topics themselves are sampled once for the entire document collection. The graphical model of LDA is presented in Fig. 1a, where the shaded node represents the observed variable and non-shaded
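The three-step generative process of LDA described above can be sketched as follows. This is a minimal illustration of plain LDA only, not of JMTS; the function names, the toy vocabulary, and the symmetric hyperparameter values `alpha` and `beta` are assumptions introduced here for illustration and do not come from the paper.

```python
import random


def sample_dirichlet(rng, alphas):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(rng, probs):
    """Return index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(vocab, num_topics, doc_lengths, alpha=0.5, beta=0.5, seed=0):
    """Generate documents with the LDA process (hypothetical helper)."""
    rng = random.Random(seed)
    # Topics (distributions over the vocabulary) are sampled once
    # for the entire document collection.
    topics = [sample_dirichlet(rng, [beta] * len(vocab))
              for _ in range(num_topics)]
    corpus = []
    for length in doc_lengths:
        # Step 1: draw a topic distribution for this document.
        theta = sample_dirichlet(rng, [alpha] * num_topics)
        words = []
        for _ in range(length):
            # Step 2: select a topic from the document's topic distribution.
            z = sample_categorical(rng, theta)
            # Step 3: draw a word from the selected topic.
            words.append(vocab[sample_categorical(rng, topics[z])])
        corpus.append(words)
    return topics, corpus


if __name__ == "__main__":
    vocab = ["food", "service", "price", "room", "staff", "taste"]
    topics, corpus = generate_corpus(vocab, num_topics=2, doc_lengths=[5, 8])
    for doc in corpus:
        print(" ".join(doc))
```

Inference reverses this process: given only the observed words, it estimates the hidden topic assignments and distributions.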

Experimental results

We perform both qualitative and quantitative experiments to evaluate JMTS and the quality of extracted aspects. We compare JMTS aspects with the aspects of prior models qualitatively using two criteria: the top words of an aspect should be correlated and minimally contaminated with the top words of other aspects. We also make quantitative comparisons in aspect sentiment classification, aspect classification, and pointwise mutual information (PMI).
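To make the PMI criterion concrete, the sketch below scores the top words of an aspect by their average pairwise PMI, PMI(w1, w2) = log(p(w1, w2) / (p(w1) p(w2))). The exact formulation used in the experiments is not given in this excerpt, so estimating the probabilities from document-level co-occurrence, skipping pairs that never co-occur, and the function names `pmi` and `average_pmi` are all assumptions made here for illustration.

```python
import math
from itertools import combinations


def pmi(w1, w2, documents):
    """PMI of two words from document-level co-occurrence.

    Assumes a non-empty collection of tokenized documents.
    Returns None when the pair never co-occurs (PMI is undefined).
    """
    doc_sets = [set(doc) for doc in documents]
    n = len(doc_sets)
    p1 = sum(w1 in d for d in doc_sets) / n
    p2 = sum(w2 in d for d in doc_sets) / n
    p12 = sum(w1 in d and w2 in d for d in doc_sets) / n
    if p12 == 0:  # also covers the case where either word is absent
        return None
    return math.log(p12 / (p1 * p2))


def average_pmi(top_words, documents):
    """Mean PMI over all pairs of top words for which PMI is defined."""
    scores = [pmi(a, b, documents) for a, b in combinations(top_words, 2)]
    defined = [s for s in scores if s is not None]
    return sum(defined) / len(defined) if defined else None


if __name__ == "__main__":
    # Toy reviews, already tokenized (illustrative data only).
    reviews = [
        ["food", "taste", "good"],
        ["food", "taste"],
        ["service", "slow"],
        ["food", "service"],
    ]
    print(average_pmi(["food", "taste", "service"], reviews))
```

Under this formulation, a higher average indicates that the top words of an aspect tend to appear together, which is one way to quantify how correlated they are.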

Conclusion

In this paper, we have addressed the problem of extracting sentiment-oriented ratable aspects from online reviews. We have proposed the Joint Multi-grain Topic Sentiment (JMTS) model. We have confirmed that JMTS outperforms the state-of-the-art models qualitatively and quantitatively. We have also demonstrated the quality of the extracted aspects compared to human-predefined aspects. We are working on unsupervised aspect summarization and aspect rating prediction.

Acknowledgment

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (grant numbers 2015R1A2A1A10052665 and 2015R1A2A1A15052701).

References (41)

Q. Li et al., The effect of news and public mood on stock movements, Inform. Sci. (2014)
B.W. Matthews, Comparison of the predicted and observed secondary structure of T4 phage lysozyme, Biochim. Biophys. Acta (1975)
C. Quan et al., Unsupervised product feature extraction for feature-oriented opinion determination, Inform. Sci. (2014)
Y. Rao et al., Sentiment topic models for social emotion mining, Inform. Sci. (2014)
F. Xianghua et al., Multi-aspect sentiment analysis for Chinese online social reviews based on topic modeling and HowNet lexicon, Knowl.-Based Syst. (2013)
R. Xia et al., Ensemble of feature sets and classification algorithms for sentiment classification, Inform. Sci. (2011)
Alias-i (version 4.0.1)....
M.H. Alam et al., Semantic aspect discovery for online reviews, Proceedings of the 12th IEEE International Conference on Data Mining (2012)
N. Archak et al., Show me the money!: deriving the pricing power of product features by mining consumer reviews, Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2007)
D.M. Blei et al., Latent Dirichlet allocation, J. Mach. Learn. Res. (2003)
S. Brody et al., An unsupervised aspect-sentiment model for online reviews, Proceedings of Human Language Technologies: The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (2010)
R. Feldman, Techniques and applications for sentiment analysis, Commun. ACM (2013)
G. Ganu et al., Beyond the stars: improving rating predictions using review text content, Proceedings of 12th International Workshop on the Web and Databases (2009)
T.L. Griffiths et al., Finding scientific topics, Proc. Natl. Acad. Sci. (2004)
T. Hofmann, Unsupervised learning by probabilistic latent semantic analysis, Mach. Learn. (2001)
Y. Jo et al., Aspect and sentiment unification model for online review analysis, Proceedings of 4th ACM International Conference on Web Search and Data Mining (2011)
S. Kim et al., A hierarchical aspect-sentiment model for online reviews, Proceedings of the 27th AAAI Conference on Artificial Intelligence (2013)
T. Li et al., A non-negative matrix tri factorization approach to sentiment classification with lexical prior knowledge, Proceedings of Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (2009)
C. Lin et al., Weakly supervised joint sentiment-topic detection from text, IEEE Trans. Knowl. Data Eng. (2012)
C. Lin et al., Joint sentiment/topic model for sentiment analysis, Proceedings of 18th ACM International Conference on Information and Knowledge Management (2009)
