2015 | OriginalPaper | Buchkapitel
Detection of Opinion Spam with Character n-grams
verfasst von : Donato Hernández Fusilier, Manuel Montes-y-Gómez, Paolo Rosso, Rafael Guzmán Cabrera
Erschienen in: Computational Linguistics and Intelligent Text Processing
Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.
Wählen Sie Textabschnitte aus um mit Künstlicher Intelligenz passenden Patente zu finden. powered by
Markieren Sie Textabschnitte, um KI-gestützt weitere passende Inhalte zu finden. powered by
In this paper we consider the detection of opinion spam as a stylistic classification task because, given a particular domain, the deceptive and truthful opinions are similar in content but differ in the way opinions are written (style). Particularly, we propose using character n-grams as features since they have shown to capture lexical content as well as stylistic information. We evaluated our approach on a standard corpus composed of 1600 hotel reviews, considering positive and negative reviews. We compared the results obtained with character n-grams against the ones with word n-grams. Moreover, we evaluated the effectiveness of character n-grams decreasing the training set size in order to simulate real training conditions. The results obtained show that character n-grams are good features for the detection of opinion spam; they seem to be able to capture better than word n-grams the content of deceptive opinions and the writing style of the deceiver. In particular, results show an improvement of 2.3% and 2.1% over the word-based representations in the detection of positive and negative deceptive opinions respectively. Furthermore, character n-grams allow to obtain a good performance also with a very small training corpus. Using only 25% of the training set, a Naïve Bayes classifier showed
F
1
values up to 0.80 for both opinion polarities.