Fake or real? The computational detection of online deceptive text

Ball, Leslie; Elworthy, Jennifer

doi:10.1057/jma.2014.15

Original Article
Published: 21 November 2014

Volume 2, pages 187–201 (2014)
Journal of Marketing Analytics

Abstract

Online repositories give businesses the opportunity to gather feedback and opinions on products and services in the form of digital deposits. Such deposits can, in turn, influence readers’ views and behaviours when they contain misinformation posted to deceive or manipulate. Establishing the veracity of these digital deposits could thus bring key benefits to both online businesses and internet users. Although machine learning techniques are well established for classifying texts by content, categorising them by veracity remains a challenge for feature set extraction and analysis. To date, text categorisation techniques for veracity have reported a wide and inconsistent range of accuracies, between 57 and 90 per cent. This article evaluates the accuracy of detecting online deceptive text using a logistic regression classifier based on part-of-speech tags extracted from a corpus of known truthful and deceptive statements. An accuracy of 72 per cent is achieved by reducing 42 extracted part-of-speech tags to a feature vector of six using principal component analysis. The results compare favourably with other studies. Improvements are anticipated from training machine learning algorithms on more complex feature vectors that combine the key features identified in this study with others from disparate feature domains.
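The pipeline summarised in the abstract (42 part-of-speech tag features reduced to six principal components, then passed to a logistic regression classifier) can be illustrated with a minimal sketch. Everything beyond those three facts is an assumption made for illustration: the tag counts are synthetic stand-ins generated at random rather than the study's corpus, the function names are hypothetical, the features are standardised before the decomposition, PCA is computed by singular value decomposition, and the classifier is fitted by plain gradient descent without regularisation. The preview does not state how the authors implemented any of these steps.

```python
import numpy as np

N_TAGS = 42        # part-of-speech tag features, as stated in the abstract
N_COMPONENTS = 6   # size of the reduced feature vector, as stated in the abstract


def make_synthetic_tag_counts(n_docs=200, seed=0):
    """Hypothetical stand-in for real POS-tag counts; NOT the study's corpus."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_docs)          # 1 = deceptive, 0 = truthful
    X = rng.poisson(5.0, size=(n_docs, N_TAGS)).astype(float)
    X[:, :3] += 2.0 * y[:, None]                 # arbitrary signal in three tags
    return X, y


def fit_pca(X, n_components):
    """Standardise the columns of X and return its top principal axes."""
    mean = X.mean(axis=0)
    scale = X.std(axis=0)
    scale[scale == 0] = 1.0                      # guard against constant columns
    # Rows of vt are the principal axes of the standardised data.
    _, _, vt = np.linalg.svd((X - mean) / scale, full_matrices=False)
    return mean, scale, vt[:n_components]


def apply_pca(X, mean, scale, components):
    """Project documents onto the retained principal axes."""
    return ((X - mean) / scale) @ components.T


def _with_bias(Z):
    return np.hstack([Z, np.ones((Z.shape[0], 1))])


def fit_logistic(Z, y, lr=0.1, n_iter=2000):
    """Batch gradient descent on the mean logistic loss (no regularisation)."""
    Zb = _with_bias(Z)
    w = np.zeros(Zb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Zb @ w)))      # predicted P(deceptive)
        w -= lr * Zb.T @ (p - y) / len(y)
    return w


def predict(Z, w):
    """Label a document deceptive (1) when its predicted probability exceeds 0.5."""
    return (_with_bias(Z) @ w > 0).astype(int)


if __name__ == "__main__":
    X, y = make_synthetic_tag_counts()
    X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]
    # Fit the projection on training documents only, then reuse it on the rest.
    mean, scale, components = fit_pca(X_train, N_COMPONENTS)
    w = fit_logistic(apply_pca(X_train, mean, scale, components), y_train)
    y_pred = predict(apply_pca(X_test, mean, scale, components), w)
    print("held-out accuracy on synthetic data:", (y_pred == y_test).mean())
```

The projection is fitted on the training documents alone and then applied unchanged to the held-out ones, so that no information from the test split leaks into the feature reduction. Any accuracy printed here reflects only the synthetic data and says nothing about the 72 per cent reported in the article.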

Author information

Authors and Affiliations

Abertay University, Bell Street, Tayside, DD1 1HG, Dundee, UK
Leslie Ball

Corresponding author

Correspondence to Leslie Ball.

About this article

Cite this article

Ball, L., Elworthy, J. Fake or real? The computational detection of online deceptive text. J Market Anal 2, 187–201 (2014). https://doi.org/10.1057/jma.2014.15

Received: 15 October 2014
Revised: 15 October 2014
Published: 21 November 2014
Issue Date: 01 September 2014
DOI: https://doi.org/10.1057/jma.2014.15
