An overview of online fake news: Characterization, detection, and discussion

https://doi.org/10.1016/j.ipm.2019.03.004

Abstract

In recent years, the growth of online social media has greatly facilitated the way people communicate with each other. Users of online social media share information, connect with other people and stay informed about trending events. However, much recent information appearing on social media is dubious and, in some cases, intended to mislead. Such content is often called fake news. Large amounts of online fake news have the potential to cause serious problems in society. Many point to the 2016 U.S. presidential election campaign as having been influenced by fake news. Since that election, the term has entered the mainstream vernacular. Moreover, it has drawn the attention of industry and academia, which seek to understand its origins, distribution and effects.

Of critical interest is the ability to detect when online content is untrue and intended to mislead. This is technically challenging for several reasons. Using social media tools, content is easily generated and quickly spread, leading to a large volume of content to analyse. Online information is very diverse, covering a large number of subjects, which adds complexity to this task. The truth and intent of any statement often cannot be assessed by computers alone, so efforts must depend on collaboration between humans and technology. For instance, some content that experts have judged to be false and intended to mislead is available. While these sources are in limited supply, they can form a basis for such a shared effort.

In this survey, we present a comprehensive overview of the findings to date relating to fake news. We characterize the negative impact of online fake news and the state of the art in detection methods. Many of these rely on identifying features of the users, content, and context that indicate misinformation. We also study existing datasets that have been used for classifying fake news. Finally, we propose promising research directions for online fake news analysis.

Introduction

The explosive growth of the World Wide Web since the mid-1990s has significantly changed the way that people communicate with each other. Online social media, such as Twitter and Facebook, facilitate the distribution of real-time information among users all over the world. Being easy to use, low-cost, and fast, social media has become the major platform for online social interaction and information transmission (Shu, Sliva, Wang, Tang, & Liu, 2017). Nowadays, nearly two-thirds of American adults get news via online channels (News use across social media platforms 2016), and this number is still growing (Dale, 2017; News use across social media platforms 2016).

However, owing to the increasing popularity of online social media, the Internet has become an ideal breeding ground for fake news, such as misleading information, fake reviews, fake advertisements, rumors, fake political statements, satire, and so on. Fake news is now more popular and more widely spread through social media than through mainstream media (Balmas, 2014). Because it is extensively used to confuse and persuade online users with biased facts, fake news has become a major concern for both industry and academia. Furthermore, a massive amount of non-credible and misleading information is created and displayed on the Internet, which has emerged as a potential threat to online social communities and has had a deep negative impact on Internet activities such as online shopping and social networking (Fig. 1).

The issue of online fake news has gained more attention from both researchers and practitioners, especially after the 2016 U.S. presidential election (Horne & Adali, 2017). Fake news has been accused of increasing political polarization and partisan conflict during the election campaign (Riedel, Augenstein, Spithourakis, & Riedel, 2017), and voters can be easily influenced by misleading political statements and claims. Many current online fact-checking systems, such as FactCheck.org and PolitiFact.com, are based on manual detection by professionals, where latency is the main issue. Also, most existing online fact-checking resources focus mainly on verifying political news, so their practical applicability is limited by the high variety of news types and formats and by the wide and rapid propagation of fake information in social networks. In addition, a large amount of real-time information is created, commented on, and shared via online social media every day, which makes real-time detection of online fake news even more difficult.

In recent years, to help online users identify useful and valuable information, there has been extensive research on establishing an effective, automatic framework for online fake news detection. Identifying credible social information among millions of messages, however, is challenging because of the heterogeneous and dynamic nature of online social communication. More specifically, it is difficult to distinguish truthful signals from fake and anomalous information, since fake news is intentionally written to mislead readers (Shu et al., 2017). Meanwhile, linguistic features extracted from the news content are not sufficient to reveal the underlying distribution patterns of fake news (Shu, Sliva, Wang, Tang, & Liu, 2017; Zhao, Cao, Wen, Song, Lin, & Collins, 2014). Auxiliary features, such as the credibility of the news author and the spreading patterns of the news, play a more important role in online fake news prediction. Furthermore, online social data is time-sensitive: it is produced in real time and reflects trending topics and events. As a result, an online real-time detection system should be designed for detecting, exploring and interpreting fake information in online social media.

In recent years, the fast and explosive development of social media has been accompanied by extensive growth in the amount of fake news. Nowadays, fake news is annoying, obtrusive, distracting and ubiquitous. It has profound impacts on both individuals and society, so it is important to build an effective detection system for fake news identification. The basic characteristics of fake news can be summarized as follows:

  • The volume of fake news: Because there is no verification procedure, anyone can easily write fake news on the Internet (Ahmed, 2017). Many webpages have been established purposely to publish fake news and stories, such as denverguardian.com, wtoe5news.com, ABCnews.com.co, and so on. These websites often resemble legitimate news organizations (Allcott & Gentzkow, 2017) and are deliberately created to distribute hoaxes, propaganda, and disinformation, often for financial or political gain. Therefore, a massive amount of fake content is distributed through the Internet, even without users’ awareness.

  • The variety of fake news: Several notions are closely related to fake news, such as rumors, satire news, fake reviews, misinformation, fake advertisements, conspiracy theories, and false statements by politicians, which affect every aspect of people’s lives. With the increasing popularity of social media, fake news can dominate the public’s opinions, interests and decisions. In addition, fake news changes the way that people interact with real news. Some fake news is created intentionally to mislead and confuse social media users, especially young students and older people who lack awareness of how to protect themselves (Forbes.com). For example, some rumors were propagated on Twitter immediately after the 2010 earthquake in Chile, which increased panic and chaos in the local population (Castillo, Mendoza, & Poblete, 2011). More recently, a story shared on Facebook used selective TV ratings data to make the misleading claim that Cable News Network (CNN) was not one of the 10 most watched cable networks in 2018 (Fichera). Another fake science news story reported that the physicist Stephen Hawking warned that “aliens existed on the far side of the Moon” (Daily). Online fake news thus reaches into every aspect of our daily life.

  • The velocity of fake news: Fake news creators tend to be short-lived (Allcott & Gentzkow, 2017). For example, many fake news webpages that were active during the 2016 U.S. election no longer existed after the campaign. As more attention has been paid to fake news in recent years, more fake news generators appear only briefly in order to avoid detection systems. Furthermore, most fake news on social media focuses on current events and hot topics to attract the attention of online users. The real-time nature of fake news on social media makes identifying online fake news even more difficult. It is hard to estimate how many online users are involved with a given message, and it is hard to tell when and how the far-reaching consequences of fake news stop.

By conveying biased and false information, fake news can destroy people’s trust in authorities, experts and the government. For instance, 88% of customers rely on online reviews, and 72% of them trust a business with positive reviews (Ahmed, 2017). Another example is the 2016 U.S. presidential election. During this campaign, hundreds or thousands of Russian fake accounts posted anti-Clinton messages, such as “Hillary was sick”, “Hillary was a criminal”, “Obama had a secret army”, and so on, to influence soft Hillary Clinton supporters (Russia used fake news, The fake americans russia created to influence the election). Voters can easily be affected by false information, and can even act as fake news spreaders by sharing and commenting on fake content. Some regard Donald Trump’s victory in the 2016 U.S. presidential election as, at least in part, an outcome of fake news (The negative impact of fake news, Allcott, Gentzkow, 2017). Fake news continues to dominate the Internet, with serious consequences for society, politics, IT and financial matters, and everyone who lives in a cyber environment marked by a crisis of trust. There is an immediate need for a well-established, accuracy-oriented, real-time system for online fake news detection and identification.

As fake news detection has become an emerging topic, more and more technology giants are seeking solutions for recognizing online fake information. With the help of fact-checking professionals, Facebook allows users to flag and report satire or news that is potentially suspicious and anomalous (News Feed fyi, Mark zuckerberg). Most recently, Google announced a new online service called the “Google News Initiative” to fight fake news, misinformation, and contentious breaking stories (Google news initiative). This project will spend $392 million over the next several years, which could make it easier for readers to subscribe to quality publications. It can also help readers learn how to spot misleading news and reports (Google announcement).

However, accurate fake news detection is still challenging because of the dynamic nature of social media and the complexity and diversity of online communication data. In addition, the limited availability of high-quality training data is a major obstacle to training supervised learning models. It is necessary to design a framework that can identify anomalous or suspicious online information even without knowledge of anomalous samples (Zhao et al., 2014). Under these circumstances, both industry and academia are actively involved in combating online fake news. It is important to design effective, automatic and applicable approaches for online fake news detection.
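The idea of flagging suspicious items without any labelled anomalous samples can be illustrated with a minimal one-class sketch. It is not the method of Zhao et al. (2014); it is a simple stand-in that learns a per-feature mean and spread from credible items only and flags anything that sits far from that profile. The class name, the two example features (exclamation marks per 100 words and share of all-caps words) and the threshold of 3.0 are all hypothetical choices made for illustration.

```python
from math import sqrt
from statistics import mean, pstdev


class OneClassBaseline:
    """Flags items that sit far from a profile learned from credible items only."""

    def __init__(self, threshold=3.0):
        # Hypothetical cut-off, measured in standard deviations.
        self.threshold = threshold
        self.means = []
        self.stds = []

    def fit(self, credible_rows):
        # Each row is a list of numeric features for one credible item.
        columns = list(zip(*credible_rows))
        self.means = [mean(col) for col in columns]
        # Replace a zero spread with 1.0 so the division in score() is defined.
        self.stds = [pstdev(col) or 1.0 for col in columns]
        return self

    def score(self, row):
        # Root-mean-square z-score of the row against the credible profile.
        z = [(x - m) / s for x, m, s in zip(row, self.means, self.stds)]
        return sqrt(sum(v * v for v in z) / len(z))

    def is_suspicious(self, row):
        return self.score(row) > self.threshold


# Assumed features: [exclamation marks per 100 words, share of all-caps words].
credible = [[0.2, 0.03], [0.4, 0.05], [0.3, 0.04], [0.1, 0.04]]
model = OneClassBaseline(threshold=3.0).fit(credible)
print(model.is_suspicious([0.3, 0.04]))  # close to the credible profile
print(model.is_suspicious([3.0, 0.40]))  # far from the credible profile
```

Only credible examples are needed for training, which is the property that matters when labelled fake samples are scarce. A practical system would likely need richer features and a more robust model than this distance rule.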

The motivations of this paper can be summarized as follows. (1) The analysis of fake news content alone is not sufficient to establish an effective and reliable detection system. Other important relevant aspects, such as author and user analysis and the social context of the news, are therefore also described in this paper, in order to provide an overall understanding of online social information. (2) Studies on online fake news detection are diverse in terms of objectives, methodologies and domains. It is necessary to summarize the different types of techniques and methods in this area, compare representative hand-crafted features, and evaluate the existing detection systems. By presenting a comprehensive view of online fake news detection, our survey can be of practical use to both researchers and practitioners. (3) Potentially promising data mining algorithms and methods are introduced in this paper, which are valuable for addressing the aforementioned challenges and improving the existing detection frameworks.

Recently, some survey papers have also covered the topic of online fake news and false information detection. In Shu et al. (2017), the authors discuss online fake news detection on social media. In particular, they focus on characterizing fake news from the perspectives of psychology and social theories. The existing data mining algorithms and evaluation metrics are also presented in their paper. In Kumar and Shah (2018), the authors present a comprehensive study on the distribution of online false information. They mainly discuss how false information proliferates on the Internet and why it succeeds in deceiving online readers. They also quantify the impact of false information and summarize some useful algorithms for detecting it. Unlike these studies, our work characterizes fake news by four major aspects: the news creator, the news content, the social context, and the targets. We believe that in this way readers can gain a better understanding of the nature of online fake news: who the news sources are, what their purpose is in creating online false information, what writing techniques are more likely to be used in fake news, how fake news is distributed via the Internet, and how it can affect online readers.

The major contributions of this paper can be summarized as follows. (1) We summarize both practical-based approaches and research-based approaches for online fake news detection. Readers, whether they are researchers, industry practitioners, or ordinary Internet users, can find helpful and useful knowledge in our work. (2) We propose an up-to-date and comprehensive set of features which can be used for online fake news identification. This feature set contains three different subcategories: creator and user-based features, news content-based features, and social context-based features. With our proposed features, researchers can not only conduct online fake news detection, but also work on similar domains, such as botnet detection, malicious or fake account detection, unknown news creator detection, sentiment analysis, stance detection, news similarity analysis, and so on. This is of practical significance for researchers whose interests are data mining in social media, natural language processing, and false information detection. (3) At the end of this survey, some potential technologies, such as unsupervised learning algorithms, one-class classification algorithms, and real-time detection, are proposed as future research directions. A comprehensive fake news detection ecosystem is also designed with three layers (alert layer, detection layer, and intervention layer). We provide a well-structured treatment of the topic of fake news detection, from characterization and detection to final discussion.
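To make the three feature subcategories named in contribution (2) more concrete, the sketch below groups a few example features under each heading and flattens them into one numeric vector that a classifier could consume. The individual fields (account age, exclamation ratio, shares in the first hour, and so on) and the helper names are hypothetical illustrations, not the feature set proposed in the survey.

```python
from dataclasses import dataclass, asdict


@dataclass
class CreatorFeatures:
    # Creator and user-based features (hypothetical examples).
    account_age_days: int
    followers: int
    is_verified: bool


@dataclass
class ContentFeatures:
    # News content-based features (hypothetical examples).
    word_count: int
    exclamation_ratio: float
    all_caps_ratio: float


@dataclass
class SocialContextFeatures:
    # Social context-based features (hypothetical examples).
    shares_first_hour: int
    reply_count: int


def content_features(text):
    """Derive simple surface statistics from the news text."""
    words = text.split()
    n = len(words) or 1  # avoid division by zero on empty text
    exclamations = text.count("!")
    all_caps = sum(1 for w in words if len(w) > 1 and w.isupper())
    return ContentFeatures(len(words), exclamations / n, all_caps / n)


def to_vector(creator, content, context):
    """Concatenate the three groups into one flat list of floats."""
    values = []
    for group in (creator, content, context):
        values.extend(float(v) for v in asdict(group).values())
    return values


content = content_features("SHOCKING news! Aliens found on the Moon!")
vector = to_vector(
    CreatorFeatures(account_age_days=30, followers=12, is_verified=False),
    content,
    SocialContextFeatures(shares_first_hour=500, reply_count=40),
)
print(len(vector))
```

Keeping the groups as separate records makes it easy to train on one subcategory at a time, for example to check how much the content features contribute on their own compared with the creator and social context features.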

The rest of this paper is organized as follows. Section 2 presents the definition and other important aspects of online fake news, such as the author and target users of fake news, the news content body and the social context of online fake news. Section 3 summarizes practical-based approaches for fake news detection, including online fact-checking resources and some useful social guides. Section 4 presents the latest research-based studies on online fake news detection and analysis, lists the influential features for fake news representation, and evaluates the available fake news datasets. Section 5 discusses the open issues and some promising research directions in online fake news analysis. Finally, Section 6 recaps the conclusions and the contributions of this paper.

Section snippets

Fake news characterization

Nowadays online fake news tends to be intrusive and diverse in terms of topics, styles and platforms (Shu et al., 2017), and it is not easy to construct a generally accepted definition of “fake news”. Stanford University defines fake news as “the news articles that are intentionally and verifiably false, and could mislead readers” (Detecting fake news with nlp). According to Wikipedia (Fake news), fake news is: “a type of yellow journalism or propaganda that consists of

Fake news detection – practical-based approaches

Simply put, fake news detection is the task of assessing the truthfulness of a certain piece of news (Vlachos & Riedel, 2014). Current fake news detection resources fall into two categories: (1) practical-based detection approaches, from the perspective of Internet users, and (2) existing research-based detection methods, from the perspective of academia and research. In this section we mainly discuss the practical-based approach for online fake news

The existing research-based approaches

In this section, we review and discuss the state-of-the-art studies on fake news detection. Table 2 illustrates the overall categorization of current research on online fake news detection, from which we can see how the detection of different types of false information differs in terms of features, data mining algorithms, and platforms.

Recently, the development of online social media has given rise to the widespread dissemination of online fake news. The information distributed via social networks

Open issues and future work

In this section, some challenges and open issues for automatic online fake news detection are discussed, along with some promising research directions in this area. Finally, we present how to build an effective online fake news detection ecosystem.

Conclusion

Recently, fake news has emerged as one of the most serious threats on social media. Fake news can be used by malicious entities to manipulate people’s opinions and decisions on important daily activities, such as stock markets, health-care options, online shopping, education, and even presidential elections. Automatic detection of online fake news is an extremely significant but challenging task for both industry and academia (Ruchansky et al., 2017). In this survey, we present a comprehensive

Acknowledgement

The authors gratefully acknowledge the funding from the Atlantic Canada Opportunities Agency (ACOA) through the Atlantic Innovation Fund (AIF) Project #201212 and through the Discovery Grant and Tier 1 Canada Research Chair funding from the Natural Sciences and Engineering Research Council of Canada (NSERC 232074) to Dr. Ghorbani.

References (156)

  • Y. Chen et al. (2015). Misleading online content: Recognizing clickbait as false news. Proceedings of the 2015 ACM on workshop on multimodal deception detection.
  • C. Désir et al. (2013). One class random forests. Pattern Recognition.
  • B. Adams et al. (2011). Eventscapes: visualizing events over time with emotive facets. Proceedings of the 19th ACM international conference on multimedia.
  • S. Afroz et al. (2012). Detecting hoaxes, frauds, and deception in writing style online. 2012 IEEE symposium on security and privacy (SP).
  • A. Agarwal et al. (2011). Sentiment analysis of twitter data. Proceedings of the workshop on languages in social media.
  • H. Ahmed (2017). Detecting opinion spam and fake news using n-gram analysis and semantic similarity.
  • H. Allcott et al. (2017). Social media and fake news in the 2016 election. Journal of Economic Perspectives.
  • E. Bakshy et al. (2011). Everyone’s an influencer: Quantifying influence on twitter. Proceedings of the fourth ACM international conference on web search and data mining.
  • M. Balmas (2014). When fake news becomes real: Combined exposure to multiple news sources and political attitudes of inefficacy, alienation, and cynicism. Communication Research.
  • R. Banerjee et al. (2014). Keystroke patterns as prosody in digital writings: A case study with deceptive reviews and essays. Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP).
  • P. Bojanowski et al. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics.
  • Bordes, A., Chopra, S., Weston, J. (2014). Question answering with subgraph embeddings. ArXiv...
  • C. Burfoot et al. (2009). Automatic satire detection: Are you having a laugh? Proceedings of the ACL-IJCNLP 2009 conference short papers.
  • Business insider most trust news list....
  • Buzzfeednews. https://github.com/BuzzFeedNews/2016-10-facebook-fact-check/blob/master/data/facebook-fact-check.csv....
  • N. Cao et al. (2016). Targetvue: Visual analysis of anomalous user behaviors in online communication systems. IEEE Transactions on Visualization and Computer Graphics.
  • C. Castillo et al. (2011). Information credibility on twitter. Proceedings of the 20th international conference on world wide web.
  • C. Castillo et al. (2013). Predicting information credibility in time-sensitive social media. Internet Research.
  • Cbc news. http://www.cbc.ca/news/canada/toronto/scarborough-hijab-attack-1.4487716. Accessed:...
  • M. Cha et al. (2009). A measurement-driven analysis of information propagation in the flickr social network. Proceedings of the 18th international conference on world wide web.
  • Chalapathy, R., Krishna Menon, A., & Chawla, S. (2018). Anomaly detection using one-class neural networks. ArXiv...
  • V. Chandola et al. (2009). Anomaly detection: A survey. ACM Computing Surveys (CSUR).
  • Z. Chu et al. (2012). Detecting automation of twitter accounts: Are you a human, bot, or cyborg? IEEE Transactions on Dependable and Secure Computing.
  • Classify.news. https://makenewscredibleagain.github.io/. Accessed:...
  • P. Cogan et al. (2012). Reconstruction and analysis of twitter conversation graphs. Proceedings of the first ACM international workshop on hot topics on interdisciplinary social networks research.
  • R. Collobert et al. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research.
  • N.J. Conroy et al. (2015). Automatic deception detection: Methods for finding fake news. Proceedings of the Association for Information Science and Technology.
  • W. Cui et al. (2011). Textflow: Towards better understanding of evolving topics in text. IEEE Transactions on Visualization and Computer Graphics.
  • Daily, C.. http://global.chinadaily.com.cn/a/201801/10/WS5a55a685a3102e5b17371dd1_2.html. Accessed:...
  • R. Dale (2017). Nlp in a post-truth world. Natural Language Engineering.
  • C.A. Davis et al. (2016). Botornot: A system to evaluate social bots. Proceedings of the 25th international conference companion on world wide web.
  • M.-C. De Marneffe et al. (2006). Generating typed dependency parses from phrase structure parses. Proceedings of LREC.
  • Del Vicario, M., Quattrociocchi, W., Scala, A., & Zollo, F. (2018). Polarization and fake news: Early warning of...
  • M.L. Della Vedova et al. (2018). Automatic online fake news detection combining content and social signals. 2018 22nd conference of open innovations association (FRUCT).
  • I. Dematis et al. (2018). Fake review detection via exploitation of spam indicators and reviewer behavior characteristics. International conference on current trends in theory and practice of informatics.
  • Denver guardian. https://en.wikipedia.org/wiki/Denver_Guardian. Accessed:...
  • Detecting fake news with nlp. https://medium.com/@Genyunus/detecting-fake-news-with-nlp-c893ec31dee8. Accessed:...
  • A. Devitt et al. (2007). Sentiment polarity identification in financial news: A cohesion-based approach. Proceedings of the 45th annual meeting of the association of computational linguistics.
  • R. Di et al. (2018). Fake comment detection based on time series and density peaks clustering. International conference on algorithms and architectures for parallel processing.
  • N. Diakopoulos et al. (2010). Diamonds in the rough: Social media visual analytics for journalistic inquiry. 2010 IEEE symposium on visual analytics science and technology (VAST).
  • J.P. Dickerson et al. (2014). Using sentiment to detect bots on twitter: Are humans more opinionated than bots? 2014 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM).
  • P.S. Dodds et al. (2015). Human language reveals a universal positivity bias. Proceedings of the National Academy of Sciences.
  • Dong, M., Yao, L., Wang, X., Benatallah, B., Huang, C., & Ning, X. (2018a). Opinion fraud detection via neural...
  • M. Dong et al. (2018). Dual: A deep unified attention model with latent relation representations for fake news detection. International conference on web information systems engineering.
  • Factcheck. https://www.factcheck.org/about/our-mission/. Accessed:...
  • Factmata....
  • Fake news. https://en.wikipedia.org/wiki/Fake_news. Accessed:...
  • Fakespot. https://www.fakespot.com. Accessed:...
  • False misleading clickbait or satirical news sources....
  • Farajtabar, M., Yang, J., Ye, X., Xu, H., Trivedi, R., Khalil, E., et al. (2017). Fake news mitigation via point...