ABSTRACT
First identified in Wuhan, China, in December 2019, the COVID-19 outbreak was declared a global emergency in January 2020 and a pandemic in March 2020 by the World Health Organization (WHO). Along with this pandemic, we are also experiencing an "infodemic" of low-credibility information such as fake news and conspiracy theories. In this work, we present ReCOVery, a repository designed and constructed to facilitate research on combating such information regarding COVID-19. We first broadly search and investigate ~2,000 news publishers, from which we identify 60 with extreme (high or low) levels of credibility. Each news article inherits the credibility of the publisher that released it. In this way, a total of 2,029 news articles on coronavirus, published from January to May 2020, are collected in the repository, along with 140,820 tweets that reveal how these articles have spread on the Twitter social network. The repository provides multimodal information on these news articles, including textual, visual, temporal, and network information. Obtaining news credibility from the publisher allows a trade-off between dataset scalability and label accuracy. Extensive experiments are conducted to present data statistics and distributions, and to provide baseline performance for predicting news credibility so that future methods can be compared. Our repository is available at http://coronavirus-fakenews.com.
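The label-inheritance step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's actual pipeline: the function name inherit_label, the PUBLISHER_LABELS table, the placeholder domains, and the 1/0 label encoding are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical publisher credibility table. The domains and labels below are
# placeholders for illustration, not entries taken from the repository.
PUBLISHER_LABELS = {
    "reliable-example.com": 1,    # 1 = high credibility (assumed encoding)
    "unreliable-example.com": 0,  # 0 = low credibility (assumed encoding)
}


def inherit_label(article_url, publisher_labels=PUBLISHER_LABELS):
    """Return the publisher's label for an article URL, or None if the publisher is unlisted."""
    domain = urlparse(article_url).netloc.lower()
    # Treat "www.example.com" and "example.com" as the same publisher.
    if domain.startswith("www."):
        domain = domain[4:]
    return publisher_labels.get(domain)
```

Because every article from a listed publisher receives the same label, the dataset can grow without per-article annotation, at the cost of mislabeling individual articles that differ in credibility from their publisher. This is the trade-off between scalability and label accuracy that the abstract mentions.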
Index Terms
- ReCOVery: A Multimodal Repository for COVID-19 News Credibility Research