research-article

ReCOVery: A Multimodal Repository for COVID-19 News Credibility Research

Authors:
Xinyi Zhou, Syracuse University, Syracuse, NY, USA
Apurva Mulay, Syracuse University, Syracuse, NY, USA
Emilio Ferrara, University of Southern California, Los Angeles, CA, USA
Reza Zafarani, Syracuse University, Syracuse, NY, USA

CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, October 2020, Pages 3205–3212. https://doi.org/10.1145/3340531.3412880

Published: 19 October 2020

ABSTRACT

First identified in Wuhan, China, in December 2019, the COVID-19 outbreak was declared a global emergency in January 2020 and a pandemic in March 2020 by the World Health Organization (WHO). Alongside the pandemic, we are also experiencing an "infodemic" of low-credibility information such as fake news and conspiracy theories. In this work, we present ReCOVery, a repository designed and constructed to facilitate research on combating such information about COVID-19. We first broadly search and investigate ~2,000 news publishers, from which we identify 60 with extreme (high or low) levels of credibility. Each news article inherits the credibility of the publisher on which it appeared; in this way, a total of 2,029 news articles on coronavirus, published from January to May 2020, are collected in the repository, along with 140,820 tweets that reveal how these articles have spread on the Twitter social network. The repository provides multimodal information on these news articles, including textual, visual, temporal, and network information. Obtaining news credibility in this way allows a trade-off between dataset scalability and label accuracy. Extensive experiments are conducted to present data statistics and distributions, as well as to provide baseline performance for predicting news credibility so that future methods can be compared. Our repository is available at http://coronavirus-fakenews.com.
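The labeling idea described in the abstract, where an article inherits the credibility of its publisher, can be illustrated with a minimal sketch. This is not the authors' pipeline: the publisher domains, the label encoding (1 for reliable, 0 for unreliable), and the names `PUBLISHER_LABELS`, `Article`, `publisher_domain`, and `inherit_label` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import urlparse

# Hypothetical publisher-level labels (assumption): 1 = reliable, 0 = unreliable.
# The real repository derives these from its own publisher investigation.
PUBLISHER_LABELS: Dict[str, int] = {
    "reliable-example.org": 1,
    "unreliable-example.com": 0,
}


@dataclass
class Article:
    url: str
    title: str


def publisher_domain(url: str) -> str:
    """Return the lowercase host of a URL with a leading 'www.' removed."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def inherit_label(article: Article,
                  labels: Dict[str, int] = PUBLISHER_LABELS) -> Optional[int]:
    """Give the article its publisher's label, or None if the publisher is unknown."""
    return labels.get(publisher_domain(article.url))


articles = [
    Article("https://www.reliable-example.org/covid-update", "Update"),
    Article("https://unreliable-example.com/miracle-cure", "Cure"),
    Article("https://unknown-example.net/story", "Story"),
]

# Articles from publishers outside the labeled set are left out of the dataset.
labeled = [(a.title, inherit_label(a)) for a in articles
           if inherit_label(a) is not None]
print(labeled)
```

Labeling at the publisher level scales to many articles without per-article fact-checking, at the cost of some label noise, which is the scalability versus accuracy trade-off the abstract mentions.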

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Collaborative and social computing systems and tools
2. Security and privacy
  1. Human and societal aspects of security and privacy
    1. Social aspects of security and privacy

Published in
CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
October 2020
3619 pages
ISBN: 9781450368599
DOI: 10.1145/3340531
General Chairs: Mathieu d'Aquin (DSI, Insight, NUI Galway, Ireland), Stefan Dietze (GESIS, Cologne, Germany; Heinrich-Heine-University Düsseldorf, Germany; L3S Research Center, Germany)
Program Chairs: Claudia Hauff (TU Delft, The Netherlands), Edward Curry (DSI, Insight, NUI Galway, Ireland), Philippe Cudre Mauroux (eXascale, University of Fribourg, Switzerland)
Copyright © 2020 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
coronavirus
covid-19
fake news
infodemic
information credibility
multimodal
pandemic
repository
social media
Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%