DOI: 10.1145/3340531.3412880
research-article

ReCOVery: A Multimodal Repository for COVID-19 News Credibility Research

Published: 19 October 2020

ABSTRACT

First identified in Wuhan, China, in December 2019, the outbreak of COVID-19 was declared a global emergency in January 2020 and a pandemic in March 2020 by the World Health Organization (WHO). Along with this pandemic, we are also experiencing an "infodemic" of low-credibility information such as fake news and conspiracy theories. In this work, we present ReCOVery, a repository designed and constructed to facilitate research on combating such information regarding COVID-19. We first broadly search and investigate ~2,000 news publishers, from which 60 are identified with extreme [high or low] levels of credibility. Each news article inherits the credibility of the outlet that published it; in total, 2,029 news articles on coronavirus, published from January to May 2020, are collected in the repository, along with 140,820 tweets that reveal how these articles have spread on the Twitter social network. The repository provides multimodal information on these news articles, including textual, visual, temporal, and network information. Obtaining news credibility in this way allows a trade-off between dataset scalability and label accuracy. Extensive experiments are conducted to present data statistics and distributions, as well as to provide baseline performance for predicting news credibility so that future methods can be compared. Our repository is available at http://coronavirus-fakenews.com.
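
The labeling idea in the abstract, where an article takes on the credibility of its publisher, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the domain names are placeholders, the 1/0 encoding of reliable/unreliable is hypothetical, and the helper names `inherit_label` and `label_articles` are not part of the repository.

```python
from urllib.parse import urlparse

# Hypothetical publisher-level labels: 1 = reliable, 0 = unreliable.
# These domains are placeholders, not publishers from the repository.
PUBLISHER_CREDIBILITY = {
    "reliable-news.example": 1,
    "unreliable-news.example": 0,
}


def inherit_label(article_url):
    """Return the label of the article's publisher, or None if the publisher is unknown."""
    domain = urlparse(article_url).netloc.lower()
    # Treat "www.example.com" and "example.com" as the same publisher.
    if domain.startswith("www."):
        domain = domain[4:]
    return PUBLISHER_CREDIBILITY.get(domain)


def label_articles(urls):
    """Keep only articles whose publisher has a known label."""
    labeled = []
    for url in urls:
        label = inherit_label(url)
        if label is not None:
            labeled.append((url, label))
    return labeled
```

Labeling at the publisher level needs no per-article fact-checking, which is where the scalability comes from; the cost is that an individual article may not match its publisher's overall credibility, which is the label-accuracy side of the trade-off the abstract mentions.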

          Published in

          CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
          October 2020
          3619 pages
          ISBN: 9781450368599
          DOI: 10.1145/3340531

          Copyright © 2020 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
