DOI: 10.1145/3340531.3412765
research-article

TweetsCOV19 - A Knowledge Base of Semantically Annotated Tweets about the COVID-19 Pandemic

Published: 19 October 2020

ABSTRACT

Publicly available social media archives facilitate research in the social sciences and provide corpora for training and testing a wide range of machine learning and natural language processing methods. With respect to the recent outbreak of the Coronavirus disease 2019 (COVID-19), online discourse on Twitter reflects public opinion and perception related to the pandemic itself as well as mitigating measures and their societal impact. Understanding such discourse, its evolution, and interdependencies with real-world events or (mis)information can foster valuable insights. On the other hand, such corpora are crucial facilitators for computational methods addressing tasks such as sentiment analysis, event detection, or entity recognition. However, obtaining, archiving, and semantically annotating large amounts of tweets is costly. In this paper, we describe TweetsCOV19, a publicly available knowledge base of currently more than 8 million tweets, spanning October 2019 - April 2020. Metadata about the tweets as well as extracted entities, hashtags, user mentions, sentiments, and URLs are exposed using established RDF/S vocabularies, providing an unprecedented knowledge base for a range of knowledge discovery tasks. Next to a description of the dataset and its extraction and annotation process, we present an initial analysis and use cases of the corpus.


Published in

CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
October 2020
3619 pages
ISBN: 9781450368599
DOI: 10.1145/3340531

Copyright © 2020 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 1,861 of 8,427 submissions (22%)
