Published June 23, 2020 | Version v1
Dataset Open

Tweet IDs using History related hashtags

  • 1. Tokyo Metropolitan University
  • 2. Kyoto University

Description

This repository contains IDs of tweets that are related to history and that were collected for the purpose of analyzing how history-related content is disseminated in online social networks. Our IJDL paper shows the analysis results. The preliminary version of the analysis report is available here.

 

We used the Twitter official search API provided by Twitter to collect tweets. Note that three kinds of tweets are typically found in Twitter: tweets, retweets and quote tweets. Tweet is an original text issued as a post by a Twitter user. A retweet is a copy of an original tweet for the purpose of propagating the tweet content to more users (i.e., one's followers). Finally, a quote tweet copies the content of another tweet and allows also to add new content. A quote tweet is sometimes called a retweet with a comment. In this work, we simply treat all quote tweets as original tweets since they include additional information/text. There were however only 1,877 (0.2%) tweets recognized as quote tweets in our dataset. 

 

To collect tweets that refer to the past or are related to collective memory of past events/entities, we performed hashtag based crawling together with bootstrapping procedure. 
At the beginning, we gathered several historical hashtags selected by experts (e.g. #HistoryTeacher, #history, #WmnHist). 
In addition, we prepared several hashtags that are commonly used when referring to the past: #onthisday, #thisdayinhistory, #throwbackthursday, #otd. We then collected tweets that contain these hashtags by using Twitter official search API.

 

The collected tweets were issued from 8 March 2016 to 2 July 2018. 
Bootstrapping allowed us to search for other hashtags frequently used with the seed hashtags. The tweets tagged by such hashtags were then included into the seed set after the manual inspection of all the discovered hashtags as of their relation to the history, and filtering ones that are unrelated. 
In total, we gathered 147 history-related hashtags which allowed us to collect 2,370,252 tweet IDs pointing to 882,977 tweets and 1,487,275 re-tweets.

 

Related papers:

  1. Yasunobu Sumikawa, Adam Jatowt, and Marten During, "Digital History meets Microblogging: Analyzing Collective Memories in Twitter", In Proceedings of the 18th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL'18, IEEE/ACM, pp. 213 -- 222, 2018. [paper]
  2. Yasunobu Sumikawa and Adam Jatowt, "Analyzing History-related Posts in Twitter", International Journal on Digital Libraries, Springer, 2020. https://doi.org/10.1007/s00799-020-00296-2 [paper]
  3. Yasunobu Sumikawa and Adam Jatowt, "Annotated Dataset of History-related Tweets", Data in Brief, Vol. 38, pp. 107344, Elsevier, 2021. [paper][annotated dataset]

Files

history_related_tweets_analysis_journal.txt

Files (45.2 MB)

Name Size Download all
md5:3cad4e9d4896059fb2ef8554016194f1
45.2 MB Preview Download