1 Introduction
2 Related work
2.1 Information retrieval and automatic summarization
2.2 Entity linking and semantic modeling
2.3 Twitter content analysis and categorization
3 Proposed method for tweet contextualization
3.1 Basic concepts definitions
-
Tweet: we represent a tweet as a bag of words. All stop words are removed (based on the standard INQUERY stop list), so the final representation is a clean tweet without stop words or other uninformative words. We use t as the symbol for a tweet object.
-
User: the symbol u is used for a user object.
-
Initial tweet representation: we represent an initial (ambiguous) tweet \(t_{in}\) as a bag of hashtags. Formally, \(t_{in_{i}}=\{h_{1},\ldots ,h_{j}\}\).
-
Tweet contextualization task: the idea is to expand a collection of n initial (ambiguous) tweets \(S_{t_{in}}=\{t_{in_{1}},\ldots ,t_{in_{n}}\}\) using a collection of m Twitter conversations \(S_{c}=\{c_{1},\ldots ,c_{m}\}\) by providing a context \(C_{i}\) for each tweet \(t_{in_{i}} \in S_{t_{in}}\). For a given tweet, we retrieve a subset \({sub}_{c}\) of relevant conversations from \(S_{c}\); then we select the most relevant tweets from the conversations in \({sub}_{c}\).
-
Context representation: the context \(C_{i}\) of an initial tweet \(t_{in_{i}}\) is defined as a set of informative tweets drawn from the subset \({sub}_{c}\).
-
Twitter conversation: we define a Twitter conversation as a set of tweets posted by users at specific timestamps on the same topic. These tweets can be direct replies to other users (using “@username”) or indirect interactions such as retweets, mentions, and favorites.
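The basic representations above can be sketched as follows. This is an illustrative sketch only: the toy stop list and the simple tokenizer are stand-ins (assumptions), not the actual INQUERY resources.

```python
# Illustrative sketch of the tweet representations defined above. The stop
# list and tokenizer are simplified stand-ins, not the INQUERY stop list.
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "on", "rt"}  # toy stop list

def tweet_bag_of_words(text):
    """Represent a tweet t as a bag of words without stop words."""
    tokens = [w.lower().strip(".,!?") for w in text.split()]
    return [w for w in tokens if w and w not in STOP_WORDS]

def initial_tweet_hashtags(text):
    """Represent an initial (ambiguous) tweet t_in as its bag of hashtags."""
    return [w.lower() for w in text.split() if w.startswith("#")]

bag = tweet_bag_of_words("The match is live on #CNN tonight")
tags = initial_tweet_hashtags("Big announcement today #AI #ethics")
```

A real implementation would use the full INQUERY stop list and a Twitter-aware tokenizer, but the representation logic is the same.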
3.2 Twitter conversations trees analysis
3.3 Candidate tweets retrieval from social conversations
3.3.1 Initial tweet formatting
3.3.2 Retrieving Twitter conversations
3.4 Candidate tweets selection
-
Tweet influence: the importance of a tweet within the conversation in which it appears is estimated using social influence.
-
tweet relevance regarding initial text: we compute the cosine similarity between the candidate tweet and the initial tweet.
-
tweet relevance regarding URL: we compute the word overlap and the cosine similarity between the candidate tweet and the body content of the linked page, as well as with the title of the Web page.
3.4.1 Social influence generation based on user–tweet interaction model
-
Tweet influence score: refers to the features that represent the particular characteristics of a tweet.
-
Tweet’s author influence score: refers to the features that represent the influence of the tweet’s author.
-
Reply influence score(t) The action here is replying: the more replies a tweet receives, the more influential it is. This influence is measured by the number of replies the tweet receives. \(\alpha \) \(\in \) (0, 1]. It is adjustable and indicates the weight of the reply edge. The reply influence is defined as follows:$$\begin{aligned} Reply\_influence(t)= \alpha \times number\_reply(t). \end{aligned}$$(1)
-
Retweet influence score(t) The action here is retweeting: the more frequently a tweet is retweeted by others, the more influential it is. This is quantified by the number of retweets. \(\beta \) \(\in \) (0, 1]. It is adjustable and indicates the weight of the retweet edge. It is defined as follows:$$\begin{aligned} Retweet\_influence(t)= \beta \times number\_retweet(t). \end{aligned}$$(2)
-
Favorite influence score(t) The action here is favoriting: when a user marks a tweet as a favorite, she/he indicates that the tweet’s content is useful and relevant, so the more favorites a tweet receives, the more influential it is. This influence is determined by the number of favorites the tweet receives. \(\gamma \) \(\in \) (0, 1]. It is adjustable and indicates the weight of the favorite edge. It is defined as follows:$$\begin{aligned} Favorite\_influence(t)= \gamma \times number\_favorite(t). \end{aligned}$$(3)
-
Mention influence Measured through the number of mentions containing a user’s name; it indicates the ability of that user to engage others in a conversation. The mention influence score is defined as follows:
-
Follow influence
| Parameter | Weight |
|---|---|
| \(\alpha \) | 0.6 |
| \(\beta \) | 0.2 |
| \(\gamma \) | 0.2 |
| \(\delta \) | 0.5 |
| \(\omega \) | 0.5 |
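As an illustration, the tweet-side influence scores can be computed with the edge weights from the table above. Note that summing the three scores into a single tweet influence value is an assumption made for this sketch, not necessarily the paper's exact aggregation.

```python
# Sketch of the tweet-side social influence scores following the pattern of
# Eqs. (2)-(3), with the edge weights from the parameter table. Summing the
# three scores into one value is our illustrative assumption.
ALPHA, BETA, GAMMA = 0.6, 0.2, 0.2  # reply, retweet, favorite edge weights

def tweet_influence(n_replies, n_retweets, n_favorites):
    reply_inf = ALPHA * n_replies       # reply influence
    retweet_inf = BETA * n_retweets     # Retweet_influence(t), Eq. (2)
    favorite_inf = GAMMA * n_favorites  # Favorite_influence(t), Eq. (3)
    return reply_inf + retweet_inf + favorite_inf

score = tweet_influence(n_replies=5, n_retweets=10, n_favorites=20)
```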
3.5 Candidate tweets scoring
-
Similarity to initial tweet
-
Similarity of content
-
Relevance regarding URLs When a URL is present in the tweet, we download the page and extract its title as well as its body content. For each candidate tweet t, we compute the following:
-
The word overlap between a candidate tweet t and the web page title, and between t and the body content of the web page.
-
The cosine similarity between t and the web page title, and between t and the body content of the web page.
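The two URL-relevance signals above can be sketched as follows, using the page title as the comparison text (the same functions apply to the body content). Whitespace tokenization and normalizing the overlap by the tweet's vocabulary size are our assumptions for this sketch.

```python
# Hedged sketch of the two URL-relevance signals: word overlap and cosine
# similarity between a candidate tweet and the linked page's title or body.
# Tokenization and overlap normalization are simplifying assumptions.
from collections import Counter
from math import sqrt

def word_overlap(tweet_tokens, page_tokens):
    """Fraction of the candidate tweet's distinct words found in the page text."""
    a, b = set(tweet_tokens), set(page_tokens)
    return len(a & b) / len(a) if a else 0.0

def cosine_similarity(a_tokens, b_tokens):
    """Cosine between the term-frequency vectors of two token lists."""
    va, vb = Counter(a_tokens), Counter(b_tokens)
    dot = sum(va[t] * vb[t] for t in va)
    na = sqrt(sum(c * c for c in va.values()))
    nb = sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

tweet = ["obama", "signs", "climate", "bill"]
title = ["obama", "climate", "bill", "passes", "senate"]
ov = word_overlap(tweet, title)       # 3 of 4 tweet words occur in the title
cos = cosine_similarity(tweet, title)
```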
-
| Feature | Name | Weight |
|---|---|---|
| c1 | Tweet influence | 0.6257 |
| c2 | Tweet author influence | 0.533 |
| c3 | Cosine initial tweet | 0.207 |
| c4 | Cosine tweet | 0.3128 |
| c5 | Overlap text URLs | 0.459 |
| c6 | Cosine title URLs | 0.025 |
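Scoring a candidate tweet from the six features c1–c6 could be sketched as a weighted combination using the weights above. The linear-sum form and the feature names in this snippet are illustrative assumptions, not the paper's exact scoring function.

```python
# Sketch of scoring a candidate tweet as a weighted sum of the six features
# c1-c6 with the weights from the table above. The linear form and the
# feature names are our assumptions for illustration.
WEIGHTS = {
    "tweet_influence": 0.6257,       # c1
    "author_influence": 0.533,       # c2
    "cosine_initial_tweet": 0.207,   # c3
    "cosine_tweet": 0.3128,          # c4
    "overlap_text_urls": 0.459,      # c5
    "cosine_title_urls": 0.025,      # c6
}

def candidate_score(features):
    """features: feature name -> value in [0, 1]; missing features count as 0."""
    return sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())

s = candidate_score({"tweet_influence": 1.0, "cosine_tweet": 0.5})
```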
4 Experiments and results
-
A collection of relevant conversations: conversations used as a resource from which to extract the components (tweets) of the context produced for a tweet.
-
A collection of tweets to contextualize: they correspond to a set of ambiguous tweets.
-
Reference summary: the summary whose content will be compared with that of our proposed summary.
-
Evaluation measures: tweet contextualization is evaluated on both informativeness and readability; we will use these measures to evaluate our results.
4.1 Ambiguous tweets dataset
-
We selected only tweets from informative accounts (e.g., @CNN) to avoid purely personal tweets that could not be contextualized.
-
We chose only tweets containing hashtags, since hashtags can be considered indicators of the main topics of a tweet. This information will be used in our experiments to improve the queries used to retrieve conversations and, therefore, the generated context.
4.2 Twitter conversations dataset
4.3 Reference summary
4.4 Evaluation metrics
-
Informativeness The objective of this metric is to evaluate the selection of relevant tweets. For each initial tweet, the 10 best tweets (those with the highest scores assigned by the automatic tweet contextualization system) are selected for evaluation. The dissimilarity between a human-selected summary (constructed in a pilot study) and the proposed summary (produced by our method) is given by$$\begin{aligned} Dis (T,S) =\sum _{t\in T} (P-1)\times \left( {1-\frac{min(log(P),log(Q))}{max(log(P),log(Q))}} \right) , \end{aligned}$$(11)where \(P=\frac{f_{T}(t)}{f_{T}}+1\) and \(Q=\frac{f_{S}(t)}{f_{S}}+1\). S is the set of informative tweets in our proposed summary, and T is the set of terms in the reference summary. For each term t \(\in \) T, \(f_{T}(t)\) represents its frequency of occurrence in the reference summary and \(f_{S}(t)\) its frequency of occurrence in the proposed summary. The lower Dis(T,S) is, the more similar the proposed summary is to the reference. T may take three distinct forms:
-
Unigrams made of single lemmas.
-
Bigrams made of pairs of consecutive lemmas (in the same sentence).
-
Bigrams with 2-gaps like bigrams, but the two lemmas may be separated by up to two lemmas.
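A minimal sketch of the dissimilarity of Eq. (11), computed here over unigram term frequencies; the bigram and 2-gap variants only change how terms are extracted from the summaries.

```python
# Sketch of the informativeness dissimilarity Dis(T, S) of Eq. (11) over
# unigram term frequencies. Lower values mean the proposed summary is more
# similar to the reference.
from math import log

def dis(ref_freqs, sys_freqs):
    """ref_freqs / sys_freqs: term -> occurrence count in the reference /
    proposed summary."""
    f_T = sum(ref_freqs.values())
    f_S = sum(sys_freqs.values()) or 1  # avoid division by zero on empty S
    total = 0.0
    for t, f_t in ref_freqs.items():
        p = f_t / f_T + 1
        q = sys_freqs.get(t, 0) / f_S + 1
        total += (p - 1) * (1 - min(log(p), log(q)) / max(log(p), log(q)))
    return total

same = dis({"a": 2, "b": 1}, {"a": 2, "b": 1})  # identical term profiles
worse = dis({"a": 2, "b": 1}, {"c": 3})         # no shared terms
```

Identical term profiles give a dissimilarity of zero; summaries sharing no terms accumulate the full \(P-1\) mass of every reference term.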
-
Readability Readability aims at measuring how clear and easy to understand the summary is. In contrast to informativeness, readability is evaluated manually; the results are presented in Table 4. Each summary has been evaluated according to the following parameters [27]:
-
Relevance: judges whether the tweets make sense in their context (i.e., after reading the other tweets in the same context). Each assessor evaluated relevance on three levels, namely highly relevant (value 2), relevant (value 1) or irrelevant (value 0).
-
Non-redundancy: evaluates whether the context avoids containing too much redundant information, i.e., information that has already been given in a previous tweet. Each assessor evaluated redundancy on three levels, namely not redundant (value 2), redundant (value 1) or highly redundant (value 0).
-
Soundness: each assessor evaluated the anaphora resolution in the context.
-
Syntax: each assessor evaluated the syntax of the produced context.
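The four manual judgments are aggregated into a single readability score. The sketch below assumes the AVG column reported in the readability results is a plain mean of the four per-criterion percentages, which is consistent with the reported values.

```python
# Sketch of aggregating the four readability judgments into the AVG column:
# a plain mean of the per-criterion percentages (our assumption, consistent
# with the reported results).
def readability_avg(relevance, non_redundancy, soundness, syntax):
    """All inputs are percentages in [0, 100]; returns their mean."""
    return (relevance + non_redundancy + soundness + syntax) / 4

avg = readability_avg(88.65, 66.33, 65.04, 69.22)  # human summary, Topic1
```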
-
| | Unigrams | Bigrams | Skipgrams |
|---|---|---|---|
| Topic1 | | | |
| Human summary | 0.7263 | 0.8534 | 0.9213 |
| Proposed summary | 0.7009 | 0.8165 | 0.9055 |
| Topic2 | | | |
| Human summary | 0.7932 | 0.9137 | 0.9361 |
| Proposed summary | 0.7505 | 0.9008 | 0.9192 |
| Topic3 | | | |
| Human summary | 0.7786 | 0.9472 | 0.9526 |
| Proposed summary | 0.7127 | 0.9138 | 0.9117 |
| | Relevance (%) | Non-redundancy (%) | Soundness (%) | Syntax (%) | AVG (%) |
|---|---|---|---|---|---|
| Topic1 | | | | | |
| Human summary | 88.65 | 66.33 | 65.04 | 69.22 | 72.31 |
| Proposed summary | 89.72 | 69.78 | 70.68 | 67.37 | 74.38 |
| Topic2 | | | | | |
| Human summary | 90.72 | 65.82 | 68.24 | 71.52 | 74.07 |
| Proposed summary | 91.03 | 67.49 | 74.52 | 70.02 | 75.76 |
| Topic3 | | | | | |
| Human summary | 90.23 | 69.06 | 65.04 | 67.34 | 72.91 |
| Proposed summary | 90.24 | 69.72 | 66.64 | 62.35 | 72.23 |
4.5 Comparison using INEX’s data
| | Unigrams | Bigrams | Skipgrams |
|---|---|---|---|
| ref2013 | 0.705 | 0.794 | 0.796 |
| ref2014 | 0.7528 | 0.8499 | 0.8516 |
| Proposed method | 0.7709 | 0.702 | 0.855 |