1 Introduction
- We propose a novel framework to represent and measure collective attention shift. Based on this, we systematically study the collective attention during multiple shocking terrorist attack events in 2015 and 2016 and reveal several properties of network structures and temporal dynamics that are consistent across events.
- We formulate a new problem for efficient monitoring of the collective attention dynamics, and we propose a cost-efficient sampling strategy that takes the users’ hashtag adoption frequency, connectedness and diversity into account, with a stochastic sampling algorithm to cope with the variability of the sampling targets.
- We conduct extensive experiments and show that our proposed sampling approach significantly outperforms several alternative methods in both retaining the network structures and preserving the information with a small set of sampling targets, suggesting the utility of the proposed method in various realistic settings.
2 Related work
2.1 Collective attention
2.2 Data sampling
3 Characterizing collective attention under shocks
3.1 Data collection
Dataset | Duration | # of users | # of tweets |
---|---|---|---|
Paris attacks (Paris users) | 10/27/2015-11/20/2015 | 13,439 | 2.58 million |
Paris attacks (New York users) | 10/27/2015-11/20/2015 | 20,022 | 2.63 million |
Paris attacks (London users) | 10/27/2015-11/20/2015 | 12,053 | 1.57 million |
Brussels bombings | 3/5/2016-3/29/2016 | 9,797 | 2.60 million |
San Bernardino attack | 11/17/2015-12/09/2015 | 10,109 | 1.12 million |
Orlando shooting | 5/26/2016-6/19/2016 | 6,157 | 1.26 million |
Trump Followers | 10/29/2016-11/11/2016 | 22,659 | 0.70 million |
Clinton Followers | 10/29/2016-11/11/2016 | 26,810 | 0.73 million |
3.2 Representing collective attention shift
3.3 Measuring collective attention shift
- Network size: the number of hashtags in the network.
- Modularity: the community structure exhibited in hashtag connections. A high modularity value indicates a clearly separated community structure. We leverage the Infomap algorithm [36] to compute the modularity of a directed, weighted network.
- Average weighted degree: the edge weights capture the number of users whose attention shifted from one hashtag to another; hence the average weighted degree of a network for a particular period of time reflects the attention shift frequency or rate.
- Gini coefficient for weighted degree: measures the level of degree concentration, denoting whether a few hashtags have become dominant in connecting with other hashtags. The Gini coefficient ranges from 0 to 1, with 1 representing the most concentrated attention. Like the power-law exponent, the Gini coefficient can be used to measure preferential patterns, but in a more general way. We use the weighted degree distribution instead of the unweighted one because the weighted distribution captures the number of unique users who have shifted their attention.
- Assortativity: the tendency for a node to attach to others that are similar in terms of node degree. For a directed network, there are four types: in-in, in-out, out-in and out-out assortativity. In this work, we use the weighted in-in assortativity as defined in [37].
- Average clustering coefficient: the tendency of nodes to form triangles. In an attention shift network, this reflects the degree to which the collective attention is likely to shift at a local scale.
- New tag percentage: the number of newly emerged hashtags relative to the total number of hashtags in the network [38]. We consider a hashtag to be new if it has not been used within one week prior to the time of the network.
- New tag attention ratio: the share of the total weighted in-degree received by the newly emerged hashtags. Specifically, the ratio r is defined as:
$$ r = \frac{\sum_{j \in H_{\mathrm{new}}}k_{\mathrm{in}}^{j}}{\sum_{i\in H}k_{\mathrm{in}}^{i}}, \tag{1} $$
where \(k_{\mathrm{in}}^{i}\) is the weighted in-degree of node i, and \(H_{\mathrm{new}}\) and H are the set of newly emerged hashtags and the set of all hashtags in a network, respectively.
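As an illustration of two of the measures above, the following minimal sketch computes the new tag attention ratio r of Eq. (1) and a Gini coefficient from an edge list. The edge-list representation `(source, target, weight)`, the function names and the toy hashtags are assumptions made for this example only; applying the Gini coefficient to the weighted in-degree is likewise an illustrative choice, and the sketch is not the implementation used in this work.

```python
from collections import defaultdict


def weighted_in_degree(edges):
    """Sum incoming edge weights per hashtag.

    edges: iterable of (source, target, weight) tuples (assumed format).
    """
    k_in = defaultdict(float)
    for src, dst, weight in edges:
        k_in[dst] += weight
        k_in[src] += 0.0  # keep hashtags that only appear as a source
    return dict(k_in)


def new_tag_attention_ratio(k_in, new_tags):
    """Eq. (1): in-degree of new hashtags over in-degree of all hashtags."""
    total = sum(k_in.values())
    if total == 0:
        return 0.0
    return sum(k for tag, k in k_in.items() if tag in new_tags) / total


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-based formula on the sorted values.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Toy attention shift network: weights count users who shifted attention.
edges = [
    ("paris", "prayforparis", 30),
    ("news", "prayforparis", 10),
    ("paris", "news", 10),
]
k_in = weighted_in_degree(edges)
r = new_tag_attention_ratio(k_in, new_tags={"prayforparis"})
g = gini(k_in.values())
```

In this toy network the single new hashtag receives 40 of the 50 units of incoming weight, so r is 0.8. Note that for a finite network with n hashtags this Gini estimator reaches at most (n - 1)/n rather than exactly 1.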
3.4 Observation: collective attention shift around terrorist attacks
3.5 Comparison with a null model
3.6 Comparison with non-emergency events
4 Attention sampling
4.1 Problem formulation
4.2 Sampling approach
4.2.1 Sampling criteria: who should we include in a sample?
- Activeness: the extent to which a user will actively mention various topics of interest in their tweets. We consider users who tend to tweet with hashtags at a relatively high frequency as primary sampling candidates as they are more likely to tweet with hashtags during the time of interest.
- Connectedness: the extent to which a user will diversely cover the topics of interest of many other users. We consider users who tend to tweet with hashtags commonly used by others as desirable sampling candidates as their hashtag use is more likely to cover the use of a broader set of users.
- Adaptiveness: the extent to which a user will adaptively attend to rare or new topics of interest. We consider users who tend to tweet with novel hashtags as desirable sampling candidates as they are more likely to attend to new topics or information about newly emerging events.
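To make the three criteria concrete, the sketch below scores users with simple proxies. These proxies are assumptions chosen for illustration and are not the measures defined in this work: activeness is taken as the number of hashtag uses, connectedness as the average number of other users sharing each of a user's hashtags, and adaptiveness as the fraction of a user's distinct hashtags not seen before the observation window. The function name `score_users` and the input format are also hypothetical.

```python
from collections import defaultdict


def score_users(history, known_tags):
    """Score users on illustrative proxies for the three criteria.

    history: dict mapping user -> list of hashtags, one entry per use
             (assumed format).
    known_tags: set of hashtags already seen before the window (assumed).
    """
    # Index which users adopted each hashtag.
    users_per_tag = defaultdict(set)
    for user, tags in history.items():
        for tag in tags:
            users_per_tag[tag].add(user)

    scores = {}
    for user, tags in history.items():
        distinct = set(tags)
        if not distinct:
            scores[user] = {"activeness": 0, "connectedness": 0.0, "adaptiveness": 0.0}
            continue
        # Proxy: mean number of *other* users sharing each hashtag.
        shared = sum(len(users_per_tag[t]) - 1 for t in distinct) / len(distinct)
        # Proxy: share of the user's distinct hashtags that are novel.
        novel = sum(1 for t in distinct if t not in known_tags) / len(distinct)
        scores[user] = {
            "activeness": len(tags),
            "connectedness": shared,
            "adaptiveness": novel,
        }
    return scores


history = {
    "a": ["paris", "paris", "news"],
    "b": ["paris", "prayforparis"],
    "c": [],
}
scores = score_users(history, known_tags={"paris", "news"})
```

Here user "a" is the most active, while user "b" is the only one with a non-zero adaptiveness score because "prayforparis" was not seen before the window. How such scores are combined into a sampling probability is left open in this sketch.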
4.2.2 Sampling algorithms: how should we make a stochastic sample?
5 Experiments
5.1 Experiment setup
5.2 Results
- Sampling criteria - What kind of users should be included in a sample for capturing the collective attention shift in a larger population?
- Sampling algorithms - What stochastic sampling algorithm is effective for monitoring the dynamics of collective attention?
Sample | CoPerplexity_PRW (α = 0.4): 1st | CoPerplexity_PRW (α = 0.4): 2nd | CoPerplexity_PRW (α = 0.4): 3rd | CoPerplexity_RW: 1st | CoPerplexity_RW: 2nd | CoPerplexity_RW: 3rd | CommonTag_RW: 1st | CommonTag_RW: 2nd | CommonTag_RW: 3rd |
---|---|---|---|---|---|---|---|---|---|
Paris attacks (Paris users) | | | | | | | | | |
10% | 51.19% | 33.93% | 14.88% | 43.45% | 43.45% | 13.10% | 8.33% | 23.21% | 68.45% |
30% | 46.43% | 43.45% | 10.12% | 52.38% | 39.88% | 7.74% | 7.14% | 17.26% | 75.60% |
Paris attacks (London users) | | | | | | | | | |
10% | 53.57% | 36.90% | 9.52% | 39.88% | 47.02% | 13.10% | 10.12% | 22.66% | 72.62% |
30% | 32.73% | 49.40% | 17.86% | 59.52% | 28.57% | 11.90% | 17.86% | 26.79% | 55.36% |
Brussels bombings | | | | | | | | | |
10% | 57.14% | 25.00% | 17.86% | 30.95% | 36.90% | 32.14% | 14.88% | 40.48% | 44.64% |
30% | 22.02% | 44.05% | 33.93% | 68.45% | 22.62% | 8.93% | 12.50% | 35.12% | 52.38% |
San Bernardino shooting | | | | | | | | | |
10% | 44.04% | 36.31% | 19.64% | 21.43% | 47.02% | 31.55% | 40.47% | 19.05% | 40.47% |
30% | 22.02% | 39.29% | 32.14% | 33.93% | 46.43% | 31.55% | 50.60% | 23.21% | 26.19% |