Introduction
- Independent cascade: if \(x_{t-1}\) is the set of newly activated nodes (the seed set) at time step \(t-1\), then, at each time step t, each node i in \(x_{t-1}\) infects each inactive neighbor j with probability \(p_{ij}\) (a simulation sketch of both diffusion models follows this list).
- Linear threshold: each node j has a threshold \(\theta _j\) in the interval [0, 1]; at each time step t, an inactive node j becomes active if the total influence from all activated neighbors satisfies \(\sum _{i \in H_{t-1}}b_{ij} > \theta _j\), where \(H_{t-1}\) is the set of nodes activated at time \(t-1\) or earlier.
- Seed selection: this is controlled by the information provider, who selects a proper set of initial seeds to receive the designated information. In the previous example, Alice is the seed node.
- Information cascade: this involves two variables. One is the node activation status, which describes the process wherein a user receives a message from a followee. The other is the node repost decision, which is controlled by the message receiver. In our model, the repost decision depends on the user's preference and the topic of the received message. In the previous example, Bob reposted the message because he likes it. But what happens if Bob dislikes the message? Since the message came from Alice, Bob may conclude that Alice's tastes differ from his and unfollow her. The “unfollow” action breaks the information flow from Alice to Bob, which changes the network topology.
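To make the two diffusion models concrete, here is a minimal Python sketch of a single time step under each. The graph structure, probabilities \(p_{ij}\), influence weights \(b_{ij}\), and thresholds \(\theta _j\) are illustrative placeholders, not the paper's data.

```python
import random

def independent_cascade_step(graph, p, newly_active, active):
    """One IC step: each node activated at t-1 tries once to infect each
    still-inactive out-neighbor j, succeeding with probability p[(i, j)]."""
    next_active = set()
    for i in newly_active:
        for j in graph.get(i, []):
            if j not in active and random.random() < p[(i, j)]:
                next_active.add(j)
    return next_active

def linear_threshold_step(graph, b, theta, active):
    """One LT step: an inactive node j activates once the total influence
    from its already-active in-neighbors exceeds its threshold theta[j]."""
    next_active = set()
    for j in theta:
        if j in active:
            continue
        influence = sum(b[(i, j)] for i in active if j in graph.get(i, []))
        if influence > theta[j]:
            next_active.add(j)
    return next_active

# Toy example: Alice seeds the cascade and may infect Bob.
graph = {"Alice": ["Bob"], "Bob": ["Carol"]}
p = {("Alice", "Bob"): 0.6, ("Bob", "Carol"): 0.3}
print(independent_cascade_step(graph, p, {"Alice"}, {"Alice"}))
```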
- We introduce the discrete choice model into the information maximization problem, where the network topology changes dynamically during the independent cascade process.
- We develop practical algorithms to solve the multistage stochastic programming problem under endogenous uncertainty.
- To avoid directly dealing with the large state space of node activation, we exploit an implicit Monte Carlo-based partially observable Markov decision process.
- We compare the results of the two algorithms under various sample sizes.
Mathematical models
Problem description
Symbol | Definition |
---|---|
 | Node preference—like |
 | Node preference—dislike |
 | Node status—active |
 | Node status—inactive |
 | Seed/source node |
Mathematical formulation
Symbol | Definition |
---|---|
Indices and sets | |
\(\,i \in {{\mathcal {I}}}\) | Node |
\(\,k \in {{\mathcal {K}}}\) | Message |
\(\,t \in {{\mathcal {T}}}\) | Time |
\(\,s \in {{\mathcal {S}}}\) | Scenario |
Parameters | |
\(\,a^{t,s}_{ij}\) | Indicator of the directed arc from node i to node j at time t in scenario s |
\(\,b_{ki}\) | The information preference of node i with respect to message k |
\(\,c_{ki}\) | Pre-activation indicator: whether node i already knows message k before the seed selection |
\(\,w_k\) | The influence weight of message k |
Decision variable | |
\(\,x^t_{ki}\) | Binary seed-selection variable: whether node i is selected as the seed node of message k at time t |
\(\,y^{t,s}_{ki}\) | Binary node-activation variable: whether node i is activated by message k at time t in scenario s |
\(\,z^{t,s}_{ki}\) | Binary message-transmission variable: whether node i decides to transmit message k to its neighbors at time t in scenario s |
Other | |
\(\,Q(x)\) | The total cost of seed selection x |
\(\,R(y)\) | The total reward of node activation y |
\(\,P(a)\) | The probability of the network topology realization given by the arcs a |
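Based on these definitions, the multistage model presumably trades the expected activation reward against the seeding cost. The following is a hedged sketch of one plausible form of the objective; the per-seed cost coefficient \(q_{ki}\) is an assumed symbol, not defined above, and the paper's exact model may differ.

```latex
% Sketch of a plausible objective under the symbols above; the per-seed
% cost q_{ki} is an assumption.
\max_{x}\; \sum_{s \in \mathcal{S}} P(a^{s})\, R(y^{s}) - Q(x),
\quad\text{with}\quad
R(y^{s}) = \sum_{k \in \mathcal{K}} \sum_{i \in \mathcal{I}} \sum_{t \in \mathcal{T}} w_k\, y^{t,s}_{ki},
\qquad
Q(x) = \sum_{k \in \mathcal{K}} \sum_{i \in \mathcal{I}} \sum_{t \in \mathcal{T}} q_{ki}\, x^{t}_{ki}
```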
Solution approaches
- Myopic policy: does not explicitly use any forecasted network topology; it separates the multistage problem into a sequence of two-stage problems (MYSP) by discrete time (see the sketch after this list).
- Reinforcement learning: reformulates the stochastic programming model as a Markov decision process (MDP).
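A minimal sketch of the myopic decomposition, assuming hypothetical helpers `solve_two_stage` (the per-period MYSP subproblem) and `simulate_cascade` (one step of the sampled diffusion); neither name comes from the paper.

```python
def myopic_policy(horizon, scenarios, solve_two_stage, simulate_cascade):
    """Greedy decomposition: at each time step, solve a two-stage problem
    against the sampled scenarios only, fix the seeds, observe one cascade
    step, and repeat; no forecast beyond the current window is used."""
    active, seeds = set(), []
    for t in range(horizon):
        x_t = solve_two_stage(active, scenarios)   # stage 1: pick seeds for time t
        seeds.append(x_t)
        active |= simulate_cascade(active | x_t)   # stage 2: realized diffusion step
    return seeds
```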
Two-stage stochastic programming with myopic policy
Symbol | Definition |
---|---|
Indices and sets | |
\(\,i \in {\mathcal {I}}\) | Node |
\(\,k \in {\mathcal {K}}\) | Message |
\(\,s \in {\mathcal {S}}\) | Scenario |
Parameters | |
\(\,a^{s}_{ij}\) | Indicator of the directed arc from node i to node j in scenario s |
\(\,b_{ki}\) | The information preference of node i with respect to message k |
\(\,c_{ki}\) | Pre-activation indicator: whether node i already knows message k before the seed selection |
\(\,d_{ki}\) | The node repost decision: whether node i will repost message k in the network |
\(\,w_k\) | The influence weight of message k |
Decision variable | |
\(\,x_{ki}\) | Binary seed-selection variable: whether node i is selected as the seed node of message k |
\(\,y^{s}_{ki}\) | Binary node-activation variable: whether node i is activated by message k in scenario s |
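Since the two-stage model evaluates a seed selection against sampled scenarios, its recourse value can be estimated by simulation. Below is a sketch of a sample-average approximation of a candidate seed set's value; `sample_topology` and `run_cascade` are hypothetical helpers, not the paper's code.

```python
def saa_value(seeds, w, num_scenarios, sample_topology, run_cascade):
    """Estimate E_s[sum_k sum_i w_k * y^s_ki] for a fixed seed selection by
    averaging the weighted activations over sampled network topologies."""
    total = 0.0
    for _ in range(num_scenarios):
        arcs = sample_topology()              # one realization of the arcs a^s_ij
        activated = run_cascade(seeds, arcs)  # set of (k, i) pairs with y^s_ki = 1
        total += sum(w[k] for (k, _) in activated)
    return total / num_scenarios
```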
Reinforcement learning with Markov decision process
Symbol | Definition |
---|---|
Indices and sets | |
\(\,i \in {\mathcal {I}}\) | Node |
\(\,k \in {\mathcal {K}}\) | Message |
Parameters | |
\(\,b_{ki}\) | The information preference of node i with respect to message k |
\(\,w_k\) | The influence weight of message k |
\(\,\rho _{ij}\) | The probability of arc connection from node i to node j |
Variable | |
\(\,\sigma _{ki}\) | The element of the state matrix \(\varvec{s} \in S\) in row k and column i, i.e., the activation status of node i for message k |
\(\,\alpha _{ki}\) | The element of the action matrix \(\varvec{a} \in A\) in row k and column i, i.e., the seed selection of node i for message k |
- S: the finite set of states, i.e., activation statuses, \(\varvec{s} \in S\)
- A: the finite set of actions, i.e., source-user selections, \(\varvec{a} \in A\)
- P: the probability of transitioning from \(\varvec{s}\) to \(\varvec{s'}\) through action \(\varvec{a}\), \(P_a(\varvec{s}, \varvec{s'})\)
- R: the expected reward of transitioning from \(\varvec{s}\) to \(\varvec{s'}\) through action \(\varvec{a}\), i.e., the weighted information influence, \(R_a(\varvec{s}, \varvec{s'})\).
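The sketch below instantiates these ingredients for K = 2 messages and I = 4 nodes, matching the state and action matrices in the tables that follow. The dense-matrix encoding and the one-step sampling routine are illustrative choices; node preferences \(b_{ki}\) and repost decisions are omitted for brevity.

```python
import numpy as np

K, I = 2, 4                                 # messages x nodes
rng = np.random.default_rng(0)
rho = rng.uniform(0.1, 0.5, size=(I, I))    # arc-connection probabilities rho_ij
w = np.ones(K)                              # influence weights w_k

def step(state, action):
    """Sample one MDP transition: seeding activates the chosen nodes, then
    each currently active node i reaches node j with probability rho[i, j].
    Reward = weighted count of newly activated nodes (weighted influence)."""
    s_next = np.maximum(state, action)
    for k in range(K):
        for i in np.flatnonzero(s_next[k]):          # nodes active after seeding
            reached = rng.random(I) < rho[i]         # sample arcs out of node i
            s_next[k] = np.maximum(s_next[k], reached.astype(int))
    reward = float(w @ (s_next.sum(axis=1) - state.sum(axis=1)))
    return s_next, reward

# Example: from the empty state, seed node 0 with both messages.
s0 = np.zeros((K, I), dtype=int)
a0 = np.zeros((K, I), dtype=int)
a0[:, 0] = 1
print(step(s0, a0))
```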
Policy evaluation
State | Action | Influence |
---|---|---|
\(\varvec{s}\) | \(\varvec{a}\) | \(Q^{\pi }(\varvec{s},\varvec{a})\) |
\(\left( \begin{matrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{matrix}\right)\) | \(\left( \begin{matrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{matrix}\right)\) | 3.27869 |
 | \(\left( \begin{matrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{matrix}\right)\) | 3.09836 |
 | \(\left( \begin{matrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{matrix}\right)\) | 3.22414 |
\(\vdots\) | \(\vdots\) | \(\vdots\) |
 | \(\left( \begin{matrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{matrix}\right)\) | 3.90909 |
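Values like those in the table can be estimated by Monte Carlo rollouts. A minimal sketch, assuming the `step` transition sampler above, a policy `pi` that maps a state matrix to an action matrix, and an assumed discount factor and horizon (neither is specified here):

```python
def q_value(state, action, pi, step, horizon=10, gamma=0.9, n_samples=1000):
    """Monte Carlo estimate of Q^pi(s, a): roll out `horizon` sampled
    transitions, the first under `action` and the rest under policy `pi`,
    and average the discounted returns over `n_samples` rollouts."""
    total = 0.0
    for _ in range(n_samples):
        s, a, ret, discount = state.copy(), action, 0.0, 1.0
        for _ in range(horizon):
            s, r = step(s, a)
            ret += discount * r
            discount *= gamma
            a = pi(s)                  # subsequent actions drawn from the policy
        total += ret
    return total / n_samples
```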
Policy improvement
State | Action | Initial policy | Updated policy |
---|---|---|---|
\(\varvec{s}\) | \(\varvec{a}\) | \(\pi (\varvec{s},\varvec{a})\) | \(\pi '(\varvec{s},\varvec{a})\) |
\(\left( \begin{matrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{matrix}\right)\) | \(\left( \begin{matrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{matrix}\right)\) | 0.0625 | 0.0463788 |
 | \(\left( \begin{matrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{matrix}\right)\) | 0.0625 | 0.0515202 |
 | \(\left( \begin{matrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{matrix}\right)\) | 0.0625 | 0.0554653 |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
 | \(\left( \begin{matrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{matrix}\right)\) | 0.0625 | 0.0737093 |
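The updated column reweights the uniform initial policy (1/16 over the 16 seed-selection actions) toward actions with higher \(Q^{\pi }\). A minimal sketch of one soft improvement step, assuming a softmax-style update; the paper's exact rule may differ.

```python
import numpy as np

def improve_policy(q_values, temperature=1.0):
    """Soft policy improvement: new action probabilities proportional to
    exp(Q / temperature); temperature -> 0 recovers the greedy update."""
    prefs = np.asarray(q_values) / temperature
    prefs -= prefs.max()                 # shift for numerical stability
    weights = np.exp(prefs)
    return weights / weights.sum()

# Q-values from the policy-evaluation table above (a subset of the 16 actions).
q = np.array([3.27869, 3.09836, 3.22414, 3.90909])
print(improve_policy(q))
```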
Computational experiments on algorithms’ convergence
- dataset (2,4) with 2 messages and 4 nodes
- dataset (2,7) with 2 messages and 7 nodes
- dataset (3,7) with 3 messages and 7 nodes (does not converge within 24 h; each iteration takes roughly 60,000 s on average).
Method | Dataset (2,4) best result | Dataset (2,4) time (s) | Dataset (2,7) best result | Dataset (2,7) time (s) |
---|---|---|---|---|
SP-MYOPIC | 3.771 | 5.349 | 10.238 | 69.701 |
RL-MDP, sample size 1E+04 | 3.6055 | 3.819 | 9.247 | 59.622 |
RL-MDP, sample size 1E+06 | 3.8166 | 283.216 | 10.291 | 4364.74 |