In our previous work (
Wang and Street 2014;
2015a), we develop a
reachability-based influence-diffusion model to exploit the implicit knowledge of influence-based connectivity and vertex-pair similarity encoded in the network graph topology. This model has been successfully used to uncover influence centrality and community structure in social networks (Wang and Street
2015a, b). In this
reachability-based model, a node’s influence significance is measured by the total amount of
information the node spreads over its neighborhood within a pre-specified depth limit. Roughly speaking, it focuses on information’s visibility (e.g., brand awareness) regardless of information’s effect (e.g., adoption). In this paper, we investigate
activation-based influence-diffusion modeling for viral marketing, in which a node’s influence significance is measured by the total number of nodes it can activate to adopt a new technology or purchase a new product. We adapt our
reachability-based influence-diffusion model to the viral-marketing scenario and propose a novel, more realistic
activation-based influence-diffusion model. We call it the
multiple-path asynchronous threshold (MAT) model, in which we quantify influence and track its diffusion and aggregation. We integrate direct and indirect influence, and instantiate the 3A process to model complex WOM communication and its effects. Moreover, we naturally incorporate in our MAT model the influence attenuation along diffusion paths, influence decay with time, and individual temporal diffusion dynamics based on the relationship strength or contact frequency between a node of interest and its neighbors. Further, we develop an effective and efficient heuristic, called
IV-Greedy, to address the influence-maximization problem.
MAT Model
Our model is applicable to both undirected/directed and unweighted/weighted networks. Without loss of generality, let G=(V,E,W) be a directed and weighted network with |V|=n and |E|=m. The weight w
uv
∈W associated with a directed edge (u→v) represents the relationship strength of node u over node v, e.g., the frequency that u calls or emails v. The relationship strength is usually asymmetric in directed networks. For any unweighted network, we regard it as a weighted network with a weight of 1 on each link. For any undirected edge between two nodes, we replace it with a pair of directed links pointing to each other, associated with the same weight as the original weight on the undirected edge. The influence is propagated through out-links.
We categorize all the nodes in the network into two types: influencers and messengers. An
influencer is an active node (product adopter) that can originate and spread its influence in the network. A
messenger is an inactive node that may acquire influence and also pass on to her friends the influence it receives from influencers or other messengers. Once a messenger acquires influence that is greater than or equal to its threshold, it is activated and turns into an influencer who starts to spread out its own influence to others. It is noted that an influencer can not only diffuse its own influence to others but also act as a messenger passing along the influence from other influencers. Our model captures an important characteristic of WOM communication (
Kozinets et al. 2010) in that:
anyone (either active or inactive) can pass along WOM messages and potentially influence the recipient. In other words, a node can be activated by not only
direct influence from its active neighbors (influencers) but also
indirect influence passed along by its inactive neighbors (messengers). This is a distinguishing, more realistic feature built in our MAT model.
From a graph-theoretic point of view, the influence diffusion from each influencer can be regarded as a batch of random walks around the neighborhood of the influencer. Each walk explores one diffusion path from the influencer and passes on the influence to the nodes it visits along the path. All walks together constitute an
influence-diffusion tree with the influencer as the root node. To model the complex WOM communication in a more realistic manner, we incorporate in the diffusion tree/process of each influencer several important rules/mechanisms as follows:
-
No cycles or loops are allowed. It makes sense since no one should repeatedly send out the same WOM message to her friends when the message she sent out circulates back to herself;
-
A node may be visited multiple times along different diffusion paths. It resembles that the message/influence may be passed on to the same person via different chains of friends. This is a more realistic imitation of WOM messaging and social reinforcement in terms of repeated exposures (
Romero et al. 2011);
-
The diffusion tree of one influencer may contain other influencers. That is because influencers also act as messengers passing along the message/influence from other influencers;
-
Influence dissipates along diffusion paths. Moreover, the diffusion process stops after three hops from the influencer to match the well-known
three-degrees-of-influence phenomenon (
Christakis and Fowler 2007);
-
Influence diffusion from a parent node to its child nodes is asynchronous. This is an important mechanism implemented in our model to capture the individual diffusion dynamics.
It is worth pointing out that: 1) the influence-diffusion processes of different influencers are independent of each other; 2) influence not only attenuates along diffusion paths but also decays over time. We elaborate below how we quantify influence and keep track of its diffusion and aggregation following the rules described above.
The relationship strength, as indicated by the weight on edges, may vary significantly among family members, close friends, casual acquaintances, and so on. Stronger relationship implies stronger influence potential in general. To quantify the relationship strength on influence, we define a
normalized influence weight that measures the fraction of influence a node receives from a specific in-neighbor relative to the total influence it may receive from all of its in-neighbors. Given a directed edge
u→
v with a raw weight
w
uv
, and letting
\(N_{v}^{in}\) denote the set of node
v’s in-neighbors, the normalized influence weight
\(\hat {w}_{uv}\) is defined as
$$ \hat{w}_{uv}=\frac{w_{uv}}{\sum_{k\in N_{v}^{in}} w_{kv}}. $$
(2)
To quantify the influence attenuation along a diffusion path, we define a
depth-associated attenuation coefficient
$$ \alpha = d^{-2}, $$
(3)
where
d is the depth (number of hops) from an influencer to the node of interest along a diffusion path. It is the same as what we define in the
reachability-based influence-diffusion model (
Wang and Street 2014;
2015a). It can be interpreted as the probability of an influencer’s influence reaching a node
d hops away, as analogous to the probability of a center node linking to a node at a fixed distance
d of its concentric scales of resolution, which is proportional to
d−2 as depicted in (
Easley and Kleinberg 2010). From a probabilistic perspective, letting a random variable
X
i
denote the total amount of influence an influencer
i spreads, and letting a random variable
Y denote any diffusion path from
i, then we can for now quantify influencer
i’s total influence as the conditional expectation
$$ E[\!X_{i}]=\sum\limits_{y} E[\!X_{i}|Y=y]P\{Y=y\}, $$
(4)
where
y is a diffusion path from influencer
i to a destination node
j.
E[
X
i
|
Y=
y] is the expected influence
j acquires along the path
y, which is estimated by the chain product of the normalized influence weights of the corresponding links that constitute the diffusion path
y.
P{
Y=
y} is the probability that
i’s influence reaches
j along the path
y, which is estimated by the depth-associated attenuation coefficient at the depth from
i to
j along the path
y. The depth-associated attenuation is actually a compounding factor that incorporates the decreasing reaching probability, trustworthiness decay and information corruption. As indicated in the
three-degrees-of-influence phenomenon (
Christakis and Fowler 2007), the influence ceases to have a noticeable effect on people beyond three degrees of separation. Therefore, we set the depth limit
d
max
to 3 for each influencer. In other words, the influence of any influencer can propagate up to a social horizon of three hops.
Influence also dissipates over time. For example, right after we purchase a new product, we usually feel the excitement and urge to tell our friends (potentially with stronger attempt to influence them). Our enthusiasm and influence potential fade away as time elapses. We model the temporal influence decay as a
time-associated attenuation coefficient β(
t) as an exponential function of time
$$ \beta(t)=e^{-\lambda t}, $$
(5)
where λ is a user-specified tunable parameter of decay rate. It can be used to account for different products on various social media. When a node is newly activated, a timer is attached to this new influencer and the time is set to 0 at that moment so that we can track the number of hops/steps its influence propagates in its neighborhood. It is noted that the timing for that node acting as a messenger of other influencers has no change.
The next is to model the individual diffusion dynamics. Many real-world networks are thought to be scale-free, in the sense that the degree distribution follows a power law. Therefore, at the network level, it is often seen that some people have much more friends and are much more active in WOM messaging than others. At the individual level, people usually communicate much more frequently with their family members and close friends than casual acquaintances. The contact frequency among friends is strongly correlated with the relationship strength. As a result, WOM messaging from one person to each of her friends generally takes place in an asynchronous manner. To capture the individual temporal diffusion dynamics, we model the heterogeneity of WOM messaging as a
Poisson process. For a directed edge (
u→
v), let
X
uv
denote the number of times node
u makes contact with node
v during one time step. We assume that
X
uv
follows a Poisson distribution with a rate of
μ
uv
. Its probability mass function can be written as
$$ p_{uv}(x)=\left\{ \begin{array}{ll} \frac{\mu_{uv}^{x} e^{-\mu_{uv}}}{x!}, & x=0,1,2\ldots\\ 0, & \text{elsewhere.} \end{array}\right. $$
(6)
Statistically,
μ
uv
is the average number of times
u makes contact with
v per time step. It can be estimated using
out-link weights of each node to differentiate their activeness and vertex-pair diffusion dynamics. In practice, the weights on edges may have different meanings and properties, and their values may vary significantly. We use 0-1 scaling to bring them into alignment. Specifically,
μ
uv
is computed as
$$ \mu_{uv}=\hat{\mu}+\mu_{u}+\tilde{w}_{uv}, $$
(7)
where
\(\hat {\mu }\) is a tunable parameter that measures the overall activeness of the network, and
μ
u
and
\(\tilde {w}_{uv}\) can be interpreted as the
global activeness of node
u and
local activeness of
u→
v, respectively. A sensible range for
\(\hat {\mu }\) is (0,2]. The larger
\(\hat {\mu }\), the more active the network is. When
\(\hat {\mu }=2\), the probability that a node contacts with its out-link neighbor at least once in one time step is greater than or equal to 0.865. When
\(\hat {\mu }\) is too small, the network would be too inactive to facilitate viral marketing. It can be learned using data-driven approaches in practice. We set it to 1 by default. Let
w
u
denote the sum of all out-link weights of node
u, and define
w
max
= max
i∈
Vw
i
and
w
min
= min
i∈
Vw
i
. Let
wu:max and
wu:min denote the maximum and minimum weight among node
u’s out-links, respectively. Then
μ
u
and
\(\tilde {w}_{uv}\) can be calculated as follows
$$ \mu_{u}=\frac{w_{u}-w_{min}}{w_{max}-w_{min}}, $$
(8)
$$ \tilde{w}_{uv}=\frac{w_{uv}-w_{u:min}}{w_{u:max}-w_{u:min}}. $$
(9)
These normalization schemes enable us to differentiate a node’s activeness at both the network and the individual levels. Specifically, for a scale-free network, there exist a few hubs, which are nodes with a degree that greatly exceeds the average. These high-degree nodes are usually more active and more influential, and play an important role in WOM marketing. The global activeness μ
u
has a desirable interpretation of the degree distribution. Hubs in a scale-free network would have a significantly larger μ
u
than other nodes. On the other hand, if the network of interest is not scale-free, μ
u
would be more randomly or normally distributed. Degree-based seeding strategy would be no longer effective. In case w
max
=w
min
(which rarely occurs), we set μ
u
to 0.5 for all u∈V. Similarly, \(\tilde {w}_{uv}\) captures node u’s local activeness, which is proportional to the weight on the link from u to its respective neighbor. If wu:max=wu:min (e.g., unweighted networks), we set \(\tilde {w}_{uv}\) to 0.5.
Once the rate
μ
uv
is determined, it is straightforward to calculate the probability with which the WOM message propagates from node
u to its out-neighbor
v at one time step as
$$ p_{uv}=1-P(X=0)=1-e^{-\mu_{uv}}. $$
(10)
If the propagation is not realized at time step t
i
, then the probability for the message to be transmitted at time step ti+1 is unchanged due to the memoryless property of Poisson distribution. If u has delayed passing on the message to v for delay
max
time steps (maximum delay), it is assumed that u has no intention to pass the message on to v at all. This is a sensible mechanism reflecting the fact that not everyone is actively engaged in WOM messaging with each of its neighbors at any time step. In addition, we set delay
max
to 3 time steps in alignment with the depth limit d
max
of 3, which implies that the message would have no noticeable influence even if it is finally propagated after it has been held for 3 or more time steps.
Now we can quantify the influence along a diffusion path at a specific time step. Let
\(\sigma _{i:x\rightarrow y}^{t,d}\) denote the amount of influence (originated from an influencer
i) that node
x passes on to node
y at time step
t and depth
d. Then following a diffusion path from influencer
i→
j→
k→
l, the influence that nodes
j,
k and
l acquire from node
i is respectively calculated as
$$\begin{array}{@{}rcl@{}} \sigma_{i:i\rightarrow j}^{t_{1},1}&=& e^{-\lambda t_{1}}\times \frac{1}{1^{2}}\times \hat{w}_{ij}, \end{array} $$
(11)
$$\begin{array}{@{}rcl@{}} \sigma_{i:j\rightarrow k}^{t_{2},2}&=& e^{-\lambda t_{2}}\times \frac{1}{2^{2}}\times \hat{w}_{ij}\times \hat{w}_{jk}, \end{array} $$
(12)
$$\begin{array}{@{}rcl@{}} \sigma_{i:k\rightarrow l}^{t_{3},3}&=& e^{-\lambda t_{3}}\times \frac{1}{3^{2}}\times \hat{w}_{ij}\times \hat{w}_{jk}\times \hat{w}_{kl}. \end{array} $$
(13)
In each equation above, the first term on the right hand side is the temporal influence decay, the second term is the depth-associated influence attenuation, and the rest is the chain product of the normalized influence weights on the corresponding links that constitute the diffusion path from the influencer to the node of interest.
The diffusion process starts with an initial set of influencers (seed nodes)
S0 with |
S0|=
K, and unfolds in discrete time steps. At each time step, the influence is propagated one hop from a node
u (parent node) to each out-neighbor
v (child node) with a probability
p
uv
. Each seed node is assigned an influence value of 1. Each inactive node
v is initialized with an influence value of 0, and selects an activation threshold
θ
v
uniformly at random in the range [ 0,1]. Let
A denote the set of
v’s in-neighbors who pass on influence to
v, and let
f
v
(
A) denote the threshold function of
v, that is, the total influence that
v receives from nodes in
A. Formally, we define
f
v
(
A) as
$$ f_{v}(A)=\min \left(1, \sum\limits_{u\in A} b_{uv}\right), \quad A\subseteq N_{v}^{in}, $$
(14)
where b
uv
represents the total amount of influence that v receives from u, and \(N_{v}^{in}\) denotes the set of in-neighbors of v. Whenever f
v
(A)≥θ
v
, v is activated and turns into an influencer with an influence value of 1. Then it not only continues passing other influencers’ influence as a messenger, but also starts to spread out its own influence as an influencer. The diffusion process stops when the number of hops of influence diffusion of each influencer (including the seed nodes and all activated nodes) reaches the depth limit (d
max
=3 by default) and no new activation is possible.
Influence Maximization under MAT Model
The influence-maximization (IM) problem under our MAT model can be described in a general framework as follows: Given a social network G=(V,E,W), a budget K denoting the seed-set size, and the MAT model on G associated with parameters λ (temporal decay rate), d
max
(depth limit), p
uv
for each (u→v)∈E (individual diffusion probability) and delay
max
(maximum delay) as input, find a seed set S0⊆V with |S0|≤K such that the expected influence spread σ(S0) is maximized under the MAT model. For viral marketing, the seed nodes are the initial adopters of a new product, and the influence spread is measured by the total number of people in the network who adopt the product in the end of the diffusion process.
The IM problem is NP-hard under the MAT model, which can be proved by reduction. If we set
λ=0,
d
max
=1,
delay
max
=0, and
p
uv
=1 if
u is active and
p
uv
=0 if
u is inactive for any
u∈
V, the MAT model then reduces to the classic LT model. Therefore, the IM problem over this class of instances is equivalent to the IM problem under the classic LT model, which is known to be NP-hard (
Kempe et al. 2003). This concludes the proof.
It is easy to show that the influence spread function σ(·) for the MAT model is monotone. For each run of the diffusion process, the threshold θ
v
(for each v∈V) is randomly selected and fixed. When the seed set S0 grows, the final active set ϕ(S0) also grows under these fixed thresholds, due to the monotonicity of f
v
(S). Finally, σ(S0) is simply the average of the size of all final active sets among all possible threshold values selected in different runs, and thus is monotone.
However, it is hard to prove that
σ(·) for the MAT model is also submodular. The widely-used approach is to construct a
live-edge graph model (
Chen et al. 2013) that is equivalent to the diffusion model of interest. In a live-edge graph, any active node should be reachable from at least one seed node along at least one
live-edge path consisting entirely of active nodes chained up from one to another. It implies that any active node (except seed nodes) should have at least one active neighbor, which makes it infeasible to apply this approach to our MAT model since a node without any active neighbors may still be activated by indirect influence in the MAT model. For now, we conjecture that
the influence spread function σ(·) for our MAT model is submodular. We leave the proof as future work.
To address the IM problem, we need to find the “optimal” seed set. Let us first clarify two important concepts: aggregation and overlap. Aggregation means that an inactive node is reinforced with more WOM exposures and receives more influence. When it accumulates enough influence (greater than or equal to its threshold), it becomes active and starts to spread its own influence as an influencer. Overlap means that an influencer spreads its influence to any nodes that have been activated by other influencers. On one hand, if seed nodes are closely clustered, they may potentially have large overlaps, which leads to deterioration in performance. On the other hand, if seed nodes are totally isolated, they can not achieve aggregation effect, which results in a detriment to new activations as well. Therefore, the ideal selection of the K seed nodes would be the K most influential nodes such that: (1) each of them achieves individual influence spread as large as possible; (2) they should be far enough away from each other to minimize potential overlaps; and (3) they should be close enough to reinforce aggregation effects. Using these guidelines in our seeding strategy, we develop a novel algorithm, IV-Greedy, to tackle the IM problem under the MAT model.
IV-Greedy algorithm
As discussed in the previous section, MC-Greedy (
Kempe et al. 2003) is a simple greedy hill-climbing algorithm, using Monte Carlo (MC) simulation to estimate the influence spread. Although its computational cost is extremely high due to the overhead of MC simulation, it achieves the largest influence spread under the IC and LT models among all state-of-the-art algorithms (
Chen et al. 2013). Regardless of its low efficiency, the
greedy strategy used in MC-Greedy makes sense and is highly effective. We develop IV-Greedy borrowing the greedy strategy of MC-greedy but replacing MC simulation by using the
influence vector of each node.
The influence vector of a node captures where and how much influence that node spreads over its neighborhood based on a
nonstochastic version of the MAT model, in which we do not consider individual diffusion dynamics (i.e., let
p
uv
=1 for any
u∈
V), or the activation of the nodes (i.e., only
i is an influencer; all other nodes remain messengers). In addition, we ignore the temporal diffusion decay (i.e., let
λ=0) when generating the influence vectors such that our IV-Greedy is more robust with different decay rates. We develop a modified depth-limited search algorithm, called
IV-Builder, to explore the diffusion tree of each node individually, and generate an influence vector for each node, which records its influence spread to all the nodes in its neighborhood within a depth limit of 3 (by default). IV-builder uses the same approach as we use to generate the influence matrix for community detection in (
Wang and Street 2014;
2015a). The only difference exists in the weight normalization schemes. For community detection, we normalize each in-weight of a node by the
maximum in-weight of that node to capture the
relative susceptibility of the node to its in-neighbors, and the influence vectors are used to differentiate vertex-pair similarity. For viral marketing, we normalize each in-weight of a node by the
total in-weight of that node to quantify the
absolute influence transmitted from each in-neighbor, and the influence vectors are used to find out which nodes are more likely to be activated. Due to space limit, we refer the interested reader to (
Wang and Street 2015a) for implementation details on influence-vector generation.
We use these influence vectors as a proxy for the
stochastic influence-diffusion process to avoid the expensive MC simulation. Specifically, We define the
influence score of a node as the sum of all elements in its influence vector, which represents an estimate of its total influence spread. This influence score can be used to differentiate the influence significance of the nodes in the network. An intuitive solution is to select the top-
K nodes of the highest influence scores as the seeds. Unfortunately, it does not work well. Like those high-degree nodes, the nodes of high influence scores may be clustered or too close to each other, leading to large overlaps of influence spread. We need to somehow separate them from each other to minimize overlaps. The strategy that we use in IV-Greedy is a greedy strategy similar to MC-Greedy, in which we sweep over the influence vector of each node to repeatedly pick the node with the
maximum marginal gain and add it to the seed set until all
K seeds are found. The pseudocode is shown in Algorithm 2.
In IV-Greedy, each node is indicated by its node index from 1 to n. IV-Greedy starts with an empty seed set (Line 2), and then generate the influence vector IV
u
for each node u∈V (Line 3). IV
u
is an n-element array, in which element IV
u
(j) represents the amount of influence node u exerts on node j. For each node u, we get its influence score R
u
by summing up all elements in its influence vector IV
u
(Lines 4-6). We find the node v that has the highest influence score (Line 7), and add it to the seed set S as the first seed (Line 8). Then we make a copy of IV
v
to an array AR, which is used as a representation of the collective influence distribution of currently selected seeds. To find the next seed, we sweep over each node i∈V∖S and compute the marginal gain g
i
of adding i to S individually. We compute its benefit factor c=1−AR(i) (Line 13). Recall that AR(i) represents the amount of influence i receives so far, which can be also regarded as the probability activating i. For example, if AR(i)=1, it means that i has been activated, then selecting i as a seed has no benefit to the overall influence spread. Then for each element IV
i
(j), we get p=c×IV
i
(j) (Line 15), which represents the marginal gain that node j gets. However, remember that the threshold function f
v
(S) is defined within the range [ 0,1]. If the amount of influence that node v receives is greater than 1, it is set to 1 since v has been activated and more influence on v is of no more effect or benefit. Therefore, we add p to AR(j) to get q (Line 16), which represents the accumulated influence that node j receives. If q is greater than 1, we set the actual marginal gain p=1−AR(j) (Lines 17-19). Then we add p to g
i
, which represents the total marginal gain of adding node i to S. Once we get the marginal gain of each node i∈V∖S (Lines 11-22), we select the node that produces the maximum marginal gain as a seed and add it to S (Lines 23-24). Then we update AR by calculating the benefit factor c and adding the marginal gain to each element in AR (Lines 25-28). We repeat the process described above (K−1) times to find the (K−1) seeds, which along with the first seed node constitute a full seed set S.