Top

Social Network Analysis and Mining

Published in:

Open Access 01-12-2018 | Original Article

A new approximation method for the Shapley value applied to the WTC 9/11 terrorist attack

Authors: Tjeerd van Campen, Herbert Hamers, Bart Husslage, Roy Lindelauf

Published in: Social Network Analysis and Mining | Issue 1/2018

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Patentsearch

Off

Abstract

The Shapley value (Shapley in Ann Math Stud 2:28, 1953) is one of the most prominent one-point solution concepts in cooperative game theory that divides revenues (or cost, power) that can be obtained by cooperation of players in the game. The Shapley value is mathematically characterized by properties that have appealing real-world interpretations and hence its use in practical settings is easily justified. The down part is that its computational complexity increases exponentially with the number of players in the game. Therefore, in practical problems that consist of more than 25 players the calculation of the Shapley value is usually too time expensive. Among others the Shapley value is applied in the analysis of terrorist networks (cf. Lindelauf et al. in Eur J Oper Res 229(1):230–238, 2013) which generally extend beyond the size of 25 players. In this paper we therefore present a new method to approximate the Shapley value by refining the random sampling method introduced by Castro et al. (Comput Oper Res 36(5):1726–1730, 2009). We show that our method outperforms the random sampling method, reducing the average error in the Shapley value approximation by almost $30\%$. Moreover, our new method enables us to analyze the extended WTC 9/11 network of Krebs (Connections 24(3):43–52, 2002) that consists of 69 members. This in contrast to the restricted WTC 9/11 network considered in Lindelauf et al. (2013), that only considered the operational cells consisting of the 19 hijackers that conducted the attack.

1 Introduction

Cooperative game theory with transferable utilities studies situations in which players can work together and create additional revenues (or costs reductions) instead of the situation in which each player acts on its own. When all players involved in the game work together, the so-called grand coalition forms. The objective is to find an allocation of the total revenue (or cost) of the grand coalition among the players that is considered ‘fair’ by all (sets of) players.

There are many solution concepts in cooperative game theory, each satisfying its own set of properties. We mention as one-point solutions the nucleolus (Schmeidler 1969) and the compromise value (Tijs 1981). However, the drawback of these solutions is that there do not exist general characterizations consisting of intuitive appealing properties. The most prominent one-point solution concept satisfying intuitive properties that are also considered as fair in many situations in practice is the Shapley value (Shapley 1953). The properties symmetry (i.e., two players that are symmetric in a game should receive an equal share), dummy (i.e., a player that does not contribute in a game only receives its individual contribution) and monotonicity (i.e., if a game changes such that a set of players is rewarded more in each coalition they participate, then these players should receive at least the same as allocated in the original game) are examples of such properties (cf. Shapley 1953; Young 1985) that are satisfied by the Shapley value. The Shapley value is defined as the average of all n! marginal vectors in a cooperative game consisting of n players. Each marginal vector is determined by a specific order of the players, say $\sigma$, in which the player in the i-th position in $\sigma$, say player k, is allocated the difference between the value of the coalition consisting of k and its predecessors and the value of the coalition consisting only of the predecessors of k. Although the concept of marginal contributions is intuitive clear, computing the Shapley value as the average marginal contribution of each player is considerably harder. The Shapley value has both theoretical and practical merits. From a theoretical perspective, for instance, the Shapley value can be characterized using properties that are in general considered as appealing and fair (cf. Shapley 1953; Young 1985) or it can be regarded as a special case of the theory of power indexes (Shapley and Shubik 1954). An overview and further generalizations of the Shapley value are provided in the book edited by Roth (1988).

The Shapley value has been applied to many real-world cases. Consider the airport landing fee problem described in Littlechild and Owen (1973). Littlechild and Owen proposed a division of the landing fees using the Shapley value. More recent, Deidda et al. (2009) use the Shapley value to allocate the cost of transporting water from reservoirs to the users in the Turia river basin. In genetics Moretti et al. (2007) use the Shapley value of the microarray game to determine the most important genes for specific body functions. Lindelauf et al. (2013) model terrorist networks by connectivity games and use the Shapley value to identify key players in such networks. More applications of the Shapley value in different fields can be found in the survey of Moretti and Patrone (2008).

The biggest challenge in applying the Shapley value to real-world applications is its computational time which generally increases exponentially with the number of players. More precisely, it can be shown that computing the Shapley value is an NP-complete problem (cf. Deng and Papadimitriou 1994; Faigle and Kern 1992).

A notable exception is the class of games that are mathematically defined in such a way that it becomes possible to compute the Shapley value in polynomial time [e.g., airport games (Littlechild and Owen 1973) and sequencing games (Curiel et al. 1989)]. In the context of terrorist networks Michalak et al. (2013) developed an algorithm to compute the Shapley value for connectivity games on networks that is polynomial in the number of connected subgraphs when the underlying network structure is known. Another exception is when properties of the Shapley value can be used to determine it in polynomial time [e.g., unanimity games (Shapley 1953)]. Unfortunately most games lack both a convenient mathematical structure and the option to use properties of the Shapley value such that it can be determined in polynomial time. Then even in a game consisting of 25 players the computation of the Shapley value is too time expensive. In general real-world applications deal with a large number of players, hence underlining the importance of heuristics in determining the Shapley value.

Two kinds of heuristics have been developed up to now. The first one focusses on the special class of simple games, i.e., games that only attain a value of 0 or 1. Within this class voting games receive most attention. See for instance Fatima et al. (2008) who present an algorithm that has time complexity linear in the number of players. They show that their algorithm outperforms the following approximation methods: Monte Carlo simulation (Mann and Shapley 1960), multi-linear extension (MLE) (Owen 1972), modified MLE (Leech 2003) and random permutation (Zlotkin and Rosenschein 1994). Moreover, they provide some error bound analysis and show that their algorithm is also applicable to k-majority games.

The second type of heuristic uses the average of a random subset of marginal vectors, see Castro et al. (2009). In contrary to Fatima et al. (2008) that is restricted to the class of voting games which are 0, 1 valued, the heuristic of Castro et al. (2009) is applicable to all types of games. This approximation method for the Shapley value has been used more often lately in the literature. The majority of these papers focus on specific classes of games. In Bachrach et al. (2010), for example, the focus is again on simple games. Liben-Nowell et al. (2012) consider convex games. Furthermore, Narayanam and Narahari (2008, 2011) use this method in social networks.

One area where better approximation methods for the Shapley value are needed is in the ranking procedure of individuals in networks of a terrorist, insurgent or criminal nature. For instance in Lindelauf et al. (2013) the operational WTC 9/11 network is analyzed, consisting of the 19 hijackers of the four planes involved in the attack. Using a cooperative game that takes both compositional and structural variables into account a ranking is obtained using the Shapley value. However, the extended WTC 9/11 network consists of 69 members (see Krebs 2002) which is too large to analyze with the Shapley value in order to obtain a ranking of the players in a reasonable amount of time. The algorithm of Michalak et al. (2013) that considers connected subgraphs is also too time expensive as the number of such subgraphs becomes too large. Furthermore, there is a need for a general approximation method to the Shapley value that is also applicable to games without network structure.

In this paper we therefore introduce a refinement of the random sampling method introduced by Castro et al. (2009) and establish rankings for all the 69 members in the extended WTC 9/11 network. Hence, the heuristic we introduce is also applicable to all types of games. Our sampling method first selects a random set of permutations of all players in the game. Second we modify these permutations in such a way that each player attains each position in the permutation the same number of times. This slight modification results in a better performance than the heuristic of Castro et al. (2009). On average the error in the Shapley value approximation is reduced by almost $30\%$.

This paper is organized as follows. In Sect. 2 some well-known aspects of the Shapley value are recalled. Section 3 discusses the heuristic of Castro et al. (2009) and our new heuristic which we will refer to as structured random sampling. Section 4 provides the performance analysis of the structured random sampling method. Section 5 applies our heuristic to the network of hijackers and accomplices involved in the 9/11 attack. Section 6 ends with conclusions.

2 The Shapley value

A cooperative game is a pair (N, v), where $N=\{1,2,\ldots ,n\}$ denotes the set of players. These players can cooperate and form different coalitions. A map v assigns a value v(S) to each possible coalition $S\subseteq N$, which reflects the potential benefit or power of coalition S. By definition $v(\emptyset )=0$. The Shapley value allocates the value of the grand coalition, i.e., v(N), over all individual players. The most commonly used definition of the Shapley value is based on marginal vectors. Let $\Pi$ contain all possible orderings of the players in the grand coalition N and let $\sigma =(\sigma _1,\sigma _2,\ldots ,\sigma _N)\in \Pi$ be one such ordering. If player i is at position k, i.e., $\sigma _k=i$, then its marginal contribution $m_v^\sigma (i)$ is defined as $m_v^\sigma (i)=v\left( \{\sigma _1,\ldots ,\sigma _k\}\right) -v\left( \{\sigma _1,\ldots ,\sigma _{k-1}\}\right)$. Hence, the marginal contribution of player i is the extra value (or benefit) that player i contributes to the already established coalition $\{\sigma _1,\ldots ,\sigma _{k-1}\}$. Clearly, the marginal contribution of a player depends on the chosen ordering $\sigma$. Considering all n! possible orderings in $\Pi$, we define the Shapley value $\varphi _i(v)$ of player i as its average marginal contribution:

$$\begin{aligned} \varphi _i(v)=\frac{1}{n!}\sum _{\sigma \in \Pi } m_v^\sigma (i). \end{aligned}$$

(1)

The following example illustrates how to compute the Shapley value via the marginal contributions of players.

Example 2.1

(Computing the Shapley value via marginal contributions) Consider the 3-person game (N, v) with v as in Table 1.

Table 1

An example of a 3-person game

S	$\emptyset$	$\{1\}$	$\{2\}$	$\{3\}$	$\{1,2\}$	$\{1,3\}$	$\{2,3\}$	$\{1,2,3\}$
v(S)	0	1	3	0	5	7	4	10

There are $3!=6$ orderings possible in this 3-person game. The corresponding marginal contributions are depicted in Table 2. In the first row of this table, for example, it can be seen that the marginal contribution of player 2 in the ordering (1, 2, 3) equals $m_v^{\sigma }(2)=v\left( \{1,2\}\right) -v\left( \{1\}\right) =5-1=4$.

Table 2

The marginal contributions of the 3-person game in Table 1

$\sigma$	$m_v^{\sigma }(1)$	$m_v^{\sigma }(2)$	$m_v^{\sigma }(3)$
(1, 2, 3)	1	4	5
(1, 3, 2)	1	3	6
(2, 1, 3)	2	3	5
(2, 3, 1)	6	3	1
(3, 1, 2)	7	3	0
(3, 2, 1)	6	4	0

Averaging the marginal contributions of each player results in the Shapley value: $\varphi (v)=(3\frac{5}{6}, 3\frac{1}{3}, 2\frac{5}{6})$.

Computing the Shapley value via marginal contributions is time expensive since all n! possible orderings of the players need to be considered. In several special cases, however, a time-efficient closed formula to compute the Shapley value exists. This is the case, for example, when the game (N, v) is represented as a linear combination of unanimity games. Given $N=\{1,2,\ldots ,n\}$ and a non-empty subset $T \subseteq N$ of players, a unanimity game $u_T$ is defined as

$$\begin{aligned} u_T(S)={\left\{ \begin{array}{ll} 1 &{} \quad \text{ if } \quad T\subseteq S, \\ 0 &{}\quad \text{ otherwise, }\end{array}\right. } \end{aligned}$$

(2)

for each possible coalition $S \subseteq N$. Interpretation: a benefit of 1 can only be attained when all players in T are involved in the coalition. Consider a game (N, v) that is represented as a linear combination of several unanimity games, i.e., $v(S)=\sum _{T\subseteq N, T\ne \emptyset } c_T\cdot u_T(S)$, where $c_T$ is the coefficient of the corresponding unanimity game. For this sum-of-unanimity-games game (or, shorter, SOUG game) the Shapley value of player i can readily be computed (cf. Shapley 1953):

$$\begin{aligned} \varphi _i(v)=\sum _{T:T\ni i} \frac{c_T}{|T|}. \end{aligned}$$

(3)

Any cooperative game can be represented as a SOUG game (c.f. Shapley 1953) and this representation provides an efficient way to compute the Shapley value by means of (3). Unfortunately, to reformulate a random cooperative game as a SOUG game all $2^n-1$ coalition values v(S) of the game (N, v) need to be considered. Hence, this procedure is only useful when the SOUG game is already provided.

The following example illustrates how to efficiently compute the Shapley value when the underlying SOUG game is provided.

Example 2.2

(Computing the Shapley value for a SOUG game) Consider the following 3-person unanimity games $u_{\{1\}}$, $u_{\{1,2\}}$ and $u_{\{2,3\}}$. Table 3 presents the coalition values for each unanimity game.

Table 3

An example of three unanimity games

S	$\{1\}$	$\{1,2\}$	$\{1,3\}$	$\{2,3\}$	$\{1,2,3\}$
$u_{\{1\}}$	1	1	1	0	1
$u_{\{1,2\}}$	0	1	0	0	1
$u_{\{2,3\}}$	0	0	0	1	1

Let $c_{\{1\}}=2$, $c_{\{1,2\}}=-\,3$, $c_{\{2,3\}}=5$ and let all other coefficients equal 0. Then, for the corresponding 3-person SOUG game (N, v) it follows that $v(S)=2\cdot u_{\{1\}}(S)-3\cdot u_{\{1,2\}}(S)+5\cdot u_{\{2,3\}}(S)$, see Table 4.

Table 4

The 3-person SOUG game corresponding to Table 3

S	$\emptyset$	$\{1\}$	$\{2\}$	$\{3\}$	$\{1,2\}$	$\{1,3\}$	$\{2,3\}$	$\{1,2,3\}$
v(S)	0	2	0	0	$-$ 1	2	5	4

Applying the procedure of Example 2.1 we could compute the average marginal contribution of each player, and, hence, the Shapley value. Note, however, that (3) provides a straightforward way to compute the Shapley value by considering the explicit formulation of the SOUG game. For example, the Shapley value of player 2 is $\varphi _2(v)=\frac{c_{\{2\}}}{|\{2\}|}+\frac{c_{\{1,2\}}}{|\{1,2\}|}+\frac{c_{\{2,3\}}}{|\{2,3\}|}+\frac{c_{\{1,2,3\}}}{|\{1,2,3\}|} =\frac{0}{1}+\frac{-\,3}{2}+\frac{5}{2}+\frac{0}{3}=1$. In a similar way the Shapley values of the other two players follow and $\varphi (v)=(\frac{1}{2}, 1, 2\frac{1}{2})$.

3 Approximation methods

In cases where the Shapley value of a cooperative game is too time expensive to compute we resort to approximation methods. Such methods are either based on the class of the underlying game or, for more general classes of games, are based on sampling methods. In this section we first review the nowadays popular random sampling method. Then we introduce a new approximation method, the so-called structured random sampling method. We illustrate each approximation method by an example.

3.1 Random sampling

In the last decade there has been a focus on approximating the Shapley value of arbitrary games with many players using a random sample of the marginal vectors. This random sampling method was first introduced by Castro et al. (2009). They applied the following three-step procedure.

Procedure random sampling

Input: n-person cooperative game (N, v).

Select a subset $\Pi _r $ of r orderings from all n! possible orderings, i.e. $\Pi _r \subset \Pi$.

Compute the marginal contributions $m_v^{\sigma }(i)$ for all players $i\in N$ and for all orderings $\sigma \in \Pi _r$.

Approximate the Shapley value for each player i by averaging the marginal contributions obtained at step 2, i.e., $\hat{\varphi }_i(v)=\frac{1}{r}\sum _{\sigma \in \Pi _r} m_v^\sigma (i)$.

Notice that in step 1 the modeler can choose how many orderings he wants to use, i.e., he selects the value of r. We will use the random sampling method as a benchmark for our new structured random sampling method. In the following example we apply the random sampling method to the 3-person game introduced in Sect. 2 (Table 1) with $r=3$. In Table 5 one possible choice of orderings is given.

Example 3.1

(Approximating the Shapley value via random sampling) Consider the 3-person game (N, v) of Example 2.1. Say we sample half of the possible orderings, i.e., $r=3$. Then compute the marginal contributions for each of the three players in the sampled orderings. Table 5 depicts a possible scenario. Finally, average the marginal contributions to obtain an approximated Shapley value for each player. Hence, $\hat{\varphi }(v)=(3,3\frac{1}{3},3\frac{2}{3})$.

Table 5

The marginal contributions of randomly selected orderings of the 3-person game in Table 1

$\sigma$	$m_v^{\sigma }(1)$	$m_v^{\sigma }(2)$	$m_v^{\sigma }(3)$
(1, 2, 3)	1	4	5
(1, 3, 2)	1	3	6
(3, 1, 2)	7	3	0

In Example 3.1 the approximated Shapley values for players 1 and 3 differ substantially from the real Shapley values $\varphi _1(v)=3\frac{5}{6}$ and $\varphi _3(v)=2\frac{5}{6}$. In this example the difference is mainly due to the lack of orderings in the sample in which player 1 is the last player since these orderings largely influence the Shapley value of both players 1 and 3. Hence, influential orderings like these can easily be missed when sampling at random. This observation led to the idea to ensure that each player attains each position in the ordering the same number of times. This is why we propose to use a new approximation method that adds structure to the orderings that are sampled.

3.2 Structured random sampling

Structured random sampling method is inspired by the random sampling method. The idea of this new heuristic is to ensure that each player attains each position in the ordering the same number of times. As a consequence the marginal contribution of a player to a coalition of the same size is calculated the same number of times. The intuition is that this leads to a better estimate because the calculation of the marginals with respect to coalitions of a certain size is equally distributed. To realize this the randomly selected orderings are tweaked by swapping players to their preferred positions in the orderings. The marginal contributions of the players in these new orderings are then used to approximate the Shapley value.

The swapping method is illustrated for a 4-person game in Table 6. To ensure that each player attains each position in the ordering the same number of times the sample size r must be a multiple of the number of players of the game. In this case $r=8$ orderings are randomly selected and divided into $n=4$ groups of size $t=2$. Observe that the number of groups is always equal to the number of players in the game. The size of a group indicates the number of times a player attains the same position in the ordering. In the two orderings in the first group player 1 is swapped with the player at the first position. In the two orderings in the second group player 1 is swapped with the player at the second position et cetera. These new orderings are used to compute the marginal contributions of player 1 and they in turn are averaged to approximate the Shapley value of player 1. This procedure is then repeated for players 2 to n, each time using the original randomly selected r orderings as a starting point for the swapping method. In Table 6 player 1 attains each position in the ordering exactly $t=2$ times. The remaining positions in the orderings, however, remain random. The same holds for players 2 to n when the swapping method is applied to construct the orderings for these players.

Table 6

Swapping player 1 to his preferred positions in the orderings

https://static-content.springer.com/image/art%3A10.1007%2Fs13278-017-0480-z/MediaObjects/13278_2017_480_Tab6_HTML.gif

The procedure to approximate the Shapley value of an arbitrary game using the structured random sampling method is as follows.

Procedure structured random sampling

Input: n-person cooperative game (N, v). (Hence, n is fixed and determines the number of groups.)

Select a subset $\Pi _r$ of r orderings from all n! possible orderings, i.e., $\Pi _r \subset \Pi$, with $r=t\cdot n$ and $t\in \mathbb {N}$. (Hence, the subset must be a multiple of n)

Divide the subset $\Pi _r$ in n groups of size t. (This ensures that each player can attain each position in the ordering the same number of times.)

For each player i:

(a)

Swap player i with the player at position j for each of the t orderings in group j, where $j\in \{1,\ldots ,n\}$, resulting in a set $\Pi '_r$ of r new orderings. (This ensures that each player will attain each position in the ordering the same number of times.)

(b)

Compute the marginal contributions $m_v^\sigma (i)$ of player i for all new orderings $\sigma \in \Pi '_r$.

(c)

Approximate the Shapley value of player i by averaging the marginal contributions obtained at step 3b, i.e., $\hat{\varphi }_i(v)=\frac{1}{r}\sum _{\sigma \in \Pi '_r} m_v^{\sigma }(i)$.

The following example illustrates how the structured random sampling procedure could be applied to the cooperative game of Example 2.1.

Example 3.2

(Approximating the Shapley value via structured random sampling) Consider again the 3-person game (N, v) of Example 2.1. Assume that we sample the same subset $\Pi _r$ of $r=3$ orderings as in the random sampling example, see the second column of Table 7. Since we have a three player game, i.e., $n=3$, we divide this subset into 3 groups. Since the size of the subset is chosen to be 3, i.e., $r=3$, we have that the size of each group equals one, i.e., $t=1$. Now consider player 1. Swapping this player with the player at the first, second and third position results in the new orderings depicted in the third column of Table 7. The fourth column in this table depicts the corresponding marginal contributions $m_v^{\sigma }(1)$ of player 1 in the new orderings. More precisely, if $\sigma = (1,2,3)$ then $m_v^{\sigma }(1)= v(\{1\})- v(\emptyset )= 1-0=1$, if $\sigma = (3,1,2)$ then $m_v^{\sigma }(1)= v(\{1,3\})- v(\{3\})= 7-0=7$, and if $\sigma = (3,2,1)$ then $m_v^{\sigma }(1)= v(\{1,2,3\})- v(\{2,3\})= 10-4=6$. Averaging these marginal contributions yields an approximation of the Shapley value for player 1, i.e., $\hat{\varphi }_1(v)=(1+7+6)/3=4\frac{2}{3}$.

Table 7

The marginal contributions of randomly selected orderings of the 3-person game in Table 1

Group	Ordering	Swap 1	$m_v^{\sigma }(1)$	Swap 2	$m_v^{\sigma }(2)$	Swap 3	$m_v^{\sigma }(3)$
1	(1, 2, 3)	(1, 2, 3)	1	(2, 1, 3)	3	(3, 2, 1)	0
2	(1, 3, 2)	(3, 1, 2)	7	(1, 2, 3)	4	(1, 3, 2)	6
3	(3, 1, 2)	(3, 2, 1)	6	(3, 1, 2)	3	(2, 1, 3)	5

Starting again from the original subset $\Pi _r$ in the second column and swapping player 2 with the player at the first, second and third position results in a new subset of orderings for player 2, see the fifth column of Table 7. The sixth column depicts the corresponding marginal contributions, resulting in $\hat{\varphi }_2(v)=3\frac{1}{3}$. Repeating this swapping method for the third player results in the orderings and marginal contributions depicted in the seventh and eight column of Table 7, which in turn lead to $\hat{\varphi }_3(v)=3\frac{2}{3}$. Hence, $\hat{\varphi }(v)=(4\frac{2}{3}, 3\frac{1}{3}, 3\frac{2}{3})$.

Two important observations follow from Examples 3.1 and 3.2. First, both methods calculate the same number of marginal contributions per player in order to approximate the Shapley value. The structured random sampling method, however, also needs to swap players. This extra operation results in a slightly larger computation time, see Sect. 4.3. Second, random sampling is efficient, i.e., v(N) is distributed over the n players, whereas structured random sampling is not efficient. This lack of efficiency is due to the fact that structured random sampling only considers marginal contributions of a single player for each sampled ordering. In spite of these two drawbacks the structured random sampling method outperforms the random sampling method when it comes to approximating the Shapley value as will be shown in the next section.

4 Performance analysis

In this section the performance of our proposed structured random sampling method is compared to the performance of the random sampling method. The idea is to apply both sampling methods on arbitrary games with many players and compare the results with one another as well as with the exact Shapley values of these games. First we describe how the errors between the approximated and exact Shapley values are measured. Then we run two types of performance analyses: one on the number of orderings used in the sampling methods and one on the number of players in the cooperative game. Observe that the random sampling method is the only method that can also approximate any class of games, in contrary to the method of Fatima et al. (2008) that is only applicable to a the special class of voting games that can only attain the value 0 and 1. Therefore, we only use the random sampling method as benchmark for the structured random sampling method. Nevertheless, at the end of this section we also shortly compare the performance between the structured random sampling method and the method of Fatima et al. (2008).

We test the performance of both sampling methods for the class of SOUG games. In order to generate random SOUG games the number of players, the number of unanimity games and the corresponding coefficients of these unanimity games have to be chosen randomly. Using this class of games has two main advantages. First, since each cooperative game can be represented as a SOUG game, and the SOUG games are generated randomly, the cooperative games used in the analysis are also random. Second, since the generated SOUG games are known, the exact Shapley values of the players in the game can be computed efficiently, see Example 2.2. Hence, we will be able to compute error measures for both approximation methods by comparing the approximated Shapley values with the exact Shapley values.

The error measures are computed as follows. For a randomly generated SOUG game $v_j$ the exact Shapley values as well as the approximated Shapley values are computed for all players in the game. Then both the absolute error and the percentage error are computed for each player. Averaging these errors leads to the average absolute error and the average percentage error of the sampling method for the corresponding game $v_j$. This provides insight in the absolute as well as the relative size of the errors incurred by our approximation method. To improve the validity of the error measures not just one but 50 random SOUG games $v_j$ are generated, hence, $j=1,\ldots ,50$. Averaging the average absolute error and the average percentage error results in the error measures ‘average average absolute error’ (AAAE) and ‘average average percentage error’ (AAPE) for these 50 random SOUG games $v_j$, i.e.,

$$\begin{aligned} \text{ AAAE }&= \frac{1}{50}\sum _{j=1}^{50}\left( \frac{1}{n}\sum _{i=1}^n \left| \hat{\varphi }_i(v_j)-\varphi _i(v_j)\right| \right) \end{aligned}$$

(4)

$$\begin{aligned} \text{ AAPE }&= \frac{1}{50}\sum _{j=1}^{50}\left( \frac{1}{n}\sum _{i=1}^n \frac{\left| \hat{\varphi }_i(v_j)-\varphi _i(v_j)\right| }{\left| \varphi _i(v_j)\right| } \right) \end{aligned}$$

(5)

These average error measures provide for better benchmarking of the approximation methods to the exact method. To facilitate averaging over multiple games the value of the grand coalition, i.e., $v_j(N)$, in each random SOUG game is normalized. The general procedure to compute the error measures is summarized in the following four-step approach.

Procedure error measures

Randomly generate 50 SOUG games and normalize the value of the grand coalition in each game.

Compute the exact Shapley values for all players in all 50 games.

Use random sampling to approximate the Shapley values for all players in all 50 games and compute the error measures AAAE en AAPE.

Use structured random sampling to approximate the Shapley values for all players in all 50 games and compute the error measures AAAE en AAPE.

A note on the computation of the error measure AAPE. Consider a player with a Shapley value close to zero. This Shapley value may be approximated very well by (structured) random sampling, but may still result in a relatively large percentage error. Therefore, in the computation of the error measure AAPE only players with a Shapley value larger than $0.1\%$ of v(N) are taken into account.

4.1 Number of orderings

The idea behind both sampling methods is to consider only a fraction of the total number of orderings of the players. It is expected that the approximations will be better, i.e., closer to the exact Shapley values, when the number of orderings used in the sampling methods increases. A simulation consists of 50 randomly generated SOUG games. Each of these SOUG games consists of 100 players and is constructed from 100 unanimity games, with a randomly selected subset $T\subset N$. The coefficients, which are multiplied with the unanimity games, are random integers taken from the interval $[-\,10, 10]$. In the simulations the number of orderings vary from only 500 orderings up to 5000 orderings. Note, however, that for a game with 100 players an astoundingly $100!\approx 9.33 \times 10^{157}$ different orderings of the players exist. Even when sampling 5000 orderings only a tiny fraction of the total number of orderings is used in the approximation, i.e., less than $0.01\%$.

Figure 1 shows the results of the performance analysis on the number of orderings. In this figure the ‘average average absolute error’ (AAAE) and ‘average average percentage error’ (AAPE) are provided for both sampling methods when the number of orderings is varied from 500 up to 5000 orderings.

From this figure it can be seen that average absolute errors (the AAAEs) for both sampling methods are very small. The average percentage errors (the AAPEs), however, are quite large. For example, using 5000 orderings the AAPE is 16% for random sampling and 11% for structured random sampling. Hence, more orderings should be sampled to obtain more accurate approximations. Still, considering the tiny fraction of orderings used in the approximation of the Shapley values the results of both sampling methods are promising. Furthermore, it can be seen that structured random sampling outperforms random sampling. That is, the errors in the approximations from structured random sampling are significantly smaller than the errors of random sampling. This effect is even more profound when the number of orderings is small, i.e., close to 500.

4.2 Number of players

The second analysis tests the performance of both approximation methods for different numbers of players. Again arbitrary games are considered by randomly generating 50 SOUG games in each simulation. The parameter settings of the SOUG games construction are quite similar to the settings in the performance analysis of the number of orderings. The only differences with the previous analysis are that the number of orderings is fixed to 5000 orderings and the number of players in each game varies from 10 to 100 players. Furthermore, the normalized value of the grand coalition in each game of n players is factored by 0.01n thereby enabling the comparison of the absolute errors in the approximation methods for games with different numbers of players.

Figure 2 shows the results of the performance analysis on the number of orderings. In this figure the ‘average average absolute error’ (AAAE) and ‘average average percentage error’ (AAPE) are provided for both samplings methods while the number of players is varied from 10 up to 100 players.

Both sampling methods are seen to have small AAAEs en AAPEs, but these errors seem to increase as the number of players increases. This is most likely due to the fact that the number of orderings used in all the approximations is fixed to 5000 orderings. When the number of players increases the number of possible orderings increases as well. Hence, a smaller fraction of the total number of orderings is sampled and the approximations are expected to be less accurate. Nevertheless, the approximation errors increase only slowly when the number of players increases. For example, the AAPEs of random sampling and structured random sampling are, respectively, 4 and 3% in a game of 10 players. These numbers are, respectively, 13 and 9% in a game of 100 players. Hence, a factor 10 increase in the number of players results in only a factor 3 increase in the average errors. Again, structured random sampling outperforms random sampling. This time the effect is even more profound when the number of players in the game increases.

4.3 A note on computation times

Each performance analysis implies running several simulations. In each simulation 50 random SOUG games are considered. All simulations and computations were performed with MATLAB, release R2014, on a computer with four Intel X3430 processors of 2.4 Ghz and with 4.00 Gb RAM.

The computation time of the structured random sampling method is slightly larger than the computation time of random sampling. This extra time is due to swapping players to their preferred positions in the orderings in the structured random sampling procedure. Overall, this time difference is, however, negligible. Average computation times are about 500–900 s with the number of orderings ranging from 2500 to 5000. These times are about 300–850 s with the number of players ranging from 50 to 100.

4.4 Benchmark to weighted majority games

In this section we shortly compare the performance of structured random sampling to the method of Fatima et al. (2008). Recall that the structured random sampling method can be applied to any game whereas Fatima’s method can only be applied to weighted majority games, a special class of games that can only attain the values 0 and 1. Hence, a fair benchmarking is not really possible because in the structured random sampling we do not use any characteristics of a game whereas the method of Fatima et al. (2008) strongly depends on the special structure of weighted majority games. More precisely, their method has to choose weights for each player. For their estimation of the Shapley value they use the Central Limit Theorem and take the average of only n marginal vectors that are created using the normal distribution of the sample mean of the weights. Although the small (fixed) number of marginal vectors makes this method very fast, this is also a limitation since it is not possible to increase the number of marginal vectors to reduce the approximation error. In the structured random sampling method the number of marginal vectors can be increased to improve the approximation of the Shapley value. In the following example we show that, in case the structured random sampling method only uses n marginal vectors, it performs often quite similarly to the method of Fatima et al. (2008), and in case the number of orderings is increased, but still entails only a tiny fraction of the total number of orderings, structured sampling clearly outperforms Fatima’s method.

Example 4.1

(Structured random sampling and Fatima’s method) Fatima et al. 2008 use several data cases to test the performance of their method. Nine of these data cases consider voting games with $n=20$ players, which we will use in our benchmark. The weights of the players in these data cases vary from a mean weight of 20 up to a mean weight of 100. In each data case different values are considered for the quota, resulting in a total of 68 different games. We have applied our structured random sampling method to each of these 68 voting games. Figure 3 shows the average approximation errors of Fatima’s method and structured random sampling with $n=20$ orderings (hence, 20 marginal vectors). In Fig. 3, it can be seen that the very small number of marginal vectors results in spikes in the average approximation error for the structured random sampling method with respect to Fatima’s method for three of the data cases. In particular for the data case with mean weight 30 the structured random sampling method performs poorly. The spikes at mean weights 70 and 90 are less severe. In the remaining six data cases, however, the performance of the structured random sampling method equals Fatima’s method. The big advantage of structured random sampling (besides its application to all games instead of only majority games) is its scalability to improve performance. As Fig. 4 illustrates, increasing the number of orderings to 10,000, which is still a very small subset of the total of $20!\approx 2.4\times 10^{18}$ orderings and takes only 14 s (Fatima’s method takes 3 s on the same machine), improves the performance of structured random sampling significantly.

5 Application: Al-Qaeda’s network of the WTC 9/11 attack

In this section the structured random sampling method is applied to a real-world case, i.e., the WTC 9/11 terrorist attack of Al-Qaeda. On September 11th, 2001, Al-Qaeda hijacked four planes in the United States of America. Two of the planes flew into the Twin Towers of the World Trade Center of New York. A third plane flew into the Pentagon and a fourth plane crashed in Pennsylvania. The attack resulted in the immediate death of 2977 people and left many more injured.

Based on publicly available resources Krebs (2002) mapped Al-Qaeda’s terrorist network responsible for the WTC attack. This includes a network of 19 pilots and hijackers, which were directly responsible for the 9/11 attack, and an extended network of 69 members that includes accomplices, see Fig. 5. In this figure the size of each node is proportional to the number of connections of each member. Lindelauf et al. (2013) computed the Shapley value of the connectivity game based on the network of the 19 pilots and hijackers. In this connectivity game a coalition receives a value of 1 when all players in the coalition are able to (indirectly) communicate with each other (i.e., the underlying subgraph is connected). Otherwise, the coalition receives a value of 0 (i.e., the underlying subgraph is disconnected). Consider, for example, the lower right part of Fig. 5. In the connectivity game coalition $\{\text {Jean-Marc Grandvisir, Nizar Trabelsi}\}$ receives a value of 0, whereas the coalition $\{\text {Jean-Marc Grandvisir, Nizar Trabelsi, Djamal Beghal}\}$ receives a value of 1. Computing the Shapley value of the connectivity game for the extended network of 69 members can not be accomplished using the exact method of Lindelauf et al. (2013) due to the extremely large number of $69!\approx 1.7\times 10^{98}$ orderings to be considered in the computations.

Fortunately, our structured random sampling method is able to approximate the Shapley value for each of the 69 members in the network. These approximated Shapley values can be used to construct a ranking of the members in the network. In this ranking the members with the highest Shapley values attain the top positions in the ranking. Table 8 depicts the first 15 members in the ranking based on the approximated Shapley values using structured random sampling.

Table 8

First 15 members in WTC network of Fig. 5 according to the approximated Shapley value

Ranking	Name	Appr. Shapley value
1	Mohamed Atta	0.1145
2	Essid Sami Ben Khemais	0.1134
3	Hani Hanjour	0.1112
4	Djamal Beghal	0.1081
5	Khalid Almihdhar	0.1077
6	Mahmoun Darkazanli	0.1075
7	Zacarias Moussaoui	0.1015
8	Nawaf Alhazmi	0.1008
9	Ramzi Bin al-Shibh	0.0984
10	Raed Hijazi	0.0930
11	Hamza Alghamdi	0.0099
12	Fayez Ahmed	0.0093
13	Marwan Al-Shehhi	0.0045
14	Satam Suqami	0.0038
15	Saeed Alghamdi	0.0034

The ranking in Table 8 is obtained using a sample of only 20,000 orderings (with a computation time of 11 min). Increasing the number of orderings up to 500,000 orderings (taking up 4 h of computation time) had no effect on the members present in the top-10. Note that even in the case of 500,000 orderings still only a tiny fraction of the total number of $69!\approx 1.7\times 10^{98}$ orderings is sampled. Hence, only a tiny fraction of all possible orderings is needed to obtain consistent results.

The top-10 contains members like Mohamed Atta (pilot of flight 11, WTC North), Hani Hanjour (pilot of flight 77, Pentagon), Khalid Almihdhar and Nawaf Alhazmi (both hijackers of flight 77). Zacarias Moussaoui, who was arrested before the WTC attacks took place and became known as ‘the 20-th hijacker’, is also present in the top-10. Besides these pilots and hijackers, members like Essid Sami Ben Khemais, a former head of operations for Al-Qaeda in Italy, and Djamal Beghal can be found in the top-10 of the ranking. Both have been identified by US government of plotting attacks on US embassies world wide. Figure 5 shows their connections to the members involved in the WTC 9/11 attack.

In this application we have only considered the structure of the WTC network. A big advantage of game theoretic measures is that they, in contrast to standard centrality measures, can also be applied when additional information on nodes and relationships is present, see Lindelauf et al. (2013) (note that there do exist some non-game theoretic measures that consider more than only network structure to determine centrality of nodes, e.g., diffusion centrality, c.f. Kang et al. 2016). Using a game theoretic measure in our application we could for example model Al-Qaeda’s network as a weighted connectivity game. Our structured random sampling method could then approximate the Shapley value of each member resulting in a ranking of the members in the WTC network taking into account both compositional and structural variables. Comparing this ranking to rankings obtained by standard centrality measures (that only consider network structure) may lead to valuable insights in dismantling such networks.

6 Conclusions

The Shapley value is one of the most prominent one-point solution concepts in cooperative game theory. In general, however, its computational complexity is exponential in the number of players. There are numerous real-world applications in which the number of players in the game is too large to calculate the Shapley value, warranting methodology to approximate the Shapley value. Next to the already existing heuristic of random sampling we presented a structured random sampling approach to approximate the Shapley value and compared it to the random sampling procedure. Our simulations showed that the errors in the approximations using structured random sampling are substantially smaller than the errors using random sampling, becoming more prominent when the number of orderings is small or the number of players increases. Average computation times of the structured random sampling method were only slightly larger than the random sampling method, making structured random sampling a preferred choice when approximating the Shapley value. We applied our structured random sampling method on a connectivity game for Al-Qaeda’s extended network of the WTC 9/11 attack to identify the key players in this network, which previously was impossible due to the size of the WTC network.

Acknowledgements

The authors wish to thank Valdis Krebs for providing the network of the WTC 9/11 attack and the anonymous referees of SNAM for their valuable comments on an earlier version of this manuscript.

Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

previous article Learning from noisy label proportions for classifying online social data

next article Incremental eigenpair computation for graph Laplacian matrices: theory and applications

Bachrach Y, Markakis E, Resnick E, Procaccia AD, Rosenschein JS, Saberi A (2010) Approximating power indices: theoretical and empirical analysis. Auton Agents Multi Agent Syst 20(2):105–122CrossRef

Castro J, Gómez D, Tejada J (2009) Polynomial calculation of the shapley value based on sampling. Comput Oper Res 36(5):1726–1730MathSciNetMATHCrossRef

Curiel I, Pederzoli G, Tijs S (1989) Sequencing games. Eur J Oper Res 40(3):344–351MathSciNetMATHCrossRef

Deidda D, Andreu J, Perez MA, Sechi GM, Zucca R, Zuddas P (2009) A cooperative game theory approach to water pricing in a complex water resource system. In: Proceedings of the 18th world IMACS/MODSIM congress, Cairnes, Australia

Deng X, Papadimitriou CH (1994) On the complexity of cooperative solution concepts. Math Oper Res 19:257–266MathSciNetMATHCrossRef

Faigle U, Kern W (1992) The Shapley value for cooperative games under precedence constraints. Int J Game Theory 21(3):249–266MathSciNetMATHCrossRef

Fatima SS, Wooldridge M, Jennings NR (2008) A linear approximation method for the Shapley value. Artif Intell 172(14):1673–1699MathSciNetMATHCrossRef

Kang C, Kraus S, Molinaro C, Spezzano F, Subrahmanian VS (2016) Diffusion centrality: a paradigm to maximize spread in social networks. Artif Intell 239:70–96MathSciNetMATHCrossRef

Krebs VE (2002) Mapping networks of terrorist cells. Connections 24(3):43–52

Leech D (2003) Computing power indices for large voting games. Manag Sci 49(6):831–837MATHCrossRef

Liben-Nowell D, Sharp A, Wexler T, Woods K (2012) Computing Shapley value in supermodular coalitional games. In: Gudmundsson J, Mestre J, Viglas T (eds) Computing and combinatorics, vol 7434 of lecture notes in computer science. Berlin, Springer, pp 568–579

Lindelauf R, Hamers H, Husslage B (2013) Cooperative game theoretic centrality analysis of terrorist networks: the cases of Jemaah Islamiyah and Al Qaeda. Eur J Oper Res 229(1):230–238MathSciNetMATHCrossRef

Littlechild SC, Owen G (1973) A simple expression for the Shapley value in a special case. Manag Sci 20(3):370–372MATHCrossRef

Mann I, Shapley LS (1960) Values of large games, IV: evaluating the electoral college by montecarlo techniques. Technical report, RAND Corporation

Michalak T, Rahwan T, Szczepanski PL, Skibski O, Narayanam R, Wooldridge M, Jennings NR (2013) Computational analysis of connectivity games with applications to the investigation of terrorist networks. In: Proceedings 23rd international joint conference on AI (IJCAI). AAAI Press/international joint conferences on artificial intelligence, pp 293–301

Moretti S, Patrone F (2008) Transversality of the Shapley value. TOP 16(1):1–41MathSciNetMATHCrossRef

Moretti S, Patrone F, Bonassi S (2007) The class of microarray games and the relevance index for genes. TOP 15(2):256–280MathSciNetMATHCrossRef

Narayanam R, Narahari Y (2008) Determining the top-k nodes in social networks using the Shapley value. In: Padgham L, Parkes DC, Mller JP, Parsons S (eds) AAMAS, vol 3. IFAAMAS, pp 1509–1512

Narayanam R, Narahari Y (2011) A Shapley value-based approach to discover influential nodes in social networks. IEEE Trans Autom Sci Eng 8(1):130–147CrossRef

Owen G (1972) Multilinear extensions of games. Manag Sci 18(5):P64–P79MathSciNetCrossRef

Roth Alvin E (ed) (1988) The Shapley value: essays in honor of Lloyd S. Shapley. Cambridge University Press, CambridgeMATH

Schmeidler D (1969) The nucleolus of a characteristic function game. J Appl Math 17:1163–1170MathSciNetMATH

Shapley LS (1953) A value for n-person games. Contribution to the theory of games. Ann Math Stud 2:28

Shapley LS, Shubik M (1954) A method for evaluating the distribution of power in a committee system. Am Polit Sci Rev 48:787–792CrossRef

Tijs S (1981) Bounds of the core of a game and the $\tau$-value. In: Moeschlin O, Pallaschke D (eds) Game theory and mathematical economics. North-Holland Publishing Company, Amsterdam, pp 123–132

Young HP (1985) Monotonic solutions of cooperative games. Int J Game Theory 14(2):65–72MathSciNetMATHCrossRef

Zlotkin G, Rosenschein JS (1994) Coalition, cryptography, and stability: mechanisms for coalition formation in task oriented domains. In: Proceedings of the twelfth national conference on artificial intelligence, vol 1, AAAI ’94, Menlo Park, CA, USA. American association for artificial intelligence, pp 432–437

Title: A new approximation method for the Shapley value applied to the WTC 9/11 terrorist attack
Authors: Tjeerd van Campen
Herbert Hamers
Bart Husslage
Roy Lindelauf
Publication date: 01-12-2018
Publisher: Springer Vienna
Published in: Social Network Analysis and Mining / Issue 1/2018
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI: https://doi.org/10.1007/s13278-017-0480-z

S	\(\{1\}\)	\(\{1,2\}\)	\(\{1,3\}\)	\(\{2,3\}\)	\(\{1,2,3\}\)
\(u_{\{1\}}\)	1	1	1	0	1
\(u_{\{1,2\}}\)	0	1	0	0	1
\(u_{\{2,3\}}\)	0	0	0	1	1

\(\sigma\)	\(m_v^{\sigma }(1)\)	\(m_v^{\sigma }(2)\)	\(m_v^{\sigma }(3)\)
(1, 2, 3)	1	4	5
(1, 3, 2)	1	3	6
(3, 1, 2)	7	3	0

Springer Professional

A new approximation method for the Shapley value applied to the WTC 9/11 terrorist attack

Abstract

1 Introduction

2 The Shapley value

3 Approximation methods

3.1 Random sampling

3.2 Structured random sampling

4 Performance analysis

4.1 Number of orderings

4.2 Number of players

4.3 A note on computation times

4.4 Benchmark to weighted majority games

5 Application: Al-Qaeda’s network of the WTC 9/11 attack

6 Conclusions

Acknowledgements

Premium Partner

Springer Professional

Abstract

1 Introduction

2 The Shapley value

3 Approximation methods

3.1 Random sampling

3.2 Structured random sampling

4 Performance analysis

4.1 Number of orderings

4.2 Number of players

4.3 A note on computation times

4.4 Benchmark to weighted majority games

5 Application: Al-Qaeda’s network of the WTC 9/11 attack

6 Conclusions

Acknowledgements

Other articles of this Issue 1/2018

BullyBlocker: toward an interdisciplinary approach to identify cyberbullying

Analysis of social interaction network properties and growth on Twitter

Estimating degree rank in complex networks

The difference between optimal and germane communities

Profiling Web users using big data

Toward maximizing the visibility of content in social media brand pages: a temporal analysis

Premium Partner