Published in: Knowledge and Information Systems 2/2022

Open Access 08-01-2022 | Regular Paper

Computing top-k temporal closeness in temporal networks

Authors: Lutz Oettershagen, Petra Mutzel



Abstract

The closeness centrality of a vertex in a classical static graph is the reciprocal of the sum of the distances to all other vertices. However, networks are often dynamic and change over time. Temporal distances take these dynamics into account. In this work, we consider the harmonic temporal closeness with respect to the shortest duration distance. We introduce an efficient algorithm for computing the exact top-k temporal closeness values and the corresponding vertices. The algorithm can be generalized to the task of computing all closeness values. Furthermore, we derive heuristic modifications that perform well on real-world data sets and drastically reduce the running times. For the case that edge traversal takes an equal amount of time for all edges, we lift two approximation algorithms to the temporal domain. The algorithms approximate the transitive closure of a temporal graph (which is an essential ingredient for the top-k algorithm) and the temporal closeness for all vertices, respectively, with high probability. We experimentally evaluate all our new approaches on real-world data sets and show that they drastically reduce running times while maintaining high quality in many cases. Moreover, we demonstrate that the top-k temporal and static closeness vertex sets differ substantially in the considered temporal networks.
Notes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

Centrality measures are a cornerstone of social network analysis. One of the most popular and well-researched centrality measures is closeness, first introduced by Bavelas [1]. In a static and undirected graph, the closeness of a vertex is the inverse of the sum of the smallest distances to the other vertices of the network. However, many real-world networks are temporal, e.g., in a social network, persons only interact at specific points in time. Recently, the analysis of dynamic networks, or temporal graphs, has gained increasing attention [5, 18, 26, 32, 40]. Here, a temporal network consists of a set of vertices and a set of temporal edges. Each temporal edge is only available at a specific discrete point in time (called the availability time), and edge traversal costs a strictly positive amount of time (called the transition time). Important examples are human and social contact networks, communication networks, and transportation networks. The transition times in transportation networks correspond to the times needed for spatial movements, e.g., the duration of flights or the travel time of buses between the bus stops. For communication networks, the transition times depend on the type of communication, e.g., email communication uses unit transition times. In a temporal network modeling phone usage, the transition time of an edge may be the duration of a phone call between two users. In the case of unit transition times, a temporal graph can be seen as a sequence of static graphs over a fixed set of vertices and edges that evolve over time. Figure 1a shows a temporal graph, where the edges are labeled with the time stamps of their existence. Figure 1c shows the representation as a sequence of static graphs over time. The temporal properties direct any possible flow of information in the network. For example, a dissemination process on the network, such as the spread of rumors, fake news, or diseases, has to respect the forward flow of time.
Hence, a meaningful adaptation of a conventional path is the time-respecting temporal path. In a temporal path, at each vertex, the time stamp of the next edge of the path cannot be earlier than the arrival time at the vertex [7, 18, 52]. Modifications of static closeness to temporal closeness using temporal paths and temporal distances have been suggested [11, 28, 41, 45, 49, 52]. One of these temporal distances is the shortest duration, i.e., the duration of a fastest path between two nodes. Here, we consider the harmonic temporal closeness of a vertex. It is defined as the sum of the reciprocals of the durations of the fastest paths to all other vertices. Furthermore, we define the harmonic temporal in-closeness as the sum of the reciprocals of the durations of the fastest paths arriving at a vertex. Harmonic closeness for non-connected static graphs was introduced in [31]. The reason for using the harmonic variant for temporal closeness and in-closeness is that reachability between vertices, even in an undirected temporal graph, is restricted. It has been shown that in networks modeling dissemination processes, like the spread of viruses or fake news, a vertex with a high temporal closeness can be expected to be of high importance for the transportation or dissemination process [48, 49]. In general, high temporal closeness vertices in temporal networks can differ from high static closeness and high degree vertices. For example, in the temporal graph shown in Fig. 1a, traversing an edge takes one time unit for all edges. Figure 1b shows the aggregated static graph, in which static edges replace the temporal edges. In this static graph, the most central vertex in terms of static closeness is a. It also has the most outgoing edges. However, in the temporal graph, the vertex with the highest temporal closeness is vertex e. Notice that vertex a can only reach its direct neighbors in the temporal graph, while vertex e can reach all other vertices.
Computing the exact temporal closeness values of all vertices can be costly because it demands finding a fastest temporal path between each pair of vertices. However, often knowing the top-k important vertices is sufficient. Our contributions are:
1.
We propose an algorithm for computing the top-k harmonic temporal closeness values and the corresponding vertices in a temporal network. This algorithm can be simplified for the task of computing the closeness values of all vertices. The algorithms are based on a new minimum duration path algorithm on temporal graphs.
 
2.
We derive two heuristics for the temporal closeness of all and the top-k vertices that, in our experimental evaluation, improve running times significantly and introduce only small errors on all our data sets.
 
3.
For temporal graphs with equal transition times, we introduce the temporal transpose, which is a generalization of reversing the edges in a directed static graph. This transformation enables our algorithms to compute the temporal in-closeness. Furthermore, using the temporal transpose, we adapt a stochastic approximation for the transitive closure of temporal graphs. The transitive closure is essential for the top-k algorithm. This approximation speeds up the running time of the top-k algorithm substantially. Moreover, we adapt a stochastic sampling algorithm for approximating the temporal closeness of all vertices directly.
 
4.
We comprehensively evaluate our algorithms using real-world temporal networks. As a baseline, we use a temporal closeness algorithm based on the edge stream fastest path algorithm introduced by Wu et al. [52]. Our approaches decrease the running times for all data sets significantly.
 
Differences to the conference version The main differences and additions to the conference version [38] are the following. We added the definition of temporal in-closeness. Our new results show that with the temporal transpose, our algorithms can efficiently compute the temporal in-closeness. Furthermore, we introduce two new heuristics with detailed descriptions and analyses. Our experimental evaluation shows that both new heuristics outperform the heuristic of the conference version. Finally, we extended our experiments section with a comparison between temporal closeness, static closeness, degree centrality, and reachability.
2 Related work

Comprehensive overviews and introductions of centrality approaches, including closeness centrality, are provided, e.g., in [12, 27, 43, 46]. The problem of ranking vertices and selecting the k most influential vertices has been widely studied, see, e.g., [3, 53, 54]. Bergamini et al. [3] present algorithms for computing the top-k closeness in unweighted static graphs. They start a breadth-first search (BFS) from each vertex in order to find the shortest distance to all other vertices. In each BFS run, they calculate an upper bound for the closeness of the current vertex. If the current vertex cannot have a top-k closeness value, their algorithm stops the computation early. We adapt this strategy for temporal closeness in Sect. 4. Our algorithm differs by computing the fastest temporal paths, allowing variable transition times, and using corresponding upper bounds. A simple adaptation of the static closeness algorithm from [3] is not possible because it does not take the temporal features, like temporal edges with transition times or waiting times at vertices, into account. In [4], Bisenius et al. extend the same framework to dynamic graphs in which edges can be added and removed. Their approach allows efficient updates of the static closeness after edge insertions or deletions. However, their algorithm works for static closeness using unweighted static shortest paths without considering edge availability or transition times.
Eppstein and Wang [13] proposed a randomized approximation algorithm for closeness in weighted undirected graphs. It approximates the closeness of all vertices with a small additive error with high probability. Their algorithm is also not directly applicable to the temporal case. In Sect. 5, we introduce a transformation of temporal graphs to an inverse representation such that the algorithm can be used to approximate temporal closeness. Based on [13], Okamoto et al. [39] introduced an algorithm for approximating the top-k closeness. First, they use an approximation to find a candidate set of vertices. Next, they rank the top-k vertices with high probability. Cohen et al. [9] combined the sampling approach and a pivoting technique for approximating the closeness of all vertices.
An early overview of different dynamic graph models is given by Harary and Gupta [15]. More recent and comprehensive introductions to temporal graphs, including overviews of different temporal centrality measures, are provided in, e.g., [18, 28, 45]. Michail [33] gives an algorithm-oriented introduction to temporal graphs. The early work on fastest paths by Cooke and Halsey [10] suggests iteratively computing T distance matrices for each vertex, where T is a time-dependent parameter chosen beforehand. Kempe et al. [21] discuss time-respecting paths and related connectivity problems. Xuan et al. [7] introduce algorithms for finding the shortest, fastest, and earliest arrival paths, which are generalizations of Dijkstra’s shortest paths algorithm. Their fastest path algorithm only supports unit traversal times for all edges and enumerates all the earliest arrival paths for each possible starting time. In [19], the authors define variants of temporal graph traversals. A depth-first search (DFS) variant can be used to construct a DFS tree starting at a vertex v, containing information about the fastest paths from v to the other vertices. The authors of [36] consider bicriteria temporal paths in weighted temporal graphs, where each edge has an additional cost value. They propose an algorithm that enumerates all efficient paths with polynomial-time delay.
Wu et al. [52] introduce streaming algorithms operating on the temporal edge stream in which the edges arrive with non-decreasing time stamps. They discuss finding fastest, shortest, latest departure, and earliest arrival paths and suggest using their algorithms to compute temporal closeness. We use their state-of-the-art streaming algorithms for the fastest paths and temporal closeness in the baseline in our experiments. In Sect. 4.5, we further discuss the differences to our algorithms.
For temporal graphs with unit traversal times, Crescenzi et al. [11] introduce a temporal closeness variant based on the durations of earliest arrival paths by integrating over all starting times in a given time interval. For the exact closeness computation, they use a variant of the edge streaming algorithm. For finding the top-k closeness vertices, the authors define a backward version of their algorithm and provide a variant of Eppstein and Wang's [13] algorithm for closeness approximation. After approximating the closeness of all vertices, the vertices are ranked according to the approximated closeness. Finally, the exact closeness is computed for the \(K>k\) highest-ranked vertices in order to obtain a final top-k ranking. In [50], the authors introduce shortest-fastest paths as a combination of the conventional distance and shortest duration. They extend Brandes’ algorithm [6] for computing betweenness centrality in temporal graphs. There are works on temporal walk-based centrality measures, e.g., the authors of [2] adapt the walk-based Katz centrality to temporal graphs, and Rozenshtein and Gionis [44] give a temporal PageRank variant. Nicosia et al. [37] and Kim and Anderson [25] examine a wide range of properties of temporal graphs, including various temporal centrality measures. In [48] and [49], the authors compare temporal distance metrics and temporal centrality measures to their static counterparts. They reveal that the temporal versions for analyzing temporal graphs have advantages over static approaches on the aggregated graphs.

3 Preliminaries

A directed temporal graph \(\mathbf {G}=(V, \mathbf {E})\) consists of a finite set of vertices V and a finite set \(\mathbf {E}\) of directed temporal edges \(e=(u,v,t,\lambda )\) with u and v in V, \(u\ne v\), availability time (or time stamp) \(t \in {\mathbb {N}}\) and transition time (or traversal time) \(\lambda \in {\mathbb {N}}\). The transition time of an edge denotes the time required to traverse the edge. We only consider directed temporal graphs—it is possible to model undirectedness using a forward- and a backward-directed edge with equal time stamps and traversal times for each undirected edge. Notice that in a temporal graph, the number of edges is not polynomially bounded by the number of vertices. Given a temporal graph \(\mathbf {G}\), removing all time stamps and traversal times, and merging resulting multi-edges, we obtain the aggregated graph \(A(\mathbf {G})=(V,E_s)\) with \(E_s=\{(u,v)\mid (u,v,t,\lambda )\in \mathbf {E}\}\). For a directed temporal graph \(\mathbf {G}=(V,\mathbf {E})\) and a time interval \(I=[\alpha , \beta ]\) with \(\alpha ,\beta \in {\mathbb {N}}\) and \(\alpha \le \beta \), we define the temporal subgraph \(\mathbf {G}^I=(V,\mathbf {E}^I)\) with \(\mathbf {E}^I=\{(u,v,t,\lambda )\in \mathbf {E}\mid t\ge \alpha \text { and } t+\lambda \le \beta \}\). For \(u\in V\) let \(\tau _+^I(u)=\{t\mid (u,v,t,\lambda )\in \mathbf {E}^I\}\) and \(\tau _-^I(u)=\{t+\lambda \mid (v,u,t,\lambda )\in \mathbf {E}^I\}\). Furthermore, let \({\hat{\tau }}_+^I=\max \{|\tau ^I_+(u)|\mid u\in V\}\), and \({\hat{\tau }}_-^I=\max \{|\tau _-^I(u)|\mid u\in V\}\), i.e., the largest numbers of different starting or arrival times at any \(v\in V\). Finally, let \({\mathcal {T}}(\mathbf {G})\) be the set of all availability times in \(\mathbf {G}\).
Figure 2a shows an example of a temporal graph \(\mathbf {G}\) with \({\mathcal {T}}(\mathbf {G})=\{1,2,5,6,7,8\}\). At each edge, the availability and transition times are shown. Notice that \(\mathbf {G}=\mathbf {G}^I\) for \(I=[1,12]\), \({\hat{\tau }}_+^I=\max \{1,2,3\}=3\) and \({\hat{\tau }}_-^I=\max \{0,1,2\}=2\). Restricting the temporal graph to the interval \(J=[5,9]\) leads to the temporal subgraph shown in Figure 2b. For \(\mathbf {G}^{J}\), we have \({\mathcal {T}}(\mathbf {G}^{J})=\{5,6,7\}\), \({\hat{\tau }}_+^{J}=1\) and \({\hat{\tau }}_-^{J}=1\).
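The basic objects defined above translate directly into code. The following is a minimal, illustrative sketch (all names are ours, not from the paper) of temporal edges \((u,v,t,\lambda )\), the restriction to a time interval \(I=[\alpha ,\beta ]\), and the sets \(\tau _+^I(u)\) and \(\tau _-^I(u)\):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TemporalEdge:
    u: str      # tail vertex
    v: str      # head vertex
    t: int      # availability time (time stamp)
    lam: int    # transition time, strictly positive

def subgraph(edges, alpha, beta):
    """E^I: edges that start no earlier than alpha and arrive no later than beta."""
    return [e for e in edges if e.t >= alpha and e.t + e.lam <= beta]

def tau_plus(edges, u):
    """tau_+^I(u): distinct starting times of edges leaving u."""
    return {e.t for e in edges if e.u == u}

def tau_minus(edges, u):
    """tau_-^I(u): distinct arrival times of edges entering u."""
    return {e.t + e.lam for e in edges if e.v == u}
```

Note that an edge belongs to \(\mathbf {E}^I\) only if it both starts and arrives inside the interval, matching the definition of \(\mathbf {G}^I\).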

3.1 Temporal paths

A temporal path P between vertices \(v_1,v_{\ell +1}\in V\) of length \(\ell \) is an alternating sequence of vertices and temporal edges
$$\begin{aligned}\left( v_1,e_1=(v_1,v_2,t_1,\lambda _1),v_2,\dots ,e_\ell =(v_\ell ,v_{\ell +1},t_\ell ,\lambda _\ell ),v_{\ell +1}\right) \end{aligned}$$
such that \(e_i\in \mathbf {E}\), each vertex in P is visited exactly once, and \(t_{i}+\lambda _i \le t_{i+1}\) for \(1\le i <\ell \). For notational convenience, we sometimes omit edges or vertices. The starting time of P is \(s(P)=t_1\), the arrival time is \(a(P)=t_\ell +\lambda _\ell \), and the duration is \(d(P)=a(P)-s(P)\). A fastest path is a path with minimal duration. We say vertex v is reachable from vertex u if there exists a temporal (uv)-path. In contrast to classical static graphs, even undirected temporal graphs are, in general, not strongly connected and have non-symmetric and limited reachability between nodes with respect to temporal paths. Paths and reachability can be restricted to a time interval I and the corresponding subgraph \(\mathbf {G}^I\). Let \(R^I(u)\) be the set of temporally reachable vertices from u in the time interval I, i.e., the set of vertices that can be reached from u using a temporal path in \(\mathbf {G}^I\). We define \(r^I(u)=|R^I(u)|\).
For example, in Fig. 2a there are three paths between vertices a and d. The first one consists of only the edge \(P_1=((a,d,1,5))\) and has a duration of \(d(P_1)=6-1=5\). The second path is \(P_2=((a,b,2,1),(b,d,7,2))\) with duration \(d(P_2)=9-2=7\). Lastly, the third path is \(P_3=((a,b,5,2),(b,d,7,2))\) with duration \(d(P_3)=9-5=4\). Therefore, \(P_3\) is the only fastest (a, d)-path. Notice that the subpath \(P'_3=((a,b,5,2))\) is not a fastest (a, b)-path. The only fastest (a, b)-path consists of only the edge (a, b, 2, 1) and has a duration of one. Furthermore, notice the restricted reachability due to the temporal constraints. For example, vertex b is able to reach vertex d, and vertex d can reach vertex c. However, the path from b to d has an arrival time of nine and is too late to be continued with one of the outgoing edges at d, which start at six and eight, respectively.
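The path conditions and durations in this example can be checked mechanically. A small sketch (helper names are illustrative), with edges given as \((u,v,t,\lambda )\) tuples:

```python
def is_temporal_path(edges):
    """Check that consecutive edges share a vertex and respect the
    time-respecting condition t_i + lam_i <= t_{i+1}."""
    for (u1, v1, t1, l1), (u2, v2, t2, l2) in zip(edges, edges[1:]):
        if v1 != u2 or t1 + l1 > t2:
            return False
    return True

def duration(edges):
    """d(P) = a(P) - s(P): arrival time of the last edge minus the
    starting time of the first edge."""
    (_, _, t1, _), (_, _, tl, ll) = edges[0], edges[-1]
    return tl + ll - t1
```

On the paths above, this yields \(d(P_1)=5\) and \(d(P_3)=4\), and rejects sequences that would require traveling backward in time.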

3.2 Harmonic temporal closeness

For a temporal graph \(\mathbf {G}=(V,\mathbf {E})\) and a time interval I, let \({\mathcal {P}}_{uv}^I\) be the set of all temporal paths between \(u,v \in V\) in \(\mathbf {G}^I\). We define the shortest duration between u and v as \(d^I(u,v)=\min _{P\in {\mathcal {P}}_{uv}^I}(d(P))\). If v is not reachable from vertex u during the interval I, we set \(d^I(u,v)=\infty \), and we define \(\frac{1}{\infty }=0\). Due to the restricted reachability in temporal graphs, we use the harmonic variant of temporal closeness. Marchiori and Latora [31] introduced harmonic closeness in static graphs.
Definition 1
(Harmonic Temporal Closeness) Let \(\mathbf {G}=(V,\mathbf {E})\) be a temporal graph and I a time interval. We define the harmonic temporal closeness for \(u\in V\) with respect to I as \( c^I(u) = \sum _{v\in V\setminus \{u\}}\frac{1}{d^I(u,v)}\). We call \(c^I_n (u) = \frac{c^I(u)}{|V|}\) the normalized harmonic temporal closeness.
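Given the shortest durations, Definition 1 translates directly into code. A sketch (illustrative names), where the duration map sends pairs \((u,v)\) to \(d^I(u,v)\) and unreachable pairs default to infinity, so that \(\frac{1}{\infty }=0\) as defined above:

```python
import math

def harmonic_closeness(durations, u, vertices):
    """c^I(u) = sum over v != u of 1 / d^I(u, v).
    `durations` maps (u, v) to the shortest duration; missing pairs
    are treated as unreachable (infinite duration, contributing 0)."""
    return sum(1.0 / durations.get((u, v), math.inf)
               for v in vertices if v != u)
```

The normalized variant \(c^I_n(u)\) is obtained by dividing the result by the number of vertices.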
Now we define the top-k temporal closeness problem.
Definition 2
For a temporal graph \(\mathbf {G}=(V,\mathbf {E})\) and \(k\in {\mathbb {N}}\), the Top-k Harmonic Temporal Closeness Problem asks for the k largest values of the harmonic temporal closeness and the set of all vertices in V with these values.
Notice that the temporal closeness can be defined using either the outgoing or incoming fastest paths, i.e., using \(d^I(u,v)\) or \(d^I(v,u)\). In the first case, the temporal closeness describes how well vertex u can reach the other vertices in terms of duration. The second case describes how well the other vertices can reach vertex u. Therefore, analogous to the harmonic temporal closeness, we define the harmonic temporal in-closeness.
Definition 3
(Harmonic Temporal in-Closeness) Let \(\mathbf {G}=(V,\mathbf {E})\) be a temporal graph and I a time interval. We define the harmonic temporal in-closeness for \(u\in V\) with respect to I as \( c^I_{in}(u) = \sum _{v\in V\setminus \{u\}}\frac{1}{d^I(v,u)}\).
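The in-closeness differs from Definition 1 only in the orientation of the fastest paths; in a sketch analogous to the one above (names again illustrative), this amounts to swapping the key order of the duration map:

```python
import math

def harmonic_in_closeness(durations, u, vertices):
    """c_in^I(u) = sum over v != u of 1 / d^I(v, u) — note the swapped
    key order relative to the (out-)closeness. `durations` maps (v, u)
    to the shortest duration; missing pairs count as unreachable."""
    return sum(1.0 / durations.get((v, u), math.inf)
               for v in vertices if v != u)
```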
In Sect. 5, we introduce the temporal transpose of a temporal graph that enables our temporal closeness algorithms to calculate the harmonic temporal in-closeness in case that all edges have equal transition times.
In the following, we often drop the word harmonic and call the problem temporal closeness.

4 Algorithms for temporal closeness

First, we present a new fastest path algorithm, which we then use for the top-k temporal closeness computation. The new algorithm is tailored to be part of our top-k algorithm and operates on the adjacency list representation of the temporal graph, i.e., for each vertex, all outgoing edges are stored in a list.

4.1 A label setting fastest path algorithm

Recall that, in general, a sub-path of a fastest path is not necessarily a fastest path. We deal with this problem by using a label setting algorithm that finds the fastest paths from a start vertex u to all other vertices during a time interval I. The algorithm uses labels, where each label \(l=(v,s,a)\) represents a (uv)-path that starts at time s at vertex u and arrives at time a at vertex v. For each vertex \(v\in V\), the algorithm keeps all such labels in a list \(\Pi [v]\) and uses a dominance check when a new label is created to remove labels that cannot lead to optimal paths. We use the dominance relation, which is also used in [52].
Definition 4
(Dominance) We say label \((v,s',a')\) dominates label (vsa) if \(s< s'\) and \(a \ge a'\), or \(s= s'\) and \(a> a'\).
A non-dominated label does not necessarily represent a fastest path. However, it might represent a prefix-path of a fastest path. On the other hand, a dominated label cannot represent a fastest path or a prefix-path of a fastest path. Therefore, all dominated labels can be deleted. Using a dominance check, Algorithm 1 only keeps labels that may lead to a fastest path. In the case of two equal labels, we only need to keep one. Besides the label lists \(\Pi [v]\), we use a priority queue \({\mathcal {Q}}\) containing all labels which still need to be processed. In each iteration of the while loop, we get the label (v, s, a) with the smallest duration \(a-s\) from the priority queue (line 5). At this point, if the algorithm discovers v for the first time, the shortest duration \(d^I(u, v)\) is found.
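The following is a compact sketch of this label-setting idea, not the paper's exact pseudocode: labels are initialized from the edges leaving u, processed in order of duration, and pruned with the dominance check of Definition 4. All names are illustrative; `adj[v]` lists outgoing edges as \((w, t, \lambda )\) triples.

```python
import heapq
from collections import defaultdict

def fastest_durations(adj, u, alpha, beta):
    """Return d[v] = duration of a fastest (u, v)-path within [alpha, beta].
    The first label extracted for a vertex fixes its fastest duration."""
    d = {u: 0}
    Pi = defaultdict(list)                  # Pi[v]: non-dominated labels (s, a)
    pq = []                                 # heap entries (a - s, v, s, a)
    for (w, t, lam) in adj.get(u, []):      # one initial label per edge leaving u
        if t >= alpha and t + lam <= beta:
            _push(Pi, pq, w, t, t + lam)
    while pq:
        dur, v, s, a = heapq.heappop(pq)
        if v not in d:                      # first extraction: d^I(u, v) found
            d[v] = dur
        for (w, t, lam) in adj.get(v, []):  # wait at v, then traverse the edge
            if t >= a and t + lam <= beta:
                _push(Pi, pq, w, s, t + lam)
    return d

def _push(Pi, pq, w, s, a):
    """Insert label (w, s, a) unless an existing label dominates it
    (Definition 4); drop existing labels that the new one dominates."""
    if any(s <= s2 and a >= a2 for (s2, a2) in Pi[w]):
        return                              # dominated (or an exact duplicate)
    Pi[w] = [(s2, a2) for (s2, a2) in Pi[w] if not (s2 <= s and a2 >= a)]
    Pi[w].append((s, a))
    heapq.heappush(pq, (a - s, w, s, a))
```

On a toy graph built from the edges of the running example (assuming, for illustration, that the two edges leaving d have transition time one), this reproduces the durations of the worked example, e.g., \(d^I(a,b)=1\) and \(d^I(a,d)=4\). Dominated labels removed from \(\Pi \) may linger in the heap; this lazy deletion costs extra work but not correctness, since durations are extracted in non-decreasing order.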
Lemma 1
Let \((l_1=(w_1, s_1,a_1), \ldots , l_p=(w_p, s_p,a_p))\) be the sequence of labels returned by the extractMin call of the priority queue. Then the durations are non-decreasing, i.e., \(a_i-s_i\le a_j-s_j\) for \(1\le i < j \le p\).
Proof
The duration is strictly increasing in the length of a path because we have strictly positive transition times, i.e., using an edge takes at least one time step. We use induction over the iteration i of the while loop (line 4). The first initial label has duration \(a-s=0\). Now assume the hypothesis holds after the i-th iteration, for an \(i\ge 1\). In the \((i+1)\)-th iteration, first the label \(l_{i+1}\) with the currently shortest duration is returned from the priority queue. The inner for loop (line 9) iterates over all outgoing edges at w and a new label might be added to the priority queue for each possible extension of the (u, w)-path. The duration of all newly inserted labels is larger than the one of \(l_{i+1}\) and therefore also larger than the duration of any label that was already in the queue when \(l_{i+1}\) was returned. However, labels newly inserted during one iteration of the while loop may have equal duration. \(\square \)
Lemma 2
If during the iterations of the while loop a vertex v is inserted in F (line 6 ff.), the duration \(d[v]=a-s\) is equal to the duration of a fastest (u, v)-path.
Proof
Vertex v is added to F when label \(l=(v,s,a)\) is returned from the priority queue and \(v\not \in F\). After a vertex has been added to F, it remains in F. Therefore, l is the first label that is processed for vertex v. Due to Lemma 1, the durations of the processed labels are non-decreasing. Consequently, \(a-s\) is the duration of a fastest path from u to v. \(\square \)
The following theorem states the correctness and asymptotic running time of Algorithm 1.
Theorem 1
Let \(\mathbf {G}=(V, \mathbf {E})\) be a temporal graph, \(u\in V\), and I a time interval. Furthermore, let \(\delta _m^+\) be the maximal out-degree in \(\mathbf {G}\), \(\pi =\min \{{\hat{\tau }}_-^I, |\tau ^I_+(u)|\}\) and \(\xi =\max \{\pi \delta _m^+,\log (|\mathbf {E}^I|)\}\). Algorithm 1 returns the durations of the fastest (u, v)-paths for all \(v\in V\) in \(\mathbf {G}^I\) in running time \({\mathcal {O}}(|V|+|\mathbf {E}|\cdot |\tau ^I_+(u)|\cdot \xi )\).
Proof
Let \(w\in R^I(u)\) be a temporally reachable vertex. There has to be at least one such vertex; otherwise, the temporal closeness of u is 0. Because w is reachable, we know that there exists a fastest temporal (u, w)-path \(P=(v_1=u, e_1, \ldots , e_{\ell -1}, v_\ell =w)\). By induction over the length \(\ell \), we show that for each prefix-path a corresponding label is generated. For the prefix-path of length \(h=0\), we have the initial label at vertex u. We assume that for some \(h\ge 0\) the hypothesis holds. Now, for the case \(h+1\), we have the prefix-path \(P_h=(v_1=u, e_1, \ldots , e_h, v_{h+1})\), which consists of \((v_1=u, e_1,\ldots , v_{h})\) and edge \(e_h\) arriving at vertex \(v_{h+1}\). By applying the induction hypothesis, a label \((v_h,s',a')\) for \(P_h\) was created and added to \({\mathcal {Q}}\).
The only way the label could get deleted without being processed is by being dominated by a label \((v_h,s'',a'')\). Without loss of generality, we can assume that not both \(s'=s''\) and \(a'\ge a''\) holds (in this case we consider the path represented by \((v_h,s'',a'')\)). Assume that \(s'<s''\) and \(a'\ge a''\) then P would not be a fastest temporal path. Therefore, the label does not get deleted from \({\mathcal {Q}}\) due to dominance checks.
However, eventually, it gets processed in some iteration, and a label for \(P_{h+1}\) is added to the priority queue. The label representing \(P_{h+1}\) cannot be dominated for similar reasons as for the label for \(P_h\). Hence, at some point, vertex w is reached for the first time and added to the set F. With Lemma 2, it follows that a fastest temporal path has been found, and d[w] is set to the shortest duration.
The total number of labels is upper bounded by \(|\mathbf {E}^I|\cdot |\tau ^I_+(u)|\); hence, \({\mathcal {Q}}\) will eventually be empty and the algorithm terminates.
For the running time, we have the following considerations. Initialization is done in \({\mathcal {O}}(|V|)\). The while loop (line 4) is called for every label (v, s, a), and the inner for loop (line 9) is called for each outgoing edge at vertex v in \(\mathbf {G}\). During the inner for loop, a new label is created and compared to all labels at vertex w for the dominance check. Due to the dominance check, for each possible arrival time, we may only keep the label with the latest starting time and no equal labels. And, for two labels with equal starting times, we only keep the one with the earlier arrival time. Therefore, the number of labels at a vertex w is at most the maximum of the number of different arrival times at any \(v\in V\) and different starting times at u, i.e., the size of \(\Pi [w]\) is at most \(\pi \). The domination checks for all outgoing edges (line 9 ff.) can be done in \({\mathcal {O}}(\pi \delta _m^+)\) time. The total number of labels generated is at most \(|\mathbf {E}^I|\cdot |\tau ^I_+(u)|\). However, at any time there are at most \(|\mathbf {E}^I|\) labels in the priority queue, because we only keep the label with the latest starting time per edge. Therefore, the cost for extracting and deleting a label from the priority queue is amortized \({\mathcal {O}}(\log (|\mathbf {E}^I|))\) using a Fibonacci heap, and inserting is possible in constant time. Because we have to check for each edge \(e\in \mathbf {E}\) if e is in the time interval I, the total running time is in \({\mathcal {O}}(|V|+|\mathbf {E}|+|\mathbf {E}^I|\cdot |\tau ^I_+(u)|\cdot \xi )={\mathcal {O}}(|V|+|\mathbf {E}|\cdot |\tau ^I_+(u)|\cdot \xi )\). \(\square \)

4.2 Computing Top-k temporal closeness

Based on the fastest path algorithm presented in the previous section, we now introduce the top-k temporal closeness algorithm. Given a temporal graph \(\mathbf {G}\) and a time interval I, Algorithm 2 computes the exact top-k temporal closeness values and the corresponding vertices in \(\mathbf {G}^I\). If vertices share a top-k temporal closeness value, the algorithm finds all of these vertices. We adapt the pruning framework introduced by Bergamini et al. [3] for the temporal closeness case. In contrast to their work, we compute the fastest paths in temporal graphs instead of BFS in static graphs and introduce upper bounds that take temporal aspects like transition and waiting times into account. We iteratively run Algorithm 1 for each \(u\in V\). During the computations, we determine upper bounds of the closeness and stop the closeness computation of u if we are sure that the closeness of u cannot be in the set of the top-k values. For calculating the upper bounds, we use two disjoint subsets of the vertices that are updated in every iteration, i.e., \(F_i(u)\cup T_i(u) \subseteq V\), with
1.
\(F_i(u)\) contains u and all vertices for which we already computed the exact temporal distance from u, and
 
2.
\(T_i(u)\) contains all vertices w for which there is an edge \(e=(v,w,t,\lambda )\in \mathbf {E}\) with \(v \in F_i(u)\) and \(w \not \in F_i(u)\).
 
Together with the set \(R^I(u)\) of reachable vertices, which does not change during the algorithm, we obtain the upper bound \(\bar{c}_i^I(u)\) of the temporal closeness of vertex u in iteration i as
$$\begin{aligned} \bar{c}_i^I(u)=&\sum _{v\in F_i(u)}\frac{1}{d^I(u,v)} + \sum _{v\in T_i(u)}\frac{1}{\text {lower bound for }d^I(u,v)} \\ {}&+ \frac{|R^I(u)|-|F_i(u)|-|T_i(u)|}{\text {common lower bound }d^I\text { for all }v\in R^I(u) \setminus (F_i(u)\cup T_i(u))}\text {.} \end{aligned}$$
In the following, we introduce upper bounds for the temporal closeness, such that we can use Algorithm 1 to compute the closeness \(c^I(u)\) exactly or stop the computation if the vertex cannot achieve a top-k temporal closeness value. Algorithm 2 takes as input a temporal graph \(\mathbf {G}\), a time interval I, and the number of reachable vertices \(r^I(u)\) for all \(u\in V\). Algorithms for computing the number of reachable vertices are discussed in Sect. 4.4. Algorithm 2 first removes all edges that are not leaving or arriving during the time interval I and proceeds on \(\mathbf {G}^I\). Next, the algorithm orders the vertices by decreasing number of outgoing edges and starts the computation of the closeness for each vertex \(v_j\) in the determined order. The intuition behind processing the vertices by decreasing out-degree is that a vertex with many outgoing edges can reach many other vertices fast. The processing order of the vertices is crucial in order to stop the computations for as many vertices as early as possible. After ordering the vertices, for each \(v_j\), the shortest durations from \(v_j\) to the reachable vertices are computed using the adapted version of Algorithm 1. In each iteration of the while loop, the algorithm extracts the label (v, s, a) with the smallest duration \(a-s\) from the priority queue, and if v is still in \(T_{i-1}(v_j)\), then it has found the shortest duration \(d^I(v_j, v)\).
Let \((l_1=(w_1, s_1,a_1), \ldots , l_p=(w_p, s_p,a_p))\) be the sequence of labels returned by the extractMin calls of the priority queue. Then the durations are non-decreasing, i.e., \(a_i-s_i\le a_h-s_h\) for \(1\le i < h \le p\). Moreover, in iteration i, when reaching line 27, the label \(l_{i+1}\) that will be returned from the priority queue in iteration \(i+1\) is already determined, and \(d_{i+1}=a_{i+1}-s_{i+1}\) is the next possible duration to any reachable vertex that is not yet in \(F_i(v_j)\). Now, let \(\Delta \) be the minimal temporal waiting time at a vertex w over all vertices, i.e., \(\Delta =\min _{w\in V}\{t_b-(t_a+\lambda _a)\mid (x,w,t_a,\lambda _a), (w,y,t_b,\lambda _b)\in \mathbf {E}^I \text { and } t_a+\lambda _a \le t_b \}\). Furthermore, let \(\lambda _{min}\) be the smallest transition time in \(\mathbf {E}^I\).
Lemma 3
In Algorithm 2, during the inner while loop in line 27, for each vertex \(w \in T_i(v_j)\), the duration \(d(v_j, w)\) is lower bounded by \(d_{i+1}\). And, for each vertex \(r \in R^I(v_j)\setminus (F_i(v_j)\cup T_i(v_j))\), we have \(d(v_j,r)\ge d_{i+1}+\Delta +\lambda _{min}\).
Proof
Let \(w \in T_i(v_j)\). Then its duration has not been found yet. The next label \(l_{i+1}\) provides the next possible duration any reachable vertex can have, namely \(d_{i+1}=a_{i+1}-s_{i+1}\). Moreover, for any reachable vertex z that is not in \(F_i(v_j) \cup T_i(v_j)\), we have to account for at least the additional waiting time at a vertex \(y\in T_i(v_j)\) plus the transition time from y to z. \(\square \)
Altogether, we have the following upper bound.
Theorem 2
In Algorithm 2, the temporal closeness \(c^I(u)\) in iteration i of the while loop is less than or equal to
$$\begin{aligned} \bar{c}^I_i(u) = \sum _{v\in F_i(u)}\frac{1}{d(u,v)} + \frac{|T_i(u)|}{d_{i+1}} + \frac{|R^I(u)|-|F_i(u)|-|T_i(u)|}{d_{i+1}+\Delta +\lambda _{min}} \text {.} \end{aligned}$$
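The bound of Theorem 2 can be evaluated in constant time per iteration once the partition sizes and \(d_{i+1}\) are known. A minimal sketch (the function and argument names are ours, not from the paper):

```python
def upper_bound(found_durations, n_tentative, n_reachable, d_next, delta, lam_min):
    """Upper bound on the temporal closeness of u in iteration i.

    found_durations: exact durations d(u, v) for all v in F_i(u)
    n_tentative:     |T_i(u)|, vertices with a label in the priority queue
    n_reachable:     |R^I(u)|, vertices reachable from u in G^I
    d_next:          d_{i+1}, duration of the next label in the queue
    delta, lam_min:  minimal waiting and transition times in E^I
    """
    exact = sum(1.0 / d for d in found_durations)
    rest = n_reachable - len(found_durations) - n_tentative
    return exact + n_tentative / d_next + rest / (d_next + delta + lam_min)
```

If the bound drops below the current k-th best closeness value, the computation for u can be aborted.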
Finally, we discuss the running time of the algorithm.
Theorem 3
Let \(\delta ^+_m\) be the maximal out-degree in \(\mathbf {G}^I\), \(\pi =\min \{{\hat{\tau }}_-^I, {\hat{\tau }}_+^I\}\), and furthermore \(\xi =\max \{\pi \delta ^+_m,\log (|\mathbf {E}^I|)\}\). The running time of Algorithm 2 is in \({\mathcal {O}}(|V|^2+|V||\mathbf {E}|\cdot {\hat{\tau }}_+^I \cdot \xi )\).
Proof
Ordering the vertices by decreasing out-degree takes \({\mathcal {O}}(|V|\log |V|)\) time. Computing \(\lambda _{min}\) is done during the removal of the edges that are not in the time interval I, i.e., this together takes \({\mathcal {O}}(|\mathbf {E}|)\) time. We can compute \(\Delta \) by checking for each edge \((u,v,t,\lambda )\in \mathbf {E}^I\) the time stamps of the outgoing edges at v in \({\mathcal {O}}(|\mathbf {E}^I|\delta _m^+)\) time. Initialization is done in \({\mathcal {O}}(|\mathbf {E}|)\). The outer for loop (line 4) is called for each vertex, and the inner loop is an adaptation of Algorithm 1. Updating the upper bound can be done in constant time. Therefore, we have |V| times the running time of Algorithm 1. Keeping track of the top-k results is possible in \({\mathcal {O}}(|V|\log k)\). \(\square \)
We can easily adapt this algorithm to compute the closeness for all vertices. In this case, we do not need the number of reachable vertices, and we do not need to keep track of the partitioning of the vertex set or the upper bound \(\bar{c}(v_i)\).

4.3 Heuristic modifications

We propose two different modifications for our fastest path algorithm and obtain heuristic versions of our temporal closeness algorithms.
Heuristic 1: The idea of the first heuristic is to limit the size of the label lists at the vertices. Let \(h\in {\mathbb {N}}\) be the maximal size of each label list at \(v\in V\). At each vertex \(v\in V\), we then have at most h labels at any point during the computation of the temporal closeness, i.e., \(|\Pi [v]|\le h\). We achieve this by adding a check before inserting a new label into the list in line 24. If the size of \(\Pi [v]\) is already h at this point, we discard the new label.
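The size cap of Heuristic 1 amounts to a small guard before insertion. A sketch in which the label lists \(\Pi \) are modeled as a Python dict of lists (this representation and the function name are our assumptions):

```python
def try_add_label(label_lists, v, label, h):
    """Heuristic 1: insert a (start, arrival) label for v only if the
    label list of v holds fewer than h entries; surplus labels are
    discarded, trading exactness for speed."""
    lst = label_lists.setdefault(v, [])
    if len(lst) >= h:
        return False
    lst.append(label)
    return True
```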
Theorem 4
Let \(\delta _m^+\) be the maximal out-degree and \(\xi =\max \{h\delta ^+_m,\log (|\mathbf {E}^I|)\}\). The running time of the temporal top-k closeness algorithm using Heuristic 1 is in \({\mathcal {O}}(|V|^2+|V||\mathbf {E}| \cdot {\hat{\tau }}_+^I \xi )\).
Proof
Let \(\mathbf {G}=(V,\mathbf {E})\) be a temporal graph, \(u\in V\), and I a time interval. The fastest path algorithm using the first heuristic returns estimates of the durations of the fastest (uv)-paths for all \(v\in V\) in \(\mathbf {G}^I\) in running time \({\mathcal {O}}(|V|+|\mathbf {E}|\cdot |\tau ^I_+(u)|\cdot \xi )\). The difference in running time arises during the inner for loop, when a label is created and compared to all labels at vertex w for the dominance check. The number of labels at a vertex w is now at most h, and the dominance checks for all outgoing edges (line 9 ff.) can be done in \({\mathcal {O}}(h\delta _m^+)\) time. Because we call the adapted fastest paths algorithm for each \(v\in V\), the result follows. \(\square \)
Heuristic 2: We can further reduce the running time in the case of \(h=1\) by ignoring vertices after they are visited for the first time. We only need to store a single label at each vertex and do not need the label lists. Even though this further reduces the possibilities to find the fastest paths, our experimental evaluation shows that this approach performs well on real-world data sets. During the computation of the closeness for a vertex \(v_j\), once the duration \(d(v_j, v)\) of a vertex v is set and v is added to the set \(F_i\), the heuristic algorithm does not consider or generate further labels for v. Therefore, vertex v is assigned the duration \(a-s\) when the label \((v,s,a)\) is processed, and new labels are only generated for edges \((v,w,t,\lambda )\) with \(a\le t\). All further labels \((v,s',a')\) that are returned from the priority queue in the following iterations are ignored, even if \(a'<a\). This means that we only have to store the label with the minimum duration found so far at each vertex \(v \in V\) because this is the label that determines the final duration d(uv). Consequently, this approach leads to a Dijkstra-like algorithm. As we argued in Sect. 3.1, we cannot guarantee to find the fastest paths because, in general, fastest paths do not consist of fastest sub-paths.
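The Dijkstra-like behavior of Heuristic 2 can be sketched as follows; temporal edges are \((u,v,t,\lambda )\) tuples and labels are (duration, start, arrival, vertex) tuples, a representation of our own choosing:

```python
import heapq
from collections import defaultdict

def heuristic2_durations(edges, source):
    """Heuristic 2 sketch: each vertex keeps the first label extracted
    for it and is never reconsidered, so the returned durations are
    only upper bounds on the true fastest durations."""
    out = defaultdict(list)
    for u, v, t, lam in edges:
        out[u].append((v, t, lam))
    pq = []
    for v, t, lam in out[source]:
        # from the source we may depart at any time, so start = t
        heapq.heappush(pq, (lam, t, t + lam, v))
    dur = {}
    while pq:
        d, s, a, v = heapq.heappop(pq)
        if v in dur:              # Heuristic 2: later labels for v are ignored
            continue
        dur[v] = d
        for w, t, lam in out[v]:
            if t >= a and w not in dur:   # edge usable after arriving at v
                heapq.heappush(pq, (t + lam - s, s, t + lam, w))
    return dur
```

On the edges `[('u','x',0,1), ('u','x',5,1), ('x','y',6,1)]`, the heuristic fixes x via the earlier edge and reports a duration of 7 for y, although departing later to x yields the true fastest duration 2; this is exactly the price of the single-label rule, since fastest paths need not consist of fastest sub-paths.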
Theorem 5
The running time of the temporal top-k closeness algorithm using the second heuristic is in \({\mathcal {O}}(|V|^2+|V||\mathbf {E}| \cdot \log (|\mathbf {E}^I|))\).
Proof
The fastest path algorithm using the second heuristic has a running time in \({\mathcal {O}}(|V|+|\mathbf {E}| \cdot \log (|\mathbf {E}^I|))\). The dominance checks are unnecessary, and the total number of labels generated is at most \(|\mathbf {E}^I|\) because we do not visit any vertex more than once, and for each edge at most one label is added to the priority queue. We call the adapted fastest paths algorithm for each \(v\in V\), and therefore, the result follows. \(\square \)

4.4 The number of reachable vertices

The top-k temporal closeness algorithms need as one of their inputs the number \(r^I(u)\) of reachable vertices in \(\mathbf {G}^I\) for all \(u\in V\). There are different ways to obtain the number of reachable vertices exactly. We can determine all \(r^I(u)\) by running a temporal DFS or BFS from each of the |V| vertices, cf. [19]. If the adjacent edges are stored in chronological order at each vertex, this takes \({\mathcal {O}}(|V|^2+|V||\mathbf {E}|)\) total running time. Alternatively, we can use a one-pass streaming algorithm similar to the earliest-arrival path algorithm in [52] with \({\mathcal {O}}(|V|^2+|V||\mathbf {E}|)\) running time. In Sect. 5.1, we present an approximation for speeding up the computation of the number of reachable vertices.
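The one-pass streaming idea can be sketched in a few lines; the tuple representation \((u,v,t,\lambda )\) of temporal edges and the function name are our assumptions:

```python
import math

def num_reachable(edges, source):
    """r(source): number of other vertices reachable by a temporal path.

    One pass over the edges sorted by availability time, maintaining
    earliest arrival times, in the spirit of the streaming algorithm of
    Wu et al. cited above."""
    arr = {source: -math.inf}     # the source is present from the beginning
    for u, v, t, lam in sorted(edges, key=lambda e: e[2]):
        if u in arr and arr[u] <= t:      # edge departs after we arrive at u
            arr[v] = min(arr.get(v, math.inf), t + lam)
    return len(arr) - 1           # do not count the source itself
```

One time-ordered scan suffices because transition times are strictly positive, so no edge can enable another edge with a smaller availability time.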

4.5 Comparison to the baseline algorithm

For a temporal graph \(\mathbf {G}=(V, \mathbf {E})\) and a time interval I, the baseline algorithm first removes all edges that are not in I. Next, it runs the fastest path edge streaming algorithm from Wu et al. [52] for each \(u\in V\) to determine the closeness \(c^I(u)\). Their fastest path algorithm uses a single pass over the edges, which need to be sorted by increasing time stamps. One crucial difference is that their algorithm may update the minimal duration of a vertex after it was visited for the first time. This is necessary if, e.g., the last edge of the edge stream constitutes the fastest (uv)-path, consisting of a single edge. For our top-k algorithms, however, the fastest duration between two vertices u and v must be determined when v is discovered for the first time. For a temporal graph \(\mathbf {G}=(V,\mathbf {E})\) spanning time interval I, let \(\delta _m^-\) be the maximal in-degree in \(\mathbf {G}\), and let \(\pi =\min \{\delta _m^-, |\tau ^I_+(u)|\}\). The running time of the edge stream algorithm for computing fastest paths is in \({\mathcal {O}}(|V| + |\mathbf {E}|\log \pi )\), and in graphs with equal transition times for all edges in \({\mathcal {O}}(|V| + |\mathbf {E}|)\). For \(\pi '=\min \{\delta _m^-, {\hat{\tau }}_+^I\}\), starting their algorithm from each vertex leads to temporal closeness algorithms with running times of \({\mathcal {O}}(|V|^2 + |V||\mathbf {E}|\log \pi ')\), or \({\mathcal {O}}(|V|^2 + |V||\mathbf {E}|)\), respectively. Notice that this is faster than the worst-case running time of our algorithm. However, the streaming algorithm always has to scan all edges. Even for a vertex v with no out-going edge, the edge stream algorithm has to traverse all edges because in each iteration it only knows the edges up to the time of the current edge and cannot rule out possible future edges incident to v. Our algorithm, in contrast, stops in this case after a single iteration. This is why our algorithm performs well on real-world data sets.

5 Approximations for equal transition times

We lift two approximations from the static to the temporal domain. The first one is for the number of reachable vertices and is based on the estimation framework introduced by Cohen et al. [8]. The second one approximates the temporal closeness for all vertices and adapts the undirected static closeness approximation from Wang et al. [13]. Both algorithms are applicable if the temporal graph has equal transition times for all edges. This is often the case for social and contact networks.
First, we introduce the concept of the temporal transpose of a temporal graph, which corresponds to reversing the edges in a conventional static graph and allows the search of incoming temporal paths.
Definition 5
For a temporal graph \(\mathbf {G}=(V,\mathbf {E})\), we call \({\mathcal {I}}(\mathbf {G})=(V,{\tilde{\mathbf {E}}})\) with edge set \({\tilde{\mathbf {E}}} = \{(v,u,t_{max}-t,\lambda )\mid (u,v,t,\lambda )\in \mathbf {E}\}\) and \(t_{max}=\max \{t+\lambda \mid (u,v,t,\lambda )\in \mathbf {E}\}\) the temporal transpose of \(\mathbf {G}\).
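Definition 5 translates directly into code; the tuple representation \((u,v,t,\lambda )\) of temporal edges is our assumption:

```python
def temporal_transpose(edges):
    """Temporal transpose I(G): reverse each edge and mirror its
    availability time at t_max = max(t + lam). The duration-preservation
    property (Lemma 4 below) assumes equal transition times."""
    t_max = max(t + lam for _, _, t, lam in edges)
    return [(v, u, t_max - t, lam) for u, v, t, lam in edges]
```

For instance, the path a → b → c with availability times 1 and 3 (duration 3) becomes the path c → b → a with availability times 1 and 3 in the transpose, again with duration 3.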
Figure 3 shows an example for a temporal graph \(\mathbf {G}\) and its temporal transpose \({\mathcal {I}}(\mathbf {G})\). The following lemma is the basis for the approximations.
Lemma 4
Let \(\mathbf {G}=(V,\mathbf {E})\) be a temporal graph in which all edges have equal transition times, \({\mathcal {I}}(\mathbf {G})\) its temporal transpose, and \(u,v\in V\). For each (uv)-path \(P_{uv}\) with duration \(d(P_{uv})=d_P\) in \(\mathbf {G}\) there exists a (vu)-path \(P_{vu}\) with duration \(d(P_{vu})=d_P\) in \({\mathcal {I}}(\mathbf {G})\) and vice versa.
Proof
We use t(e) to denote the availability time of edge \(e\in \mathbf {E}\). First notice that every (non-temporal) (uv)-path P in \(\mathbf {G}\) corresponds to a (non-temporal) (vu)-path in \({\mathcal {I}}(\mathbf {G})\) that visits the vertices in reverse order of P. Therefore, for a temporal path in \(\mathbf {G}\) (or \({\mathcal {I}}(\mathbf {G})\)), there exists a corresponding sequence of vertices and edges in \({\mathcal {I}}(\mathbf {G})\) (or \(\mathbf {G}\), respectively), and it remains to show that the time stamps of the edges allow for a temporal path. Let \(P=\left( v_1,e_1,v_2,\dots ,e_\ell ,v_{\ell +1}\right) \) be a temporal path in \(\mathbf {G}\). Now, consider the sequence \(Q=\left( v_{\ell +1},e'_\ell ,v_{\ell },\dots ,e'_1,v_1\right) \) in \({\mathcal {I}}(\mathbf {G})\), with \(e'_i=(v_{i+1},v_i,t_{max}-t,\lambda )\) for each \(e_i=(v_i,v_{i+1},t,\lambda )\) and \(i\in \{1,\ldots ,\ell \}\). From \(t(e_i)+\lambda \le t(e_{i+1})\) follows \(t_{max}-t(e_i)\ge t_{max}-t(e_{i+1})+\lambda \), i.e., \(t(e'_{i+1})+\lambda \le t(e'_i)\), so \(e'_i\) can be traversed after \(e'_{i+1}\). Therefore, the sequence Q is a temporal path in \({\mathcal {I}}(\mathbf {G})\). The other direction is symmetric. Furthermore, we have
$$\begin{aligned} d(P)&=a(P)-s(P)\\ {}&=\lambda +t(e_\ell )-t(e_1)\\ {}&= \lambda +t(e_\ell )-t(e_1) + t_{max}-t_{max} \\ {}&= \lambda +(t_{max}-t(e_1))-(t_{max}-t(e_\ell ))\\ {}&=\lambda +t(e'_1)-t(e'_\ell )\\ {}&=a(Q)-s(Q)=d(Q)\text {.} \end{aligned}$$
\(\square \)
Lemma 4 allows us to apply our temporal closeness algorithms to the transposed temporal graph in order to consider the shortest durations of incoming instead of outgoing temporal paths. This way, we can compute the temporal in-closeness.
Theorem 6
Let \(\mathbf {G}\) be a temporal graph in which all edges have equal transition times, and let \({\mathcal {I}}(\mathbf {G})\) be its temporal transpose. The temporal closeness c(u) of a vertex \(u\in V\) in \({\mathcal {I}}(\mathbf {G})\) is equal to the temporal in-closeness \(c_{in}(u)\) in \(\mathbf {G}\).
Furthermore, Lemma 4 shows that the set of vertices R(u) reachable by u in the transposed graph \({\mathcal {I}}(\mathbf {G})\) is equal to the set of vertices in \(\mathbf {G}\) that can reach u.

5.1 Approximation for the number of reachable vertices

We propose an approximation algorithm for the number of reachable vertices based on the estimation framework introduced by Cohen et al. [8]. The non-weighted version proceeds as follows. For two sets X and Y, let \(S:Y\rightarrow 2^X\). Furthermore, let L be an oracle that, when presented with a random permutation \(r:X\rightarrow \{1,\ldots ,|X|\}\), returns a mapping \(l:Y\rightarrow X\) such that for all \(y\in Y\), \(l(y)\in S(y)\) and \(r(l(y))=\min _{x\in S(y)}r(x)\). In h rounds, a uniformly random value from the interval [0, 1] is assigned to each \(x\in X\), i.e., the rank \({\mathcal {R}}_i(x)\) in round i. In each round, the sorted ranking induces a permutation on X that is presented to the oracle L, which returns the mapping \(l_i:Y\rightarrow X\). For the estimator \({\hat{s}}(y)=\frac{h}{\sum _{i=1}^{h}{\mathcal {R}}_i(l_i(y))}-1\), the following holds.
Theorem 7
(Cohen et al. [8]) Let \(0<\epsilon <1\). Then \( P(||S(y)|-{\hat{s}}(y)|\ge \epsilon |S(y)|) = e^{-\Omega (\epsilon ^2\cdot h)} \text {.} \)
For a temporal graph \(\mathbf {G}\) and a time interval I, Algorithm 3 computes the estimates \({\hat{r}}^I(u)\) of the number of reachable vertices \(r^I(u)\) for all \(u\in V\). Given \(\mathbf {G}\), I and \(h\in {\mathbb {N}}\), our algorithm first computes the temporal transpose \({\mathcal {I}}(\mathbf {G}^I)\). Notice that we only need to add edges \((u,v,t,\lambda )\in \mathbf {E}\) that start or arrive during I to the temporal transpose \({\mathcal {I}}(\mathbf {G}^I)\). The algorithm then proceeds with the following steps on \({\mathcal {I}}(\mathbf {G}^I)\). In each of h rounds, we first assign a uniformly random value from the interval [0, 1] to each vertex, i.e., the rank \({\mathcal {R}}(v)\) of the vertex v. Let \(v_1,\ldots ,v_n\) be the vertices ordered by increasing rank. The algorithm then determines for each \(v_i\) the set \(R(v_i)\) of vertices reachable from \(v_i\) and sets the mapping \(l_j(v)\) of all newly found \(v\in {R}(v_i)\) to \({\mathcal {R}}(v_i)\).
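The following is a condensed, unoptimized sketch of this estimation scheme, not the paper's Algorithm 3 itself; the edge representation, all names, and the per-vertex scan of the transpose are our simplifications:

```python
import math
import random

def estimate_reachability(edges, vertices, h, seed=0):
    """Cohen-style estimate of how many vertices each u can reach.

    In each of h rounds, every vertex gets a uniform random rank;
    processing vertices v by increasing rank, an earliest-arrival pass
    in the temporal transpose finds every u that can reach v, and the
    first such v fixes the minimal rank over R(u). The estimator
    h / rank_sum - 1 then approximates |R(u)|, with u counted as
    reaching itself."""
    rng = random.Random(seed)
    t_max = max(t + lam for _, _, t, lam in edges)
    rev = sorted(((v, u, t_max - t, lam) for u, v, t, lam in edges),
                 key=lambda e: e[2])

    def reachers(v):
        # all u that can reach v in G = all vertices reachable from v
        # in the temporal transpose (equal transition times assumed)
        arr = {v: -math.inf}
        for x, y, t, lam in rev:
            if x in arr and arr[x] <= t:
                arr[y] = min(arr.get(y, math.inf), t + lam)
        return set(arr)

    rank_sum = {u: 0.0 for u in vertices}
    for _ in range(h):
        rank = {u: rng.random() for u in vertices}
        covered = set()
        for v in sorted(vertices, key=rank.get):
            new = reachers(v) - covered
            for u in new:
                rank_sum[u] += rank[v]   # v has minimal rank among R(u)
            covered |= new
            if len(covered) == len(rank_sum):
                break
    return {u: h / rank_sum[u] - 1.0 for u in vertices}
```

On the chain a → b → c, the estimates concentrate around \(|R(a)|=3\), \(|R(b)|=2\), and \(|R(c)|=1\) as h grows.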
Theorem 8
Let \(\mathbf {G}\) be a temporal graph, I a time interval and \(h\in {\mathbb {N}}\). Algorithm 3 approximates the reachability \({\hat{r}}^I(u)\) for each \(u\in V\) and \(0<\epsilon < 1\) such that
$$\begin{aligned} P\big (|r^I(u)-{\hat{r}}^I(u)|\ge \epsilon r^I(u)\big ) = e^{-\Omega (\epsilon ^2\cdot h)} \text {.} \end{aligned}$$
Proof
After assigning \(l_j(v)={\mathcal {R}}(v_i)\), we do not need to reach v again using the edges that we took on the path from \(v_i\) to v. All possible extensions outgoing from v will be explored. Then, the result follows from Lemma 4 and Theorem 7. \(\square \)
Computing \({\mathcal {I}}(\mathbf {G}^I)\) takes \({\mathcal {O}}(|\mathbf {E}|)\) time, and assigning \({\mathcal {R}}(v)\) for all \(v\in V\) is possible in \({\mathcal {O}}(|\mathbf {E}^I|+|V|)\). Sorting is possible in \({\mathcal {O}}(|V|\log |V|)\), and computing R(v) takes \({\mathcal {O}}(|\mathbf {E}^I|+|V|)\) for each \(v\in V\); however, each edge is processed at most once in each of the h rounds.
Theorem 9
The running time of Algorithm 3 is in \({\mathcal {O}}(|\mathbf {E}| + h\cdot (|\mathbf {E}^I|+|V|^2))\).
After approximating the reachability \({\hat{r}}^I(u)\) for all \(u\in V\), we can use them for the top-k temporal closeness algorithms. However, notice that underestimating the number of reachable vertices may stop the computation early for a vertex with a top-k temporal closeness value.

5.2 Approximation of the temporal closeness

We lift the randomized algorithm for undirected, static graphs from Wang et al. [13] to the temporal domain. Algorithm 4 computes an approximation of the normalized temporal closeness for each vertex in a directed temporal graph with equal transition times. Given a temporal graph \(\mathbf {G}\), a time interval I and a sampling size \(h\in {\mathbb {N}}\), the algorithm first computes the temporal transpose \({\mathcal {I}}(\mathbf {G}^I)\) and then samples h vertices \(v_1,\ldots ,v_h\). For each vertex \(v_i\), it determines the shortest durations to all vertices \(w\in V\setminus \{v_i\}\) in \({\mathcal {I}}(\mathbf {G}^I)\). Due to Lemma 4, we can estimate the closeness of a vertex u in \(\mathbf {G}^I\) by averaging the reciprocals of the durations \(d(v_i,u)\) found in \({\mathcal {I}}(\mathbf {G}^I)\).
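The sampling scheme can be illustrated as follows. This is a sketch in the spirit of Algorithm 4, not the paper's implementation: the edge representation, all names, the missing dominance pruning in the fastest-path pass, and the uniform sampling of sources are our simplifications:

```python
import math
import random

def fastest_durations(edges, src):
    """Fastest (src, v)-durations: one pass over the chronologically
    sorted edge stream, keeping (start, arrival) pairs per vertex
    (no dominance pruning, so this is a plain, unoptimized sketch)."""
    pairs = {}                  # v -> list of (start, arrival)
    best = {}                   # v -> fastest duration found so far
    for u, v, t, lam in sorted(edges, key=lambda e: e[2]):
        new = [(t, t + lam)] if u == src else []
        new += [(s, t + lam) for s, a in pairs.get(u, []) if a <= t]
        for s, a in new:
            best[v] = min(best.get(v, math.inf), a - s)
            pairs.setdefault(v, []).append((s, a))
    return best

def approx_closeness(edges, vertices, h, seed=0):
    """Sample h sources in the temporal transpose; by Lemma 4 the
    fastest duration d(v_i, u) found there equals d(u, v_i) in G, so
    averaging the reciprocals estimates the closeness of u."""
    rng = random.Random(seed)
    t_max = max(t + lam for _, _, t, lam in edges)
    rev = [(v, u, t_max - t, lam) for u, v, t, lam in edges]
    est = {u: 0.0 for u in vertices}
    for _ in range(h):
        v = rng.choice(sorted(vertices))
        for u, d in fastest_durations(rev, v).items():
            if u != v:
                est[u] += 1.0 / d
    return {u: est[u] / h for u in vertices}
```

Note that `fastest_durations` correctly handles paths that are not composed of fastest sub-paths, since it keeps all non-expired (start, arrival) pairs.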
Theorem 10
Let \(\mathbf {G}=(V,\mathbf {E})\) be a temporal graph, I a time interval and \(h\in {\mathbb {N}}\). Algorithm 4 approximates the normalized temporal closeness \({\hat{c}}^I_n (u)\) for each vertex \(u\in V\), such that for \(h=\log |V|\cdot \epsilon ^{-2}\) it holds that \(P\big (|{\hat{c}}^I_n (u)-c^I_n (u)| \ge \epsilon \big ) \le \frac{2}{|V|^2}\text {.}\)
The running time is in \({\mathcal {O}}(|\mathbf {E}| + h\cdot (|V|+|\mathbf {E}^I|))\).
Proof
Due to Lemma 4, we know that there is a one-to-one mapping between the temporal paths in \(\mathbf {G}^I\) and \({\mathcal {I}}(\mathbf {G}^I)\) in which corresponding paths have the same duration. Next, we use Hoeffding's inequality [16]. Let \(x_1,\ldots ,x_h\) be independent random variables that are bounded by \(0\le x_i \le 1\) for \(i\in \{1,\ldots ,h\}\). Furthermore, let \(S=\sum _{i=1}^{h}{x_i}\) and let \(\mu =\text {E}\big [S/h\big ]\) be the expected mean. Then the following inequality holds.
$$\begin{aligned} P\big (|S/h - \mu | \ge \epsilon \big ) \le 2e^{-2h\epsilon ^2}\text {.} \end{aligned}$$
Now, for \(x_i = \frac{1}{d(v_i,u)}\) and \(\mu =\text {E}\big [\sum _{i=1}^{h}\frac{1}{d(v_i,u)}\cdot \frac{1}{h}\big ]\) we have
$$\begin{aligned}&P\Bigg (\Big |1/h \cdot \sum _{i=1}^{h}\frac{1}{d(v_i,u)} - \mu \Big | \ge \epsilon \Bigg ) \le 2e^{-2h\epsilon ^2}\text {.} \end{aligned}$$
Then with \(h=\log |V|\cdot \epsilon ^{-2}\) it follows
$$\begin{aligned}&P\big (|{\hat{c}}^I_\text {n}(u)-c^I_\text {n}(u)| \ge \epsilon \big ) \le 2e^{-2(\log |V|\cdot \epsilon ^{-2})\epsilon ^2}= \frac{2}{|V|^2}\text {.} \end{aligned}$$
\(\square \)

6 Experiments

We address the following questions:
  • Q1 How do the running times of the algorithms for the top-k temporal closeness, the temporal closeness for all vertices, and the baseline compare to each other?
  • Q2 How much does the reachability estimation speed up the computation of the top-k temporal closeness, and how good are the results?
  • Q3 How does using the heuristics for computing the fastest paths affect the top-k and the exact algorithms in terms of running time and solution quality?
  • Q4 How well does the sampling algorithm perform in terms of running time and approximation quality?
  • Q5 How does temporal closeness compare to static closeness, degree centrality, and reachability? How do the temporal closeness and the temporal in-closeness compare to each other?

6.1 Data sets

We used the following real-world temporal graph data sets:
  • Infectious A data set from the SocioPatterns project.1 The Infectious graph represents face-to-face contacts between visitors of the exhibition Infectious: Stay Away [20].
  • Arxiv An authors collaboration network from the arXiv’s High Energy Physics - Phenomenology (hep-ph) section [30]. Vertices represent authors and edges collaborations. The time stamp of an edge is the publication date.
  • Facebook This graph is a subset of the activity of a Facebook community over three months and contains interactions in the form of wall posts [51].
  • Prosper A network based on a personal loan website. Vertices represent persons, and each edge a loan from one person to another person [42].
  • WikipediaSG The network is based on the Wikipedia network. Each vertex represents a Wikipedia page, and each edge a hyperlink between two pages [35].
  • WikiTalk Vertices represent users and edges edits of a user’s talk page by another user [29, 47].
  • Digg and FlickrSG Digg and Flickr are social networks in which vertices represent persons and edges friendships. The times of the edges indicate when the friendship was formed [17, 34].
For WikipediaSG and FlickrSG, we chose a random time interval with a size of ten percent of the total time span. For the other data sets, the time interval spans all edges. Table 1 gives an overview of the statistics of the networks (for WikipediaSG and FlickrSG, the statistics are for the subgraph \(\mathbf {G}^I\)). The transition times are one for all edges in all data sets.
Table 1
Statistics and properties of the data sets, with \(\delta _m^-\) (resp. \(\delta _m^+\)) being the maximal in-degree (resp. out-degree), and \(|E_s|\) the number of edges in the aggregated graph

| Data set | \(|V|\) | \(|\mathbf {E}|\) | \(\delta _m^+\) | \(\delta _m^-\) | \({\hat{\tau }}_+\) | \({\hat{\tau }}_-\) | \(|{\mathcal {T}}(\mathbf {G})|\) | \(|E_s|\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Infectious | \(10\,972\) | \(415\,912\) | 457 | 544 | 310 | 307 | \(76\,943\) | \(52\,761\) |
| Arxiv | \(28\,093\) | \(4\,596\,803\) | \(9\,301\) | \(3\,962\) | 321 | 284 | \(2\,337\) | \(3\,148\,447\) |
| Facebook | \(63\,731\) | \(817\,035\) | 998 | 219 | 412 | 86 | \(333\,924\) | \(817\,035\) |
| Prosper | \(89\,269\) | \(3\,394\,978\) | \(9\,436\) | \(1\,071\) | 412 | 8 | \(1\,259\) | \(3\,330\,224\) |
| WikipediaSG | \(208\,142\) | \(810\,702\) | \(1\,574\) | \(4\,939\) | 153 | 223 | 223 | \(810\,702\) |
| WikiTalk | \(225\,749\) | \(1\,554\,698\) | \(113\,867\) | \(36\,413\) | \(75\,045\) | \(36\,410\) | \(1\,389\,632\) | \(565\,476\) |
| Digg | \(279\,630\) | \(1\,731\,653\) | 995 | \(12\,038\) | 982 | \(11\,506\) | \(6\,864\) | \(1\,731\,653\) |
| FlickrSG | \(323\,181\) | \(1\,577\,469\) | \(3\,153\) | \(2\,128\) | 20 | 20 | 20 | \(1\,577\,469\) |

6.2 Experimental protocol and algorithms

All experiments were conducted on a workstation with an AMD EPYC 7402P 24-Core Processor with 3.35 GHz and 256 GB of RAM running Ubuntu 18.04.3 LTS. We used GNU CC Compiler 9.3.0 with the flag -O2 and implemented the following algorithms in C++:
  • TC-Top-k is our top-k temporal closeness algorithm.
  • TC-All is our temporal closeness algorithm for computing the exact values for all vertices.
  • TC-Approx is our temporal closeness approximation.
  • EdgeStr is the temporal closeness algorithm based on the edge stream algorithm for equal transition times [52].
For TC-Top-k and TC-All, we also implemented the variants using the heuristics Heuristic1 and Heuristic2 described in Sect. 4.3. Our source code and the data sets are available at https://gitlab.com/tgpublic/tgcloseness.

6.3 Results and discussion

Table 2
Running times in seconds for the exact temporal closeness algorithms

| Data set | TC-Top-1 | TC-Top-10 | TC-Top-100 | TC-Top-1000 | TC-All | EdgeStr |
| --- | --- | --- | --- | --- | --- | --- |
| Infectious | 1.19 | 1.26 | 1.39 | 1.77 | 1.77 | 21.15 |
| Arxiv | 339.48 | 367.10 | 546.76 | \(1\,076.83\) | \(1\,876.88\) | \(1\,488.43\) |
| Facebook | 107.18 | 108.57 | 113.82 | 146.29 | 213.69 | 389.84 |
| Prosper | 264.70 | 267.09 | 269.98 | 288.76 | 465.83 | \(1\,917.84\) |
| WikipediaSG | 394.10 | 400.02 | 400.88 | 439.76 | 768.24 | \(1\,622.04\) |
| WikiTalk | \(1\,190.07\) | \(1\,314.41\) | \(1\,765.39\) | \(2\,899.65\) | \(3\,468.53\) | \(3\,158.30\) |
| Digg | \(2\,334.34\) | \(2\,339.54\) | \(2\,422.64\) | \(3\,697.73\) | \(8\,541.81\) | \(5\,741.24\) |
| FlickrSG | \(4\,212.86\) | \(4\,228.94\) | \(4\,299.79\) | \(4\,792.36\) | \(14\,162.06\) | \(10\,666.30\) |
Table 3
Running times in seconds for the algorithms using the reachability approximation with \(h=5\). The running times are the averages and standard deviations over ten repetitions

| Data set | TC-Top-10 | TC-Top-100 | TC-Top-1000 |
| --- | --- | --- | --- |
| Infectious | \(1.30 \pm 0.01\) | \(1.45 \pm 0.01\) | \(1.88 \pm 0.01\) |
| Arxiv | \(117.25 \pm 0.34\) | \(310.36 \pm 0.53\) | \(1\,075.13 \pm 5.10\) |
| Facebook | \(33.81 \pm 2.83\) | \(41.14 \pm 4.68\) | \(77.32 \pm 5.39\) |
| Prosper | \(224.33 \pm 3.09\) | \(227.56 \pm 0.48\) | \(246.65 \pm 1.49\) |
| WikipediaSG | \(346.95 \pm 25.20\) | \(340.39 \pm 2.31\) | \(369.37 \pm 2.43\) |
| WikiTalk | \(1\,010.12 \pm 5.86\) | \(1\,472.38 \pm 10.63\) | \(2\,678.99 \pm 4.31\) |
| Digg | \(957.80 \pm 6.19\) | \(1\,056.20 \pm 5.55\) | \(2\,353.76 \pm 2.78\) |
| FlickrSG | \(965.75 \pm 2.03\) | \(1\,014.69 \pm 1.78\) | \(1\,510.15 \pm 2.73\) |
Table 4
Running times in seconds for the algorithms using the first heuristic with \(h=2\)

| Data set | TC-Top-1 | TC-Top-10 | TC-Top-100 | TC-Top-1000 | TC-All |
| --- | --- | --- | --- | --- | --- |
| Infectious | 0.62 | 0.64 | 0.68 | 0.89 | 0.90 |
| Arxiv | 112.43 | 117.53 | 157.86 | 381.49 | 717.44 |
| Facebook | 94.35 | 95.30 | 99.46 | 126.84 | 168.43 |
| Prosper | 88.01 | 88.67 | 90.34 | 103.66 | 178.69 |
| WikipediaSG | 282.07 | 280.49 | 282.81 | 298.79 | 352.15 |
| WikiTalk | 891.24 | 902.72 | 940.52 | \(1\,245.50\) | \(2\,033.96\) |
| Digg | \(2\,023.51\) | \(2\,027.85\) | \(2\,054.38\) | \(2\,376.91\) | \(6\,548.97\) |
| FlickrSG | \(3\,987.14\) | \(3\,987.44\) | \(4\,008.83\) | \(4\,235.94\) | \(9\,457.16\) |
Table 5
Running times in seconds for the algorithms using the second heuristic

| Data set | TC-Top-1 | TC-Top-10 | TC-Top-100 | TC-Top-1000 | TC-All |
| --- | --- | --- | --- | --- | --- |
| Infectious | 0.53 | 0.54 | 0.56 | 0.66 | 0.61 |
| Arxiv | 115.21 | 116.97 | 131.37 | 218.09 | 309.34 |
| Facebook | 93.27 | 93.60 | 95.63 | 109.96 | 101.13 |
| Prosper | 83.62 | 83.63 | 85.25 | 93.08 | 130.55 |
| WikipediaSG | 266.17 | 266.06 | 267.61 | 277.92 | 242.35 |
| WikiTalk | 858.12 | 863.55 | 875.96 | 989.35 | \(1\,149.01\) |
| Digg | \(2\,000.59\) | \(2\,000.63\) | \(2\,008.89\) | \(2\,089.17\) | \(2\,971.84\) |
| FlickrSG | \(3\,934.20\) | \(3\,935.52\) | \(3\,942.18\) | \(4\,011.21\) | \(4\,444.82\) |
Q1 Table 2 shows the running times of the exact temporal closeness algorithms. We computed the numbers of reachable vertices exactly and set \(k\in \{1,10,100,1000\}\) for the top-k algorithms. The reported running times include the time spent computing the reachability. For all data sets, the top-k algorithms need significantly less running time than EdgeStr and TC-All. For the Infectious data set, the running time is reduced by a factor between 10 (\(k=1000\)) and 17 (\(k=1\)) compared to EdgeStr. Figure 4 shows the reduction of the running time in percent compared to EdgeStr. For Prosper and WikipediaSG, the running times compared to EdgeStr are decreased by more than \(85\%\) and \(72\%\), respectively, for each \(k\in \{1,10,100,1000\}\). In the case of Facebook, the decrease is between \(62\%\) (\(k=1000\)) and \(72\%\) (\(k=1\)). Also for the other data sets, considerable improvements are reached, i.e., for \(k=1\) and \(k=10\), the running time is reduced by at least \(60\%\). With increasing k, the running time also increases because the upper bounds calculated during the run of TC-Top-k converge more slowly at the lower end of the top-k values.
Our exact algorithm TC-All is faster than EdgeStr for the Infectious, Facebook, Prosper and WikipediaSG data sets. For the Infectious data set our exact algorithm is more than ten times faster, and for the Prosper data set it is more than four times faster than the baseline. In case of the WikipediaSG data set, TC-All is more than twice as fast, and it decreases the running time for Facebook by \(45\%\). For the Arxiv, Digg, WikiTalk and the FlickrSG data sets, EdgeStr is faster than TC-All. The reason is a higher number of labels per vertex that are generated during the iterations of Algorithm 2. This leads to a worse performance of TC-All for these data sets.
Q2 Table 3 shows the average running times and standard deviations for our top-k temporal closeness algorithms using the reachability approximation over ten repetitions. We approximate \({\hat{r}}(u)\) for all \(u\in V\) using Algorithm 3 with parameter \(h=5\). For the relatively small Infectious network, the approximation cannot improve the running time. For all other data sets, the reachability approximation reduces the running time. Notice the large improvements for the Facebook, Digg, and FlickrSG data sets of at least \(45\%\), \(36\%\), and \(68\%\), respectively. Even though there are approximation guarantees for the reachability approximation (Theorem 8), we are not able to give a guarantee for finding the top-k sets. The reason is that underestimating the number of reachable vertices may stop the computation early for a vertex that has a top-k temporal closeness value. To evaluate the solution quality, we measure the Jaccard similarity between the top-k vertex sets computed using the algorithms with the reachability approximation and the exact top-k sets. The Jaccard similarity between two sets A and B is defined as \(J(A,B)=\frac{|A\cap B|}{|A\cup B|}\); values closer to one mean a higher similarity of the sets. For all data sets and all \(k\in \{10,100,1000\}\), we obtained top-k vertex sets with similarities of one, i.e., the algorithm found the exact top-k sets in all ten rounds.
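The quality measure used throughout the evaluation is a two-line helper (a trivial sketch, names ours):

```python
def jaccard(a, b):
    """Jaccard similarity J(A, B) = |A n B| / |A u B|, used here to
    compare approximate and exact top-k vertex sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)
```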
Q3 We ran the temporal closeness algorithms using Heuristic1 with \(h=2\). The running times are shown in Table 4. They are lower than those of our exact algorithms for all data sets. Figure 5a shows the decrease in running time compared to the exact algorithms. For Infectious, the running time is decreased by around \(50\%\) for all algorithms. In case of Arxiv the heuristic decreases the running time between \(61\%\) (TC-All) and \(71\%\) (TC-Top-100). Similar results are achieved for the Prosper data set with decreases between \(61\%\) (TC-All) and \(66\%\) (TC-Top-1, TC-Top-10 and TC-Top-100). For WikipediaSG and WikiTalk, the decrease in running time is between \(25\%\) and \(57\%\), and for Digg between \(13\%\) and \(35\%\). The lowest decrease in running time is achieved for Facebook and FlickrSG with values between \(11\%-21\%\) and \(5\%-33\%\), respectively. We measured the average relative deviation from the exact closeness values and the Jaccard similarity between the computed top-k sets and the exact solutions. For Arxiv, Prosper, Digg and FlickrSG the exact top-k sets are computed for all \(k\in \{1,10,100,1000\}\), i.e., the Jaccard similarity is one. Furthermore, for Arxiv and Prosper the average relative deviation from the exact closeness is less than \(2\cdot 10^{-6}\). For Digg and FlickrSG, the deviations are less than 0.0003 and \(7\cdot 10^{-5}\). The Jaccard similarities for all remaining data sets are one in case of \(k=1\) and \(k=10\). For Facebook, WikipediaSG and WikiTalk, the Jaccard similarity is one for \(k=100\) and 0.99 for \(k=1000\). The similarities for Infectious are 0.96 for both \(k=100\) and \(k=1000\). The average relative deviation from the exact closeness is less than \(3\cdot 10^{-5}\) for Facebook, WikipediaSG and WikiTalk. For the Infectious data set, the deviation is less than 0.007.
Next, we ran the temporal closeness algorithms using the second heuristic (Heuristic2). Table 5 shows the running times. As expected, they are lower than those of our exact algorithms for all data sets and lower than those of the algorithms using the first heuristic in almost all cases. Figure 5b shows the decrease in running time compared to the exact algorithms. For Infectious, Arxiv, and Prosper, the second heuristic leads to large performance gains for all algorithms of at least \(55\%\), \(66\%\), and \(68\%\), respectively. In the case of WikipediaSG and WikiTalk, the decrease is at least \(27\%\) and \(32\%\), respectively. Notice that for TC-All the decrease is at least \(52\%\) (Facebook) and up to \(83\%\) (Arxiv). All data sets but Infectious have a Jaccard similarity of one for \(k\in \{1,10,100\}\). Arxiv, Facebook, and Prosper have a Jaccard similarity of one for \(k=1000\), and their average relative deviation from the exact closeness is less than \(2\cdot 10^{-6}\). The Infectious data set has a Jaccard similarity of 0.96 for both \(k=100\) and \(k=1000\); its average relative deviation is 0.008. The remaining data sets have a Jaccard similarity of 0.99 for \(k=1000\) and an average relative deviation of 0.003 (FlickrSG), \(5\cdot 10^{-5}\) (WikiTalk), \(3\cdot 10^{-5}\) (Digg), and \(1\cdot 10^{-6}\) (WikipediaSG). The heuristics provide very good results with small errors in the computed closeness values. For small k, the correct top-k sets are found; for larger k, the Jaccard similarity is still very high.
Q4 For a fixed error \(\epsilon \), the sample size h grows logarithmically in |V|, so the approximation is most suitable for data sets with a high number of vertices. For example, to achieve an error of \(\epsilon \le 0.001\) (with high probability) for a data set with \(|V|=10^6\) vertices, it follows that \(h=\ln 10^6 \cdot 0.001^{-2} \approx 1.38\cdot 10^7\). In this case, the needed sample size exceeds the number of vertices by a factor greater than ten, and the exact computation of the normalized temporal closeness will be faster. However, for a data set with \(|V|=10^8\) vertices, we have \(h\approx 1.84\cdot 10^7\), and even for \(|V|=10^{100}\) only \(h\approx 2.3 \cdot 10^8\). Choosing a large enough h to achieve a low error for data sets with few vertices may therefore lead to prohibitively high running times. Hence, we ran Algorithm 4 ten times for each data set with the sample sizes \(h=p\cdot |V|\) with \(p\in \{0.1,0.2,0.5\}\). Table 6 reports the average running times and the standard deviations. Theorem 10 shows that TC-Approx computes, with high probability, closeness values with a small additive error. For \(p=0.5\), the running times for Arxiv, WikiTalk, and Digg exceed those of the exact algorithms. As expected, increasing the sample size reduces the approximation error. Table 6 shows the approximation errors for the data sets. For \(p=0.1\), an approximation error of only \(8\%\) is reached for WikiTalk while using less than half the running time of EdgeStr. The standard deviation of the mean approximation error over the ten rounds is below 0.01 for all data sets.
Table 6
The average running times and standard deviations in seconds and the mean approximation error for TC-Approx with sample sizes \(h=p\cdot |V|\)

| Data set | Running time \(p=0.1\) | \(p=0.2\) | \(p=0.5\) | Approx. error \(p=0.1\) | \(p=0.2\) | \(p=0.5\) |
|---|---|---|---|---|---|---|
| Infectious | \(0.25 \pm 0.01\) | \(0.47 \pm 0.01\) | \(1.12 \pm 0.01\) | 0.83 | 0.74 | 0.52 |
| Arxiv | \(410.78 \pm 5.16\) | \(862.46 \pm 10.71\) | \(2\,067.93 \pm 7.52\) | 0.78 | 0.70 | 0.45 |
| Facebook | \(28.22 \pm 0.37\) | \(56.61 \pm 1.00\) | \(131.58 \pm 1.20\) | 0.63 | 0.57 | 0.38 |
| Prosper | \(48.83 \pm 0.94\) | \(98.00 \pm 0.62\) | \(228.96 \pm 0.48\) | 0.59 | 0.53 | 0.34 |
| WikipediaSG | \(84.51 \pm 0.67\) | \(170.38 \pm 2.92\) | \(353.54 \pm 4.23\) | 0.41 | 0.36 | 0.25 |
| WikiTalk | \(1\,573.23 \pm 10.06\) | \(3\,173.40 \pm 27.47\) | \(7\,744.86 \pm 35.24\) | 0.08 | 0.07 | 0.06 |
| Digg | \(1\,682.70 \pm 27.82\) | \(3\,356.85 \pm 42.60\) | \(8\,259.91 \pm 62.70\) | 0.23 | 0.21 | 0.15 |
| FlickrSG | \(1\,656.13 \pm 7.58\) | \(3\,288.93 \pm 27.47\) | \(6\,946.62 \pm 44.30\) | 0.66 | 0.59 | 0.43 |
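The sample-size arithmetic discussed above can be reproduced with a small helper; the formula \(h=\ln |V| \cdot \epsilon ^{-2}\) is taken from the text, while the function name is ours:

```python
import math

def sample_size(n_vertices, eps):
    """Samples h = ln|V| * eps^-2 needed for additive error eps
    (with high probability), following the bound used in the text."""
    return math.ceil(math.log(n_vertices) / eps ** 2)

# Reproduces the examples from the text for eps = 0.001.
for n in (10**6, 10**8, 10**100):
    print(sample_size(n, 0.001))  # ~1.38e7, ~1.84e7, ~2.3e8
```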
Table 7
Jaccard similarity between the top-k temporal closeness vertices and the static closeness vertices in the aggregated graph, the vertices with the highest number of outgoing temporal edges (degree centrality), the vertices with the highest reachability, and the temporal in-closeness vertices

| Data set | Static \(k=10\) | \(k=100\) | \(k=1000\) | Degree \(k=10\) | \(k=100\) | \(k=1000\) | Reach. \(k=10\) | \(k=100\) | \(k=1000\) | In-clos. \(k=10\) | \(k=100\) | \(k=1000\) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Infectious | 0.25 | 0.18 | 0.26 | 0.00 | 0.11 | 0.09 | 0.00 | 0.03 | 0.09 | 0.00 | 0.03 | 0.15 |
| Arxiv | 0.81 | 0.41 | 0.41 | 0.25 | 0.54 | 0.63 | 0.00 | 0.08 | 0.23 | 0.00 | 0.04 | 0.10 |
| Facebook | 0.00 | 0.00 | 0.01 | 1.00 | 0.74 | 0.01 | 0.17 | 0.06 | 0.18 | 0.00 | 0.03 | 0.09 |
| Prosper | 0.00 | 0.45 | 0.41 | 0.81 | 0.90 | 0.01 | 0.00 | 0.03 | 0.06 | 0.00 | 0.00 | 0.00 |
| WikipediaSG | 0.00 | 0.15 | 0.39 | 1.00 | 0.70 | 0.005 | 0.00 | 0.02 | 0.07 | 0.00 | 0.00 | 0.07 |
| WikiTalk | 0.42 | 0.45 | 0.63 | 0.53 | 0.57 | 0.004 | 0.00 | 0.01 | 0.13 | 0.17 | 0.50 | 0.66 |
| Digg | 0.00 | 0.00 | 0.00 | 0.83 | 0.56 | 0.004 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.13 |
| FlickrSG | 0.42 | 0.34 | 0.25 | 1.00 | 0.95 | 0.003 | 0.05 | 0.05 | 0.13 | 0.33 | 0.26 | 0.33 |
Q5 We measured the Jaccard similarity between the temporal and static top-k closeness vertex sets. Table 7 shows that the similarity is very low in most cases, and for Facebook and Digg even zero. In the case of Arxiv and \(k=10\), the similarity is highest at 0.81. Furthermore, Table 7 shows the Jaccard similarity between the temporal top-k closeness vertices and the vertices with the largest number of outgoing temporal edges, i.e., the out-degree centrality. For all chosen k, the Jaccard similarity for Infectious is low. For Facebook, WikipediaSG, and FlickrSG, the sets of the top-10 temporal closeness vertices and degree centrality vertices are equal. However, the similarity declines quickly for each of these data sets and is very low for \(k=1000\). We also compared the Jaccard similarity between the top-k temporal closeness vertices and the top-k sets of vertices with the highest reachability values, i.e., vertices u for which r(u) is highest (Table 7). The values are small for all data sets, with 0.23 (Arxiv, \(k=1000\)) being the highest similarity. For \(k=10\), all similarities are zero with the exception of the Facebook and FlickrSG data sets with similarities of 0.17 and 0.05, respectively. The Jaccard similarities between the top-k temporal closeness and the temporal in-closeness vertices are very low for all data sets but WikiTalk and FlickrSG. These low Jaccard similarities are expected due to the missing symmetry of the directed temporal edges and the temporal restrictions. The vertices with a high temporal closeness differ from those with a high temporal in-closeness.
To further investigate the relationships between the vertex rankings obtained using temporal closeness, degree centrality, and reachability, we measured the correlations of the rankings using the Kendall rank correlation coefficient [23]. Because vertices in a graph can have equal centrality values and rank, we use the version accounting for possible ties in the rankings (Kendall's \(\tau _b\) coefficient) [24]. The Kendall rank correlation coefficient is commonly used for analyzing and comparing vertex rankings given by centrality measures [14]. Let \(X=(x_1,\ldots ,x_n)\) and \(Y=(y_1,\ldots ,y_n)\) be two sequences, i.e., the ranked vertices. A pair \((x_i,y_i)\) and \((x_j,y_j)\) with \(1\le i<j\le n\) is concordant if \(x_i<x_j\) and \(y_i<y_j\), or \(x_i>x_j\) and \(y_i>y_j\). The pair is discordant if \(x_i>x_j\) and \(y_i<y_j\), or \(x_i<x_j\) and \(y_i>y_j\). The Kendall rank correlation is defined as the number of concordant pairs minus the number of discordant pairs, normalized by the number of all possible pairs (accounting for ties). The coefficient takes values between \(-1\) and 1, where values close to one indicate similar rankings, values close to zero no correlation, and values close to minus one a strong negative correlation. Figure 6 shows the correlation matrices for all data sets. The correlation between the rankings by degree and temporal closeness is very strong for most data sets: for the Infectious and WikiTalk data sets, it is 0.53 and 0.78, respectively, and for the other data sets, it is between 0.94 and one. The reason for the high correlation is that, in the case of unit traversal times, each direct neighbor adds a value of one to the temporal closeness. For reachability, the correlation varies between 0.56 for the Infectious and 0.97 for the WikiTalk data set, with an average value of 0.82. There is only a weak correlation between temporal and static closeness for most data sets.
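Since the definition above is self-contained, the tie-aware \(\tau _b\) coefficient can be sketched in a few lines of pure Python; the example rankings are invented, and a production analysis would more likely use a library routine such as scipy.stats.kendalltau:

```python
import math
from collections import Counter
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation of two equal-length score sequences,
    with the standard correction for ties."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # pairs tied in x or y contribute to neither count
    n0 = n * (n - 1) // 2
    n1 = sum(t * (t - 1) // 2 for t in Counter(x).values())  # ties in x
    n2 = sum(t * (t - 1) // 2 for t in Counter(y).values())  # ties in y
    return (concordant - discordant) / math.sqrt((n0 - n1) * (n0 - n2))

print(kendall_tau_b([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0 (identical rankings)
print(kendall_tau_b([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0 (reversed rankings)
```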
Finally, we compare different centrality measures as heuristics for selecting initial vertices for distributing information in a temporal network. The problem of choosing optimal initial vertices is an instance of the influence maximization problem in temporal networks; Kempe et al. [22] have shown that selecting the most influential vertices is NP-hard in static networks. Here, we simulated an information diffusion process in the Infectious network. The information could be, e.g., (fake) news, a viral campaign, or some infectious disease. We simulated the dissemination over the time spanned by the temporal graph. At the start of the simulation, only a set of \(k=100\) seed vertices possesses the information. We chose as seed vertices the top-k temporal closeness vertices, the top-k highest-degree vertices, the top-k static closeness vertices, and k randomly chosen vertices, respectively. The static closeness was computed on the aggregated graph \(A(\mathbf {G})\). Next, we evaluated the spread of the information over time. Each vertex, i.e., person, that has the information propagates it with a probability of \(10\%\): each time a temporal edge \((u,v,t,1)\) is available at time t, vertex u passes the information to vertex v with a probability of \(10\%\), and the information is then available to vertex v for further distribution at time \(t+1\). We repeated the experiment ten times and measured the mean number of persons who obtained the information after processing the last edge. Figure 7 shows the results. When using the top-k temporal closeness vertices as seeds, the information is distributed to the highest number of people. Using the vertices with the highest degree as seeds leads to the second-best results. The static closeness results cannot compete because the temporal restrictions are not respected during the computation of the top-k static closeness vertices; even the randomly chosen vertices perform better than static closeness.
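The propagation rule described above can be sketched as follows. This is a minimal illustration, not the evaluation code of the paper: the function name, the toy edge list, and the deterministic \(p=1.0\) example are ours; the rule itself (a vertex holding the information at time \(\le t\) forwards it over an edge available at time t with probability p, making it usable at the target from time \(t+1\)) follows the text:

```python
import random

def simulate_spread(edges, seeds, p=0.1, rng=None):
    """Simulate information diffusion over a temporal graph.

    edges: list of (u, v, t) temporal edges with unit transition time,
           processed in order of increasing availability time t.
    seeds: vertices holding the information at time 0.
    Returns the set of vertices that ever hold the information.
    """
    rng = rng or random.Random()
    arrival = {s: 0 for s in seeds}  # earliest time each vertex holds the info
    for u, v, t in sorted(edges, key=lambda e: e[2]):
        if u in arrival and arrival[u] <= t and rng.random() < p:
            # v can forward the information from time t + 1 on
            arrival[v] = min(arrival.get(v, t + 1), t + 1)
    return set(arrival)

# Toy temporal chain 0 -> 1 -> 2; with p = 1.0 the spread is deterministic.
edges = [(0, 1, 1), (1, 2, 2)]
print(simulate_spread(edges, seeds={0}, p=1.0))  # {0, 1, 2}
```

Note that the edge (1, 2, t) only forwards the information if vertex 1 already holds it at time t, which is exactly the temporal restriction that the static closeness seeds ignore.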
In conclusion, temporal closeness can be a helpful heuristic for selecting influential vertices in temporal graphs.

7 Conclusion

We introduced algorithms for computing temporal closeness in temporal graphs. The basis for our algorithms is a new fastest path algorithm that uses a label-setting strategy and may be of interest in its own right. Our top-k temporal closeness algorithms improved the running times for all real-world data sets. We introduced simple yet strong heuristic modifications that significantly reduced the running times and led to only small errors for all data sets. Furthermore, we adapted two randomized approximation algorithms, one for estimating the number of reachable vertices and one for approximating the closeness of all vertices; the reachability approximation is also interesting in itself. Both randomized approaches lead to significant speed-ups, and the sampling algorithm for temporal closeness has only a small additive error with high probability. Our approaches allow us to efficiently find the most relevant vertices and their temporal closeness in temporal graphs.

Acknowledgements

This work is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC-2047/1 – 390685813. We thank the anonymous reviewer for valuable feedback on this paper.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Literature
1. Bavelas A (1950) Communication patterns in task-oriented groups. J Acoust Soc Am 22(6):725–730
2. Béres F, Pálovics R, Oláh A, Benczúr AA (2018) Temporal walk based centrality metric for graph streams. Appl Netw Sci 3(1):32:1-32:26
3. Bergamini E, Borassi M, Crescenzi P, Marino A, Meyerhenke H (2019) Computing top-k closeness centrality faster in unweighted graphs. ACM Trans Knowl Discov Data 13(5):53:1-53:40
4. Bisenius P, Bergamini E, Angriman E, Meyerhenke H (2018) Computing top-k closeness centrality in fully-dynamic graphs. In: Proceedings of the twentieth workshop on algorithm engineering and experiments, ALENEX, SIAM, pp 21–35
5. Braha D, Bar-Yam Y (2009) Time-dependent complex networks: dynamic centrality, dynamic motifs, and cycles of social interactions. In: Adaptive networks, Springer, pp 39–50
6. Brandes U (2001) A faster algorithm for betweenness centrality. J Math Sociol 25(2):163–177
7. Bui-Xuan B-M, Ferreira A, Jarry A (2003) Computing shortest, fastest, and foremost journeys in dynamic networks. Int J Found Comput Sci 14(02):267–285
8. Cohen E (1997) Size-estimation framework with applications to transitive closure and reachability. J Comput Syst Sci 55(3):441–453
9. Cohen E, Delling D, Pajor T, Werneck RF (2014) Computing classic closeness centrality, at scale. In: Proceedings of the second ACM conference on online social networks, pp 37–50
10. Cooke KL, Halsey E (1966) The shortest route through a network with time-dependent internodal transit times. J Math Anal Appl 14(3):493–498
11. Crescenzi P, Magnien C, Marino A (2020) Finding top-k nodes for temporal closeness in large temporal graphs. Algorithms 13(9):211
12. Das K, Samanta S, Pal M (2018) Study on centrality measures in social networks: a survey. Soc Netw Anal Min 8(1):13
13. Eppstein D, Wang J (2001) Fast approximation of centrality. In: Proceedings of the twelfth annual ACM-SIAM symposium on discrete algorithms (SODA), pp 228–229
14. Grando F, Noble D, Lamb LC (2016) An analysis of centrality measures for complex and social networks. In: 2016 IEEE global communications conference (GLOBECOM), IEEE, pp 1–6
16. Hoeffding W (1994) Probability inequalities for sums of bounded random variables. In: The collected works of Wassily Hoeffding, Springer, New York, pp 409–426
17. Hogg T, Lerman K (2012) Social dynamics of Digg. EPJ Data Sci 1(1):5
18. Holme P (2015) Modern temporal network theory: a colloquium. Eur Phys J B 88(9):234
19. Huang S, Cheng J, Wu H (2014) Temporal graph traversals: definitions, algorithms, and applications. CoRR abs/1401.1919
20. Isella L, Stehlé J, Barrat A, Cattuto C, Pinton J-F, Van den Broeck W (2011) What's in a crowd? Analysis of face-to-face behavioral networks. J Theor Biol 271(1):166–180
21. Kempe D, Kleinberg JM, Kumar A (2002) Connectivity and inference problems for temporal networks. J Comput Syst Sci 64(4):820–842
22. Kempe D, Kleinberg JM, Tardos É (2015) Maximizing the spread of influence through a social network. Theory Comput 11:105–147
23. Kendall MG (1938) A new measure of rank correlation. Biometrika 30(1/2):81–93
25. Kim H, Anderson R (2012) Temporal node centrality in complex networks. Phys Rev E 85(2):026107
26.
27. Landherr A, Friedl B, Heidemann J (2010) A critical review of centrality measures in social networks. Bus Inf Syst Eng 2(6):371–385
28. Latapy M, Viard T, Magnien C (2018) Stream graphs and link streams for the modeling of interactions over time. Soc Netw Anal Min 8(1):61:1-61:29
29. Leskovec J, Huttenlocher D, Kleinberg J (2010) Governance in social media: a case study of the Wikipedia promotion process. In: Fourth international AAAI conference on weblogs and social media
30. Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. ACM Trans Knowl Discov Data 1(1):2–es
31. Marchiori M, Latora V (2000) Harmony in the small-world. Physica A Stat Mech Appl 285(3–4):539–546
32. Masuda N, Holme P (2019) Detecting sequences of system states in temporal networks. Sci Rep 9(1):1–11
33. Michail O (2016) An introduction to temporal graphs: an algorithmic perspective. Internet Math 12(4):239–280
34. Mislove A, Koppula HS, Gummadi KP, Druschel P, Bhattacharjee B (2008) Growth of the Flickr social network. In: Proceedings of the first workshop on online social networks, pp 25–30
35. Mislove AE (2009) Online social networks: measurement, analysis, and applications to distributed information systems. PhD thesis, Rice University
36. Mutzel P, Oettershagen L (2019) On the enumeration of bicriteria temporal paths. In: Theory and applications of models of computation, TAMC, Lecture Notes in Computer Science, vol 11436, Springer, pp 518–535
37. Nicosia V, Tang J, Mascolo C, Musolesi M, Russo G, Latora V (2013) Graph metrics for temporal networks. In: Temporal networks, Springer, pp 15–40
38. Oettershagen L, Mutzel P (2020) Efficient top-k temporal closeness calculation in temporal networks. In: 2020 IEEE international conference on data mining, ICDM, IEEE, pp 402–411
39. Okamoto K, Chen W, Li X-Y (2008) Ranking of closeness centrality for large-scale social networks. In: Frontiers in algorithmics, second annual international workshop, FAW, Springer, pp 186–195
40. Palla G, Barabási A-L, Vicsek T (2007) Quantifying social group evolution. Nature 446(7136):664–667
41. Pan RK, Saramäki J (2011) Path lengths, correlations, and centrality in temporal networks. Phys Rev E 84(1):016105
42. Redmond U, Cunningham P (2013) A temporal network analysis reveals the unprofitability of arbitrage in the Prosper marketplace. Expert Syst Appl 40(9):3715–3721
43. Rodrigues FA (2019) Network centrality: an introduction. In: A mathematical modeling approach from nonlinear dynamics to complex systems, Springer, pp 177–196
44. Rozenshtein P, Gionis A (2016) Temporal PageRank. In: Machine learning and knowledge discovery in databases, European conference, ECML PKDD, Lecture Notes in Computer Science, vol 9852, Springer, pp 674–689
46. Saxena A, Iyengar S (2020) Centrality measures in complex networks: a survey. CoRR abs/2011.07190
47. Sun J, Kunegis J, Staab S (2016) Predicting user roles in social networks using transfer learning with feature transformation. In: IEEE international conference on data mining workshops, IEEE Computer Society, pp 128–135
48. Tang J, Leontiadis I, Scellato S, Nicosia V, Mascolo C, Musolesi M, Latora V (2013) Applications of temporal graph metrics to real-world networks. In: Temporal networks, Springer, pp 135–159
49. Tang J, Musolesi M, Mascolo C, Latora V, Nicosia V (2010) Analysing information flows and key mediators through temporal centrality metrics. In: Proceedings of the 3rd workshop on social network systems, pp 1–6
50. Tsalouchidou I, Baeza-Yates R, Bonchi F, Liao K, Sellis T (2019) Temporal betweenness centrality in dynamic graphs. Int J Data Sci Anal, pp 1–16
51. Viswanath B, Mislove A, Cha M, Gummadi KP (2009) On the evolution of user interaction in Facebook. In: Proceedings of the 2nd ACM workshop on online social networks, pp 37–42
52. Wu H, Cheng J, Huang S, Ke Y, Lu Y, Xu Y (2014) Path problems in temporal graphs. Proc VLDB Endow 7(9):721–732
53. Yan E, Ding Y (2009) Applying centrality measures to impact analysis: a coauthorship network analysis. J Am Soc Inf Sci Technol 60(10):2107–2118
54. Zhan J, Gurung S, Parsa SPK (2017) Identification of top-k nodes in large networks using Katz centrality. J Big Data 4(1):1–19
Metadata
Title
Computing top-k temporal closeness in temporal networks
Authors
Lutz Oettershagen
Petra Mutzel
Publication date
08-01-2022
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 2/2022
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-021-01639-4
