A general algorithm
In this section, we describe an algorithm for counting all journey routes from a given node to all the other nodes in the TVG, passing through any possible intermediate node. This module is at the basis of the computation of foremost betweenness.
We start by introducing some notations and functions used in the algorithm. Given an edge (
x,
y), let function arriv(
x,
y,
t) return the arrival time to
y, leaving
x at time
t. Given a time-stamped journey
\(\pi \), with an abuse of notation, let us indicate by arriv
\((\pi )\) the arrival time at the last node of
\(\pi \). The foremost arrival time in
G to any node
v from a given source
s can be computed using the Algorithm from [
30]. Let foremost(
s,
v) denote such a time.
We are now ready to describe the algorithm. The input of Algorithm CountFormemostJRoutes is a pair (G, s), where \(G=(V,E) \) is a TVG and s is a starting node. The algorithm returns a matrix Count\(_s[x,y]\), for all \(x,y \in V\) containing the number of foremost journeys from s to y passing through x (note that Count\(_s[x,x]\) denotes the number of foremost journeys from s to x).
The counting algorithm is simple and it is based on multiple Depth-First Search (DFS) traversals. It consists of visiting every journey route of G starting from s, incrementing the appropriate counters every time a newly encountered journey is foremost. We remind that a node can reappear more than once in a journey route, with various occurrences corresponding to different times. This means that we need to store the time when a node is visited in the journey route so that, if it is visited again, we can determine whether the subsequent visit corresponds to a later time and thus the node has to be considered again. Note that this is the main difference with respect to a DFS in a static graph, where instead every node is visited exactly once.
To perform the traversal managing multiple visits (corresponding to different traversal times), we use two stacks: Path and S, where Path contains the nodes corresponding to the journey currently under visit and S contains the edges to be visited. In both Path and S, we store also time-stamps, to register the time of the first visit of nodes in Path and the time for the future visits of edges in S. If a node happens to be revisited at a later time, in fact, it is treated as a new node.
The traversal starts as a typical DFS, pushing the incident edges of the source s onto stack S with their arrival times in these journeys (lines 4–6). The nodes corresponding to the current journey under visit are kept in the second stack Path (these nodes are implicitly marked visited), initially containing only the source. When considering the next candidate edge (x, y) to visit (line 8), we may be continuing the current journey (if the top of stack Path contains x) or we may have backtracked to some previous nodes (if the top is different from x). In this last case, the content of Path is updated to reflect the backtracking (lines 9–11). After visiting a node y (line 16), the DFS continues pushing on S the edges incident to y that are feasible with the current journey under visit (i.e., the edges whose target is not already in Path, and whose latest traversal time is greater than or equal to the earliest arrival time at y) (lines 17–19). The if clause at line 20 checks whether the discovered journey is foremost and updates the corresponding counters.
In other words, as soon as a journey \(\pi = [(x_0,x_1),(x_1,x_2), \ldots ,(x_{k-1},x_k)]\) is encountered in the traversal, Count\([x_i,x_k]\), \(i\le k\) is updated only if \(\pi \) is a foremost journey, and, regardless of it being foremost, the traversal continues pushing on the stack the edges incident to \(x_k\) that are temporally feasible with \(\pi \). Whenever backtracking is performed, however, the already visited nodes on the backtracking path are popped from Path (thus implicitly remarked unvisited) in such a way that they can be revisited as part of different journey routes, not explored yet.
Observations on complexity
The running time of Algorithm CountFormemostJRoutes is linear in the number of nodes belonging to different foremost journeys, because it traverses each one of them. However, depending on the structure of the TVG, such a number could be exponential, thus an overall exponential worst-case complexity.
More precisely, let \(\mu _s\) be the number of foremost journeys from a source node s to all the other nodes in \(\mathcal{G}\), \(n(\mu _s)\) be the number of nodes belonging to those journeys, and n the number of nodes of \(\mathcal{G}\). Moreover, let \(\mu \) and \(n(\mu )\) be, respectively, the overall number of foremost journeys in \(\mathcal{G}\) and the overall number of nodes in those journeys. The algorithm to count all foremost journeys from s to all the other nodes traverses every foremost journey from the source to any other node, and it performs an update for every visited node in each foremost journey that it encounters. Thus, its time complexity is \(O(n(\mu _s))\). To compute foremost betweenness, the algorithm has to be repeated for every possible source, thus traversing every possible foremost journey in \(\mathcal{G}\) for a total time complexity of \(O(n(\mu ))\). Since \(n(\mu )\) could be exponential in n, we have a worst-case exponential complexity in the size of the network. Note that the high cost is inevitable for any deterministic algorithm to compute foremost betweenness.
Algorithm for KnowledgeNet
Algorithm 1 is applicable to a general TVG. We now consider a very special type of TVG with specific temporal restrictions that correspond to the type of network that we analyze in this paper. One such peculiarity is given by
instant edges (edges that appear only during a unique time interval). Another characteristic is
zero latency (i.e., edges that can be traversed instantaneously). Finally, in this setting, we base betweenness computation on increasing journey routes.
We then describe a variation of the general algorithm specifically designed for those conditions (instant edges with zero latency), and we compute foremost betweenness applying the foremost betweenness formula restricted to foremost increasing journeys.
Given a TVG \(G=(V,E)\), since we assume the presence of instant edges, we can divide time in consecutive intervals \(I_1, I_2, \ldots , I_k\) corresponding to k snapshots \(G_1, G_2, \ldots G_k\) (\(G_i=(V_i,E_i)\)), in such a way that \((x,y)\in E_i\) implies that \((x,y)\not \in E_j\) for \(j\ne i\). Furthermore, we know by \(\zeta =0\) that an edge can be traversed in zero time.
The key idea that can be applied to this very special structure is based on the observation that, given a foremost route \(\pi _{x,y}\) from x to y with edges in time intervals \(I_j\), provided that \(j>i\) and j appears immediately after i, and given any journey route \(\pi '_{s,x}\) from s to x with edges only in \(I_i\), the concatenation of \(\pi '_{s,x}\) and \(\pi _{x,y}\) is a foremost route from s to y, passing through x.
This observation leads to the design of an algorithm that starts by counting the foremost routes belonging to the last snapshot \(G_k\) only, and proceeds backwards using the information already computed. More precisely, when considering snapshot \(G_i\) from a source s, the goal is to count all foremost routes involving only edges in \(\cup _ {j\ge i} E_j\) (i.e., with time intervals in \(\cup _ {j\ge i} I_j\)), and when doing so, all the foremost routes involving only edges strictly in the “future” (i.e., time intervals \(\cup _ {j>i} I_j\)) have been already calculated for any pair of nodes. The already computed information is used when processing snapshot \(G_i\) in a dynamic programming fashion.
As for Algorithm 1, the input of Algorithm 2 is a pair: a snapshot \(G_i\) and a starting node s. The algorithm returns an array, Count\(_{s}[u,v]\), where Count\(_{s}[u,v]\) for all \(u,v \in V\) contains the number of foremost journeys from s to u passing through v counted so far (i.e., considering only edges in \(\cup _{j\ge i} E_j\)).
The actual counting algorithm on snapshot \(G_i\) is a modified version of Algorithm 1, still based on Depth-First Search (DFS) traversal. Lines 2–11 are exactly the same as in Algorithm 1, except that here we do not need to keep track of the arrival time for each edge, as we run Algorithm 2 in a single snapshot and the latency for edges is zero.
In line 13, we examine whether the target of the current edge y has already been visited or not. If it has not been visited already, it either falls in the current snapshot, or it flows into the next snapshot.
In the case where y stays in the current snapshot (lines 15–22), we push its adjacent nodes into the stack S and determine whether the route ending at y is foremost. If a foremost route is discovered at y, we update Count\(_{s}[z,v]\) by incrementing its value for all \(z \in Path\) (z being the node that falls on the journey route from s to y).
If instead it is not a foremost route in the current interval (lines 23–25), meaning that y is a node that existed in the “future,” a special update is performed using the data already calculated for the “future snapshots.”
More precisely, when a journey route (in this case a foremost journey route) from
s to
x (
\(s\leadsto x\)) is a prefix of a journey route
\(x\leadsto y\) at a later time snapshot, we perform a procedure called
SpecialCount (Algorithm 3). The special count procedure involves aggregating the values of Count
\(_s[v,x]\) with Count
\(_x[v',y]\), for all nodes (resp.
v,
\(v'\)) occurring in the journey routes between
s and
x and between
x and
y (see Algorithm 3). Algorithm 3 simply calculates the product of the number of foremost journeys between two routes
\(s\leadsto x\), and
\(x\leadsto y\), if they do not share any vertex (lines 4–9). If instead they share some vertex
v, the calculation is slightly more complicated: let
a be the number of foremost journeys from
s to
y where
v is visited at least once on the route between
x and
y; let
b be the number of foremost journeys from
s to
y where
v is visited at least once on the route between
s and
x; and let
c be the sum of
a and
b.
c represents the number of all foremost journeys from
s to
y that pass through
v. However,
c counts the journey route passing through
v multiple times if
v happened to exist in both Count
\(_s[v,x]\) and Count
\(_x[v,y]\), and we need to remove such multiple counting of journeys, which is done along with the update to Count
\(_s[v,y]\) in line 13.
Observations on Complexity
The worst-case time complexity of Algorithm 2, CountAllZeroLatency, is the same as the one of the general algorithm, CountFormemostJRoutes. In our network, however, it performed better than Algorithm 1. We try to explain below the reasons for this.
Algorithm 2 has to be executed in anti-chronological order of the different snapshots, starting from the last one, since it uses the previously calculated results in the computation of the new results. This approach is amenable to concurrent computations. In fact, since the graph is divided into independent snapshots, the number of all journeys can be computed separately for each snapshot, and the result of the calculation can be aggregated at the end. This has the advantage of eliminating all the special updates from the first part of the algorithm (while detecting all the journey routes), and deferring them to the second part (when aggregating all the information for the final update). Thus, instead of performing the special count at each level, we can postpone it to the last step of the algorithm, and loop once through all the collected counts with hard-coded intervals in the loop.
While not being advantageous in worst-case scenarios, this strategy results in a more efficient solution from a practical point of view. Still, the algorithm is very costly, even in such a small network (KnowledgeNet has 366 vertices and 750 edges) and it did run in almost a month when implemented in C++ with a machine with 40 cores and 1TB RAM.