In this section, we present our algorithmic methods. We start by presenting the dynamic programming polynomial-time algorithms for k-Densest-Alignment-Episodes (an algorithm called DP) and k-\(\ell\)-Densest-Alignment-Episodes (an algorithm called L-DP), then we present a heuristic approach applied on an optimal solution of k-\(\ell\)-Densest-Alignment-Episodes in order to compute a (possibly suboptimal) solution of k-Densest-Alignment-Episodes.
3.1 Polynomial-time algorithms for k-Densest-Alignment-Episodes and k-\(\ell\)-Densest-Alignment-Episodes
First, we present the DP algorithm for k-Densest-Alignment-Episodes. Given an alignment graph \(G_A\) over the time interval [1, j], with \(j \le t_\textrm{max}\), we consider the function D[j, h], with \(h \le k\) and \(0 \le j \le t_\textrm{max}\), that is equal to the density of h densest episodes in \(G_A[1,j]\).
Given two timestamps i and j, with \(1 \le i \le j \le t_\textrm{max}\), we denote by Dens\((G_A[i,j])\) the density of a densest subgraph in \(G_A[i,j]\), where the subgraph must be defined in timestamp j (not necessarily in i). Assume that Dens\((G_A[i,j])\) has already been computed for all values \(1 \le i \le j \le t_\textrm{max}\), function D(j, h), \(1 \le i \le j \le t_\textrm{max}\), \(1 \le h \le k\), can be computed as follows:
For
\(h \ge 2\) and
\(j \ge 2\):
$$\begin{aligned} D(j,h) = \max \left\{ \begin{array}{ll} \max _{2 \le i \le j} D(i-1,h-1) + \textrm{Dens}(G_A[i,j]) &{} \text { with } j \ge 2 \\ D(j-1,h) &{} \\ \end{array} \right. \end{aligned}$$
(1)
For
\(h = 1\) and
\(j \ge 2\):
$$\begin{aligned} D(j,1) = \max _{1 \le i \le j} \left\{ \begin{array}{ll} \textrm{Dens}(G_A[i,j]) &{} \\ D(j-1,1) &{} \\ \end{array} \right. \end{aligned}$$
(2)
For
\(j = 1\):
$$\begin{aligned} D(1,h) = \left\{ \begin{array}{ll} -\infty &{} \text { if } h \ge 2 \\ \textrm{Dens}(G_A[1,1]) &{} \text { if } h = 1 \end{array} \right. \end{aligned}$$
(3)
Next, we prove the correctness of Eqs.
1,
2, and
3.
Proof
We prove the lemma by induction on h and on j.
If
\(h=1\), we prove the correctness of Eqs.
2 and
3 by induction on
\(j \ge 1\). In the base case, when
\(j = 1\), then
\(D(1,1) = q\) if and only if there exists a densest subgraph in timestamp 1 having density
q, thus proving the correctness of Eq.
3.
Now, we show the correctness for \(j\ge 2\), assuming the correctness for \(j-1\). Consider one episode of maximum density contained in interval [1, j], then either it is defined in timestamp j, hence it has density equal to an episode in \(G_A[i,j]\), for some i with \(1 \le i \le j\), or it is not defined in position j and by induction hypothesis it has density equal to \(D(j-1,1)\).
Assume now that \(D(j,1)=q\). If \(D(j,1) = \max _{1 \le i \le j}Dens(G_A[i,j])\), then there exists one episode defined in [i, j] of density q. Assume that \(D(j,1) = D(j-1,1) = q\), then by induction hypothesis it holds that there exists an episode of density q in interval \([1,j-1]\). We can thus conclude that for \(h = 1\) the lemma holds.
Assume now that the lemma is correct for
\(h-1 \ge 1\), we show that it holds for
h. More precisely, we prove that Eqs.
1 and
3 hold by induction on
\(j \ge 1\). In the base case, when
\(j = 1\), then clearly
\(D(1,h) = -\infty\) as it is not possible to define
\(h \ge 2\) episodes in a single timestamp, thus proving the correctness of Eq.
3.
Consider the case \(j \ge 2\). Assume that there is a set \(\mathcal {S}\) of h disjoint episodes defined in interval [1, j], where the last episode in \(\mathcal {S}\) is defined over interval \([i+1,z]\), with \(i \le z \le j\) and it has density \(q_1\). If \(z < j\), then it holds that \(D(j,h) = D(j-1,h)\), and by induction hypothesis \(D(j-1,h) = q\). If \(z = j\), then \(\mathcal {S}\) contains \(h-1\) disjoint episodes defined in interval [1, i], hence by induction hypothesis \(D(i,h-1) = q - q_1\) and, since \({\rm Dens}(G_A[i+1,j]) = q_1\), it follows that \(D(j,h) = q\).
Assume that
\(D(j,h) = q\). Since
\(h > 1\), by the definition of the recurrence (Eq.
1)
\(D(j,h) = D(j-1,h)\) or
\(D(j,h) = D(i,h-1) + Dens(G_A[i+1,j])\). In the first case, by induction hypothesis on
j there exists a set of
h disjoint episodes of density
q defined in interval
\([1,j-1]\), thus also in [1,
j]. In the second case, there exists a value
i, with
\(1 \le i < j\), such that
\(D(i,h-1) = q- q_1\) and
\(Dens(G_A[i+1,j]) = q_1\). Then, by induction hypothesis there exists a set of
\(h-1\) disjoint episodes of density
\(q-q_1\) defined interval [1,
i] and an episode in
\([i+1,j]\) of density
\(q_1\). Hence, there exist
h disjoint episodes in [1,
j] of overall density
q, thus concluding the proof.
\(\square\)
The previous lemma leads to the following result.
Next, we present the polynomial-time algorithm, called L-DP, for k-\(\ell\)-Densest-Alignment-Episodes. We recall that an \(\ell\)-constrained episode is an episode defined on an interval of length at most \(\ell\). Similarly to k-Densest-Alignment-Episodes, given an alignment graph \(G_A\) over the time interval [1, j], with \(j \le t_\textrm{max}\), we define the function \(D_c[j,h]\), with \(h \le k\) and \(1 \le j \le t_\textrm{max}\), that is equal to the density of h densest \(\ell\)-constrained episodes in \(G_A[1,j]\).
Recall that, given an alignment graph \(G_A[i,j]\), \(Dens(G_A[i,j])\) denotes a densest subgraph in \(G_A[i,j]\), that it is defined on an interval that must include timestamp j.
The recurrence to compute \(D_c(j,h)\), for each \(j\in {\{1,2,\ldots ,t_\textrm{max}}\}\) is defined as follows:
For
\(h \ge 2\) and
\(j \ge 2\):
$$\begin{aligned} D_c(j,h) = \max \left\{ \begin{array}{ll} \max _{2 \le i \le j} D_c(i-1,h-1) + \textrm{Dens}(G_A[i,j]) &{} \text { with } j-i+1 \le \ell \\ D_c(j-1,h) &{} \\ \end{array} \right. \end{aligned}$$
(4)
For
\(h = 1\) and
\(j \ge 2\):
$$\begin{aligned} D_c(j,1) = \max _{1 \le i \le j} \left\{ \begin{array}{ll} \textrm{Dens}(G_A[i,j]) &{} \text { with } j-i+1 \le \ell \\ D_c(j-1,1) &{} \\ \end{array} \right. \end{aligned}$$
(5)
For
\(j = 1\):
$$\begin{aligned} D_c(1,h) = \left\{ \begin{array}{ll} -\infty &{} \text { if } h \ge 2 \\ \textrm{Dens}(G_A[1,1]) &{} \text { if } h = 1 \end{array} \right. \end{aligned}$$
(6)
Similarly to
k-Densest-Alignment-Episodes, we can prove the correctness of the recurrence described in Eqs.
4,
5 and
6.
Proof
We prove the lemma by induction on h and on j.
If
\(h=1\), then we prove the correctness of Eqs.
5 and
6 by induction on
\(j \ge 1\). In the base case, when
\(j = 1\), then
\(D_c(1,1) = q\) if and only if there exists a densest subgraph defined in timestamp 1 of density
q. Now, we consider the case
\(j\ge 1\) and we prove that it is correct, assuming the correctness of Eqs.
5 and
6 for
\(j-1\). Consider one
\(\ell\)-constrained episodes of maximum density in [1,
j], then either it is defined in timestamp
j, hence it has density equal to
\(\max _{1 \le i \le j}\textrm{Dens}(G_A[i,j])\) such that
\(j-i+1 \le \ell\), or it is not defined in position
j and then by induction hypothesis has density equal to
\(D_c(j-1,1)\).
Assume that \(D_c(j,1)=q\). If it is defined as \(D_c(j,1) = \max _{1 \le i \le j}\textrm{Dens}(G_A[i,j]) = q\), with \(j-i+1 \le \ell\), then there exist one \(\ell\)-constrained episode defined in [i, j] of density q. Assume that \(D_c(j,1) = D_c(j-1,1) = q\), then by induction hypothesis it holds that there exists an \(\ell\)-constrained episode of density q in \(G_A[1,j-1]\).
Assume that the lemma holds for
\(h-1 \ge 1\), we show that it holds for
h. We prove that Eqs.
4 and
6 hold by induction on
\(j \ge 1\). In the base case, when
\(j = 1\), then clearly
\(D_c(0,h) = -\infty\) as it is not possible to define more than one episode in a single timestamp.
Consider a set \(\mathcal {S}\) of h disjoint \(\ell\)-constrained episodes in \(G_A[1,j]\), where the last episode in \(\mathcal {S}\) is defined over interval [i, z], with \(i \le z \le j\) and \(z- i +1 \le \ell\), and it has density \(q_1\). If \(z < j\), it holds that \(D_c(j,h) = D_c(j-1,h)\), and by induction hypothesis it holds that \(D_c(j-1,h) = q\). If \(z = j\), then there exists \(h-1\) \(\ell\)-constrained episodes of density \(q - q_1\), hence, by induction hypothesis, it holds that \(D_c(i,h-1) = q - q_1\) and since \(\textrm{Dens}(G_A[i+1,j]) = q_1\), it follows that \(D_c(j,h) = q\).
Assume that \(D_c(j,h) = q\). Since \(h > 1\), by the definition of the recurrence \(D_c(j,h) = D_c(j-1,h)\) or \(D_c(j,h) = \max _{2 \le i \le \ell }D_c(i,h-1) + \textrm{Dens}(G_A[i+1,j])\), where \(j-i+1 \le \ell\). In the first case, by induction hypothesis on j there exists a set of h disjoint \(\ell\)-constrained episodes of density q in \(G_A[1,j-1]\). In the second case, it holds that \(D_c(i,h-1) = q- q_1\) and Dens\((G_A[i+1,j]) = q_1\). Then, by induction hypothesis there exists a set of \(h-1\) disjoint \(\ell\)-constrained episodes of density \(q-q_1\) and an \(\ell\)-constrained episode in \([i+1,j]\) of density \(q_1\) (the episode is \(\ell\)-constrained since \(j-i+1 \le \ell\)). Hence, there exist h disjoint episodes in [1, j] of overall density q, thus concluding the proof. \(\square\)