1 Introduction
1.1 Setting and Existing Approaches
1.2 Outline of the Main Contributions
1.3 Organization of the Paper
2 Preliminaries and Notation
2.1 Time-Dependent Measures
2.2 Optimal Transport Regularizer
2.3 Extremal Points of \(J_{\alpha ,\beta }\)
3 The Dynamic Inverse Problem and Conditional Gradient Method
3.1 Functional Analytic Setting for Time-Continuous Fidelity Term
3.2 Surrogate Minimization Problem
3.3 Linearized Problem
3.4 The Primal-Dual Gap
4 The Algorithm: Theoretical Analysis
4.1 Core Algorithm
4.1.1 Iterates
4.1.2 Insertion Step
- if \(\langle \rho _{\gamma ^*}, w^n\rangle \le 1\), then \(\mu ^n\) is a solution to (\(\mathcal {P}\)). The algorithm outputs \(\mu ^n\) and stops,
- if \(\langle \rho _{\gamma ^*}, w^n\rangle > 1\), then \(\mu ^n\) is not a solution to (\(\mathcal {P}\)) and \(\gamma ^* \in \mathrm{AC}^2\). The atom \(\mu _{\gamma ^*}\) found is inserted into the n-th iterate \(\mu ^n\) and the algorithm continues (see the sketch after this list).
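This dichotomy translates directly into code. The following is a minimal sketch, where the helpers pairing (evaluating the duality product \(\langle \rho _{\gamma }, w\rangle \)) and insert_atom (appending the atom to the sparse iterate) are hypothetical names, not taken from the paper.

```python
def insertion_step(mu_n, gamma_star, w_n, pairing, insert_atom, tol=1e-10):
    """Decide whether to stop or to insert the atom mu_{gamma*} into mu^n.

    pairing(gamma, w) is assumed to evaluate the duality product
    <rho_gamma, w>, and insert_atom(mu, gamma) to append the atom
    mu_gamma to the sparse iterate mu; both are placeholders.
    """
    if pairing(gamma_star, w_n) <= 1.0 + tol:
        # Optimality certificate: mu^n solves (P); output it and stop.
        return mu_n, True
    # mu^n is not optimal: insert the new atom and continue.
    return insert_atom(mu_n, gamma_star), False
```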
4.1.3 Coefficients Optimization Step
4.1.4 Algorithm Summary
4.2 Quadratic Optimization
4.3 Convergence Analysis
4.4 Stopping Condition
4.5 Time-Discrete Version
4.5.1 Time-Discrete Framework
4.5.2 Adaptation of Algorithm 1 to the Time-Discrete Setting
- if \(\langle \rho _{\gamma ^*}, w^n\rangle _{\mathcal {D}} \le 1\), then \(\mu ^n\) is a solution to (\(\mathcal {P}_{discr}\)). The algorithm outputs \(\mu ^n\) and stops,
- if \(\langle \rho _{\gamma ^*}, w^n\rangle _{\mathcal {D}} > 1\), then \(\mu ^n\) is not a solution to (\(\mathcal {P}_{discr}\)) and \(\gamma ^* \in \mathrm{AC}^2\). The atom \(\mu _{\gamma ^*}\) found is inserted into the n-th iterate \(\mu ^n\) and the algorithm continues (see the sketch after this list).
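The decision is the same as in the time-continuous case; only the pairing changes to the discrete product \(\langle \cdot , \cdot \rangle _{\mathcal {D}}\). Below is a minimal sketch, assuming the discrete pairing can be written as a quadrature sum of evaluations of \(w_{t_i}\) along \(\gamma \); the exact definition, including the normalization of the atom \(\rho _{\gamma }\), is the one of Sect. 4.5.1 and is not reproduced here.

```python
import numpy as np

def discrete_pairing(gamma, w, weights=None):
    """Illustrative version of <rho_gamma, w>_D as a quadrature sum.

    gamma: array of curve values gamma(t_i); w: list of callables w_{t_i}
    of the same length. Uniform quadrature weights are an assumption; the
    paper's pairing also involves the normalization of rho_gamma.
    """
    if weights is None:
        weights = np.full(len(w), 1.0 / len(w))
    return sum(a * w_i(x) for a, w_i, x in zip(weights, w, gamma))
```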
5 The Algorithm: Numerical Implementation
5.1 Insertion Step Implementation
5.1.1 Minimization Strategy
5.1.2 Descent Operator
5.1.3 Random Starts
We introduce the subroutine \(\texttt{sample}\), which inputs a dual variable w and outputs a randomly generated curve \(\gamma \in \mathrm{AC}^2\). A minimal sketch is given below.
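The sketch assumes piecewise-linear curves represented by their values at uniform time nodes, and assumes, as one plausible reading, that the dual variable biases the sampling through the weight function Q reported in Sect. 6; this rejection rule is an assumption, not the paper's exact construction.

```python
import numpy as np

def Q(x):
    # Weight function used for the random starts (see Sect. 6).
    return np.exp(np.maximum(x + 0.05, 0.0)) - 1.0

def sample(w, n_candidates=50, rng=None):
    """Sketch of sample: w is a list of callables w_{t_i} on Omega = (0,1)^2.

    At each time node, candidate points are drawn uniformly and one is kept
    with probability proportional to Q(w_{t_i}(x)). The returned array holds
    the curve values gamma(t_i), interpolated linearly in between.
    """
    rng = rng or np.random.default_rng()
    nodes = []
    for w_t in w:
        cands = rng.uniform(0.0, 1.0, size=(n_candidates, 2))
        wgt = np.array([Q(w_t(x)) for x in cands])
        if wgt.sum() > 0:
            nodes.append(cands[rng.choice(n_candidates, p=wgt / wgt.sum())])
        else:
            nodes.append(cands[rng.integers(n_candidates)])  # uniform fallback
    return np.array(nodes)
```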
5.1.4 Crossovers Between Stationary Points
We introduce the subroutine \(\texttt{crossover}\), which inputs two curves \(\gamma _1, \gamma _2 \in \mathrm{AC}^2\) and outputs a set of curves in \(\mathrm{AC}^2\), possibly empty. A minimal sketch is given below.
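The sketch below uses an assumed rule, suggested only by the parameter \(\varepsilon \) of Sect. 6: if the two curves pass within \(\varepsilon \) of each other at some time node, their tails are swapped there, producing two new curves. The precise rule, including the role of the second parameter \(\delta \), is the one defined in this section of the paper.

```python
import numpy as np

def crossover(gamma_1, gamma_2, eps=0.05):
    """Sketch of crossover: gamma_1, gamma_2 are arrays of curve values at
    common time nodes. If the curves pass within eps of each other at some
    node, their tails are swapped there, producing two new curves; otherwise
    the empty set is returned. The swap rule is an assumption of this sketch.
    """
    close = np.linalg.norm(gamma_1 - gamma_2, axis=1) <= eps
    if not close.any():
        return []
    i = int(np.argmax(close))  # first node where the curves (nearly) cross
    return [np.concatenate([gamma_1[:i], gamma_2[i:]]),
            np.concatenate([gamma_2[:i], gamma_1[i:]])]
```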
5.1.5 Multistart Gradient Descent Algorithm
We introduce the subroutine \(\texttt{MultistartGD}\), which inputs a set of curves \(\mathcal {A}\) and a dual variable w, and outputs a (possibly empty) set \(\mathcal {S}\) of stationary points for F defined at (68). We now describe how to interpret this subroutine in the context of Algorithm 1, and how it can be used to replace the insertion step operation at line 4. At each iteration, a curve \(\gamma \) is generated by \(\texttt{sample}\), described in Sect. 5.1.3. We then descend \(\gamma \), obtaining the new curve \(\gamma ^*:=\mathcal {G}_{w^n}(\gamma )\), where \(\mathcal {G}_{w^n}\) is defined at (72). If the output point \(\gamma ^*\) does not belong to the set of known stationary points \(\mathcal {S}\), and \(\gamma ^* \ne \gamma _{\infty }\), then we first compute the crossovers of \(\gamma ^*\) with all the elements of \(\mathcal {S}\), and afterward insert it into \(\mathcal {S}\). After \(N_\mathrm{max}\) iterations, the set \(\mathcal {S}\) is returned. Notice that \(\mathcal {S}\) can be empty only if \(\mathcal {A}=\emptyset \), i.e., if \(\texttt{MultistartGD}\) is called at the first iteration of Algorithm 1 (see Sect. 5.1.2). A sketch of the loop follows.
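In the sketch below, sample, descend (the operator \(\mathcal {G}_{w}\) of (72)) and crossover are passed as callables; the handling of the start pool, the representation of \(\gamma _{\infty }\) as None, and the distinctness tolerance are assumptions of this sketch, not the paper's specification.

```python
import numpy as np

def multistart_gd(A, w, sample, descend, crossover, N_max=100, tol=1e-6):
    """Sketch of MultistartGD: collect distinct stationary points of F."""
    starts = list(A)   # known curves serve as initial starting points
    S = []             # stationary points found so far
    for _ in range(N_max):
        gamma = starts.pop() if starts else sample(w)
        gamma_star = descend(gamma)             # gamma* = G_w(gamma)
        if gamma_star is None:                  # descent escaped to gamma_infty
            continue
        if any(np.allclose(gamma_star, s, atol=tol) for s in S):
            continue                            # stationary point already known
        for s in S:                             # crossovers with known points...
            starts.extend(crossover(gamma_star, s))
        S.append(gamma_star)                    # ...then record the new point
    return S           # can be empty only if A was empty
```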
5.2 Acceleration Strategies
5.2.1 Multiple Insertion Step
5.2.2 Sliding Step
5.3 Full Algorithm
5.3.1 Algorithm Summary
The insertion step is replaced by the subroutine \(\texttt{MultistartGD}\) at line 4, which is called with arguments \(\varvec{\gamma }\) and \(w^n\). The output is a set of curves \(\mathcal {S}\) containing stationary points of the functional F at (68) with respect to the dual variable \(w^n\). If \(\mathcal {S}=\emptyset \), the algorithm stops and outputs the current iterate \(\mu ^n\). As observed in Sect. 5.1.2, this can only happen at the first iteration of the algorithm. Otherwise, in line 7, the function \(\texttt{order}\) inputs \(\mathcal {S}\) and outputs a tuple of curves, obtained by ordering the elements of \(\mathcal {S}\) increasingly with respect to their value of F. As anticipated in Sect. 5.1, the first element of \(\texttt {order}(\mathcal {S})\), named \(\gamma _{N_n + 1}^*\), is considered the best available candidate solution to the insertion step problem (68), and is used in the stopping condition at lines 8 and 9. This stopping condition was used similarly in Algorithm 1, with the difference that in Algorithm 2 the curve \(\gamma _{N_n + 1}^*\) is not necessarily a global minimizer of F but, in general, only a stationary point. We discuss this stopping criterion further in Sect. 5.3.2. Afterward, the found stationary points are inserted into the current iterate, and the algorithm alternates between the coefficients optimization step (Sect. 4.1.3) and the sliding step (Sect. 5.2.2). These operations are executed \(K_\mathrm{max}\) times, from line 11 to line 16 of Algorithm 2. A sketch of one such iteration is given below.
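In the sketch, every helper stands in for the corresponding subroutine of the paper, and the stopping test is written via the same duality product as in Sect. 4.1.2, which is an assumption of this illustration.

```python
def algorithm2_iteration(mu_n, gammas, w_n, F, pairing, multistart_gd,
                         insert_atoms, optimize_coefficients, sliding_step,
                         K_max=5):
    """One iteration of Algorithm 2; all helpers are stand-ins."""
    S = multistart_gd(gammas, w_n)
    if not S:
        return mu_n, True                   # only possible at the first iteration
    ordered = sorted(S, key=F)              # the function `order`
    gamma_best = ordered[0]                 # gamma*_{N_n + 1}
    if pairing(gamma_best, w_n) <= 1.0:     # stopping condition (lines 8, 9)
        return mu_n, True
    mu = insert_atoms(mu_n, ordered)        # insert the found stationary points
    for _ in range(K_max):                  # lines 11-16 of Algorithm 2
        mu = optimize_coefficients(mu)      # coefficients optimization step
        mu = sliding_step(mu)               # sliding step
    return mu, False
```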
5.3.2 Stopping Condition
5.3.3 Convergence and Numerical Residual
6 Numerical Implementation and Experiments
- the considered domain is \(\Omega := (0,1)\times (0,1) \subset \mathbb {R}^2\),
- the number of time samples is fixed to \(T = 50\), with \(t_i := i/T\) for \(i =0,\ldots ,T\),
- the data spaces \(H_{t_i}\) and forward operators \(K_{t_i}^* :\mathcal {M}(\overline{\Omega }) \rightarrow H_{t_i}\) are as in Sect. 6.1.1, and model a Fourier transform with time-dependent spatial undersampling. A specific choice of sampling pattern is made in each experiment,
- in each experiment, problem (\(\mathcal {P}_{discr}\)) is considered for specific choices of regularization parameters \(\alpha ,\beta >0\) and data \(f=(f_{t_0},\ldots , f_{t_T})\) with \(f_{t_i} \in H_{t_i}\),
- the crossover parameters described in Sect. 5.1.4 are chosen as \(\varepsilon =0.05\) and \(\delta = 1/T\). The random starts presented in Sect. 5.1.3 are implemented using the function \(Q(x) = \exp (\max (x + 0.05,0)) - 1\). These parameter choices are purely heuristic, and there is no reason to believe that they are optimal (see the configuration sketch after this list).
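For concreteness, the common setup translates into the following constants; the values are taken from the list above, and only the code container itself is an illustration.

```python
import numpy as np

# Common experimental setup (values from the list above).
T = 50
t = np.arange(T + 1) / T            # time samples t_i = i/T, i = 0, ..., T
eps = 0.05                          # crossover parameter (Sect. 5.1.4)
delta = 1.0 / T                     # crossover parameter (Sect. 5.1.4)

def Q(x):
    # Weight function for the random starts (Sect. 5.1.3).
    return np.exp(np.maximum(x + 0.05, 0.0)) - 1.0

# Domain Omega = (0,1) x (0,1); alpha, beta and the data f = (f_{t_0}, ..., f_{t_T})
# are chosen separately in each experiment.
```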
6.1 Measurements, Data, Noise Model and Visualization
6.1.1 Measurements
6.1.2 Data
6.1.3 Noise Model
6.1.4 Visualization via Backprojection
6.1.5 Reconstruction’s Intensities
6.2 Numerical Experiments
6.2.1 Experiment 1: Single Atom with Constant Speed
6.2.2 Experiment 2: Complex Example with Time-Varying Measurements
Table 1: Regularization parameters \((\alpha ,\beta )\), number of restarts, iterations, and execution times.

| Relative noise | \((\alpha ,\beta )\) | Restarts | Iterations | Execution time |
|---|---|---|---|---|
| 0% | (0.1, 0.1) | 200 | 4 | 1.5 hours |
| 20% | (0.1, 0.1) | 1000 | 7 | 5.8 hours |
| 60% | (0.1, 0.1) | 10000 | 21 | 10.5 days |
| 60% | (0.3, 0.3) | 5000 | 4 | 16.8 hours |
6.2.3 Experiment 3: Crossing Example
6.3 Remark on Execution Times
The subroutines \(\texttt{sample}\) and \(\texttt{crossover}\) included in Subroutine 1, the coefficients optimization step, and the sliding step have, in comparison, negligible computational cost. In particular, the execution time of the sliding step grows with the number of active atoms, but this effect appears toward the last iterations of the algorithm, and it is overshadowed by the insertion step, whose gradient descents become longer as the iterate approaches the optimal value. As a confirmation of the role of the insertion step in the overall computational cost, one can see that the execution times of the algorithm depend linearly on the total number of gradient descents run in each example. Indeed, since the total number of gradient descents is given by the number of restarts multiplied by the number of iterations (see Table 1), the ratio between the execution time and the total number of gradient descents is of the same order for all the presented examples (between \(0.0008\) and \(0.0019\) hours per gradient descent); the short computation below verifies this from Table 1. It is worth pointing out that the multistart gradient descent is a highly parallelizable method. Some early tests in this direction indicate that much lower computational times are achievable for the presented examples by computing gradient descents simultaneously on several GPUs.
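The claimed range can be checked directly from the values of Table 1:

```python
# Execution time per gradient descent, computed from Table 1 as
# time / (restarts * iterations); 10.5 days = 252 hours.
runs = [(200, 4, 1.5), (1000, 7, 5.8), (10000, 21, 252.0), (5000, 4, 16.8)]
for restarts, iterations, hours in runs:
    print(f"{hours / (restarts * iterations):.4f} hours per gradient descent")
# All four ratios fall between 0.0008 and 0.0019 hours per gradient descent.
```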