Weighted digraphs are used to model a variety of natural systems and can exhibit interesting structure across a range of scales. In order to understand and compare these systems, we require stable, interpretable, multiscale descriptors. To this end, we propose grounded persistent path homology (GrPPH)—a new, functorial, topological descriptor that describes the structure of an edge-weighted digraph via a persistence barcode. We show there is a choice of circuit basis for the graph which yields geometrically interpretable representatives for the features in the barcode. Moreover, we show the barcode is stable, in bottleneck distance, to both numerical and structural perturbations.
Notes
Communicated by Herbert Edelsbrunner.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
1 Introduction
Directed graphs with positive edge-weights arise both as natural objects of mathematical study and as useful models of real-world systems (e.g. [3, 4, 30, 36]). A common task is to distinguish between weighted digraphs. Frequently, this is achieved by defining an invariant, i.e. a map \(\mathop {\mathrm {\mathcal {I}}}\limits :\varvec{\textrm{WDgr}}\rightarrow X\) from weighted digraphs into some set X, together with a metric d on X. The metric allows us to quantitatively measure to what extent a pair of weighted digraphs differ. When \(\mathop {\mathrm {\mathcal {I}}}\limits \) is well understood we may be able to explain why the weighted digraphs differ. In order to determine desirable characteristics of such an \(\mathop {\mathrm {\mathcal {I}}}\limits \), consider the examples shown in Fig. 1, in which edge weights correspond to length as drawn.
×
Consider each \(G_i\) from the perspective of a particle flowing through the digraph, such that the particle may only traverse an edge in the direction specified and the time it takes corresponds to the weight. To the particle, loops (or circuits) in the graph are significant features. However, loops can vary greatly based on the orientation and weight of constituent edges.
Advertisement
Despite sharing the same underlying undirected graph, \(G_1\) and \(G_2\) support very different flows since \(G_1\) has a single source and a single sink whereas \(G_2\) has 4 sources and 2 sinks. To reflect this, \(d(\mathop {\mathrm {\mathcal {I}}}\limits (G_1), \mathop {\mathrm {\mathcal {I}}}\limits (G_2))\) should be large. In contrast, \(G_3\) has a different undirected graph but can be obtained from \(G_1\) by simply subdividing each edge. In applications, this may arise from a finer resolution image of the same system. A suitable invariant should be relatively stable to such subdivisions, ideally converging to a limiting value upon iterated subdivision. Finally, \(G_4\) has a higher circuit rank but the new loops are on a small scale, whilst the large scale organisation is mostly similar to \(G_1\). Therefore, \(d(\mathop {\mathrm {\mathcal {I}}}\limits (G_1), \mathop {\mathrm {\mathcal {I}}}\limits (G_4))\) should be small and the difference between \(\mathop {\mathrm {\mathcal {I}}}\limits (G_1)\) and \(\mathop {\mathrm {\mathcal {I}}}\limits (G_4)\) should reflect this multiscale comparison.
For successful application, any invariant should be stable to a reasonable noise model. A typical requirement is that \(\mathop {\mathrm {\mathcal {I}}}\limits \) is continuous (or better yet Lipschitz), with respect to a choice of metric on \(\varvec{\textrm{WDgr}}\). Designing metrics for graphs is an active area of research but a common choice is the graph edit distance [19]. For this metric, costs are assigned to operations such as deleting an edge or modifying a weight, then the distance between two graphs is the minimal cumulative cost of modifying one into the other. Since assigning costs to graph operations is somewhat arbitrary, it is reasonable instead to require a bound on \(d(\mathop {\mathrm {\mathcal {I}}}\limits (G),\mathop {\mathrm {\mathcal {I}}}\limits ({\mathbb {O}G))}\), over a range of graph operations, \(\mathbb {O} :\varvec{\textrm{WDgr}}\rightarrow \varvec{\textrm{WDgr}}\).
Finally, in many applications (particularly in biology), it is important that any invariant \(\mathop {\mathrm {\mathcal {I}}}\limits \) is interpretable. That is, one must be able to explain why the invariant has the value it does. Typically this is achieved through the identification of key contributing subgraphs.
In summary, we seek an invariant for weighted digraphs which
(a)
distinguishes graphs with different flow profiles due to directionality;
(b)
can detect and describe features (i.e. loops) across a range of scales;
(c)
is stable to reasonable perturbations and converges under iterated subdivision; and
(d)
is interpretable, e.g. through the identification of important subgraphs.
Pursuant to these goals, we employ two tools from topological data analysis (TDA) – path homology and persistent homology. A number of homology theories for digraphs have been developed (see summary in Sect. 2 of [7]). Path homology is one such theory [24], which is sensitive to directionality and has useful functorial properties [13, 23]. Persistent homology is a tool for developing stable descriptors [1] that extract relevant information in multiscale scenarios. As such, persistent homology has seen successful applications to fields including neuroscience [6, 21, 22, 37], vasculature [32, 35] and financial networks [26], to name but a few [20]. A theory of persistent path homology (PPH) was proposed by Chowdhury and Mémoli [13] and is a stable descriptor for directed networks. In a search to develop an interpretable invariant for weighted digraphs, which respects the inert topology of the underlying digraph, we are lead to an alteration of PPH which we prove meets goals (a)-(d).
Advertisement
1.1 Contributions and Outline
In Sect. 2 we give an overview of path homology and persistent homology, and set up the categorical framework for the rest of the paper. In particular, we define a category of weighted digraphs \(\varvec{\textrm{Cont}}\varvec{\textrm{WDgr}}\) in which morphisms are digraph maps of the underlying digraphs as well as contractions of the natural, shortest-path quasimetric.
In Sect. 3.1 we review a standard pipeline for extracting a topological invariant of a weighted digraph, via PPH. Evaluating this invariant against our stated goals, motivates an alteration of this pipeline, which we call the ‘grounded pipeline’. We define this new pipeline and describe categories upon which the resulting invariant is functorial in Sect. 3.2.
Arising from the grounded pipeline, our main contribution is Definition/Theorem 3.16, wherein we define grounded persistent path homology (GrPPH). This new invariant is a functor
where \(\varvec{\textrm{PersVec}}\) is the category of persistent vector spaces. Couching this definition in category theory yields a strong framework for comparing weighted digraphs through the invariant. Indeed, we use this functoriality later in the paper to aid the proof of decomposition and stability results.
Section 4 is devoted to developing an interpretation of GrPPH. The early subsections are dedicated to understanding the features detected by \( ^{g}{\mathcal {H}}_1(G) \); the following theorem summarises our findings.
Theorem 1.1
Given a weighted digraph \(G\in \varvec{\textrm{WDgr}}\), denote the underlying undirected graph by \(\mathcal {U}(G)\).
(a)
All features in \( ^{g}{\mathcal {H}}_1(G) \) are born at \(t=0\);
(b)
\( ^{g}{\mathcal {H}}_1(G) \) at time \(t=0\) is the cycle space of \(\mathcal {U}(G)\); and moreover
(c)
there is a choice of circuits in \(\mathcal {U}(G)\) whose homology classes generate \( ^{g}{\mathcal {H}}_1(G) \).
These results demonstrate how GrPPH is sensitive to circuits in the digraph at all scales, meeting goal (b), and can be interpreted through a persistence basis of such circuits, meeting goal (d). We also discuss how to use \( ^{g}{\mathcal {H}}_1(G) \) to assign a ‘scale’ to any circuit in \(\mathcal {U}(G)\) in Sect. 4.2. In Sect. 4.4, we prove that if G can be decomposed into smaller parts, then GrPPH also decomposes.
Theorem 1.2
Given a weighted digraph \(G\in \varvec{\textrm{WDgr}}\), if G decomposes as a wedge decomposition \(G=G_1 \vee _{\hat{v}} G_2\) or a disjoint union \(G = G_1 \sqcup G_2\) then
In Sect. 5 we investigate the stability of \( ^{g}{\mathcal {H}}_1\). We employ path homotopy theory to prove the main stability theorem (Theorem 5.8), which provides a strategy for obtaining bounds on the bottleneck distance of the barcode, upon perturbing the input weighted digraph. Applying this result, we find local stability to weight perturbation (Theorem 5.11), edge subdivision (Theorem 5.16) and certain classes of edge collapses (Theorem 5.28) and edge deletions (Theorem 5.38). In particular, edge subdivision stability automatically implies that our invariant converges under iterated subdivision (Corollary 5.22). In contrast, the descriptor is unstable to generic edge collapses (Theorem 5.35) and edge deletions (Theorem 5.42), which we demonstrate through a number of counter-examples. We argue that these stability properties suffice to meet goal (c) and indeed stability to larger classes of edge collapses and edge deletion would be undesirable. For a summary of all stability results obtained, please consult Table 1.
In order to build intuition for what is measured by GrPPH, we compute a number of illustrative examples in Sect. 6. In particular, in Sect. 6.1, we consider a simple, cycle graph and determine the limiting value of GrPPH under iterated edge subdivision. In Sect. 6.2, we compute the invariant for a number of small square digraphs with varying edge orientations, illustrating sensitivity to directionality, as required by goal (a). Finally, in Sect. 6.5, we describe properties of \( ^{g}{\mathcal {H}}_1(G) \) when G is a complete digraph, with weights forming a finite quasimetric space, and compute the descriptor for a point sample of the unit circle.
1.2 Computations
An algorithm for computing PPH in arbitrary degrees was proposed by Chowdhury and Mémoli [13]; a more efficient algorithm for computing PPH in degree 1 was later proposed by Dey, Li, and Wang [17]. We modify the latter algorithm to compute GrPPH; an implementation, capable of computing representatives, is available as a Python package [8].
2 Background
2.1 Basic Notation and Category Theory
We introduce some basic language for graphs and categories, and then recall the definitions of path homology and persistent homology, the two theories we combine later in a new way to define our invariant.
Notation 2.1
For \(d\in \mathbb {N}\), the standard d-simplex is
(2.1)
Notation 2.2
Given a category \(\varvec{\mathcal {C}}\), we denote the collection of objects \(\textrm{Obj}(\varvec{\mathcal {C}})\) and the collection of morphisms \(\textrm{Mor}(\varvec{\mathcal {C}})\). For two objects \(X, Y\in \textrm{Obj}(\varvec{\mathcal {C}})\), we denote the set of morphisms \(X \rightarrow Y\) by \(\textrm{Mor}_{\varvec{\mathcal {C}}}(X, Y)\). Where it is clear from context whether \(\alpha \) is an object or a morphism, we simply write \(\alpha \in \varvec{\mathcal {C}}\).
Notation 2.3
Fix any three categories \(\varvec{\mathcal {C}}\), \(\varvec{\mathcal {D}}\) and \(\varvec{\mathcal {D}}'\).
(a)
We denote the category of functors \(\varvec{\mathcal {C}}\rightarrow \varvec{\mathcal {D}}\), where morphisms are natural transformations, by \({\textbf {[}}\varvec{\mathcal {C}},\, \varvec{\mathcal {D}}{\textbf {]}}\).
(b)
For a morphism \(f\in \textrm{Mor}_{{\textbf {[}}\varvec{\mathcal {C}},\, \varvec{\mathcal {D}}{\textbf {]}}}(M, N)\), we denote the components of the natural transformation by \(f_x: M(x) \rightarrow N(x)\) for each \(x\in \varvec{\mathcal {C}}\).
(c)
Given a functor \(\mu : \varvec{\mathcal {D}}\rightarrow \varvec{\mathcal {D}}'\), there is a functor \({\textbf {[}}\varvec{\mathcal {C}},\, \mu {\textbf {]}}: {\textbf {[}}\varvec{\mathcal {C}},\, \varvec{\mathcal {D}}{\textbf {]}}\rightarrow {\textbf {[}}\varvec{\mathcal {C}},\, \varvec{\mathcal {D}}'{\textbf {]}}\). Given \(F\in \textrm{Obj}({\textbf {[}}\varvec{\mathcal {C}},\, \varvec{\mathcal {D}}{\textbf {]}})\), we map \({\textbf {[}}\varvec{\mathcal {C}},\, \mu {\textbf {]}}(F){:}{=}\mu \circ F\). Given a natural transformation \(\nu : F \Rightarrow F'\) between functors \(F, F'\in {\textbf {[}}C,\, D{\textbf {]}}\), we map \({\textbf {[}}\varvec{\mathcal {C}},\, \mu {\textbf {]}}(\nu ){:}{=}\mu \circ \nu \) which one can confirm is a natural transformation \(\mu \circ F \Rightarrow \mu \circ F'\).
Notation 2.4
(a)
We let \(\varvec{\textrm{R}}\) denote the poset of \(\mathbb {R}\) equipped with the \(\le \) relation, viewed as a category.
(b)
We let \(\varvec{\textrm{Vec}}\) denote the category of \(\mathbb {R}\)-vector spaces and \(\varvec{\textrm{vec}}\) denote the full subcategory of finite-dimensional \(\mathbb {R}\)-vector spaces.
(c)
We let \(\varvec{\textrm{Ch}}\) denote the category of chain complexes over \(\mathbb {R}\).
(d)
Given a category \(\varvec{\mathcal {C}}\), a filtration is a functor \(F:\varvec{\textrm{R}}\rightarrow \varvec{\mathcal {C}}\) where the morphisms \(F(s \le t)\) are inclusions (assuming such a notion is defined in the category \(\varvec{\mathcal {C}}\)).
Definition 2.5
Given a chain complex \(C_\bullet \in \varvec{\textrm{Ch}}\), we denote the \(k^{th}\) chain group by \(C_k\) and the boundary map by . For each \(k\in \mathbb {N}\), the \(k^{th}\) homology group, \(H_k(C)\) is the quotient
(2.2)
We can view \(H_k\) as a functor \(\varvec{\textrm{Ch}}\rightarrow \varvec{\textrm{Vec}}\).
2.2 Directed Graphs
Definition 2.6
(a)
A (simple) digraph is a tuple \(G=(V, E)\) where V, the set of vertices, is a finite set and E, the set of edges, is a subset of \(V\times V \setminus \Delta _V\) where
(2.3)
(b)
A directed acyclic graph (DAG) is a simple digraph \(G=(V, E)\) such that there is a partial order < on the nodes such that \((i, j)\in E \implies i < j\).
(c)
An oriented graph is a simple digraph \(G=(V, E)\) with no double edges, i.e. \((i, j) \in E \implies (j, i)\not \in E\).
(d)
A weighted (digraph/DAG/oriented graph) is a triple \(G=(V, E, w)\) such that (V, E) is a (simple digraph/DAG/oriented graph) and \(w: E\rightarrow \mathbb {R}_{>0}\) is a positively-valued function on the edges.
(e)
An undirected graph is a tuple \(G=(V, E)\) where E is a multiset of 2-element subsets of V.
(f)
Given a digraph \(G=(V, E)\), the underlying undirected graph is \({\mathcal {U}(G)} {:}{=}(V, \mathcal {U}(E))\) where
$$\begin{aligned} (i, j)\in E ,(j, i)\not \in E&\implies \{i, j\} \in \mathcal {U}(E) \text { with multiplicity }1, \end{aligned}$$
The weakly connected components of G are the connected components of \(\mathcal {U}(G)\), as a partition of V(G). If a and b belong to the same weakly connected component, we say they are weakly connected, else we say they are weakly disconnected.
(h)
Given two digraphs \(G_1=(V_1, E_1)\), \(G_2=(V_2, E_2)\) with \(V_1, V_2 \subseteq V\), their union is \({G_1 \cup G_2} {:}{=}(V_1 \cup V_2, E_1 \cup E_2)\). If \(V_1\), \(V_2\) are disjoint, then we denote this \({G_1\sqcup G_2}\).
Notation 2.7
Fix a weighted digraph \(G=(V, E, w)\).
(a)
We denote \({V(G)}{:}{=}V\), \({E(G)}{:}{=}E\), \({w(G)}{:}{=}w\).
(b)
For an edge \(e=(i, j)\in E\), we write \({\mathop {\textrm{st}}\limits (e)}{:}{=}i\) and \({\mathop {\textrm{fn}}\limits (e)}{:}{=}j\), to denote the ‘start’ and ‘finish’ of e. Furthermore, we say that i and j are incident to e.
(c)
For an edge \(e=(i, j)\in E\), we write \({w(i, j)}{:}{=}w(e)\).
(d)
We write \({i\rightarrow j}\) to mean there is an edge \((i, j) \in E\).
Definition 2.8
Fix a weighted digraph \(G=(V, E, w)\).
(a)
Given \(V'\subseteq V\), the induced subgraph on \(V'\) is \((V', E', w')\), where \(E' = E \cap (V' \times V')\) and \(w'\) is w restricted to \(E'\). Note that the induced subgraph on V is all of G.
(b)
Given \(E'\subseteq E\), the induced subgraph on \(E'\) is \((V', E', w')\), where \(V'\) is the set of all vertices incident to some edges in \(E'\) and \(w'\) is w restricted to \(E'\). Note that the induced subgraph on E may be a proper subgraph of G.
(c)
For \(v\in V\),
(2.6)
(2.7)
(2.8)
and the vertex-neighbourhood graph, \(\mathop {\mathrm {\mathcal {N\!G}}}\limits (v;G)\), is the induced subgraph on \(\mathcal {N}(v)\). Where G is clear from context, we omit it from notation.
(d)
For \(F\subseteq E\),
(2.9)
and if \(F=\{e\}\) is a single edge, we use the notation \(\mathcal {N}(e; G)\). The edge-neighbourhood graph, \(\mathop {\mathrm {\mathcal {N\!G}}}\limits (F; G)\), is the induced subgraph on \(\mathcal {N}(F; G)\). Where G is clear from context, we omit it from notation.
(e)
Given a vertex \(v\in V\), the closed star of v is the induced subgraph on those edges that are incident to v, which we denote \({\mathop {\mathrm {\textrm{Star}}}\limits (v; G)}\). Where G is clear from context, we omit it from notation. Note that \(\mathop {\mathrm {\textrm{Star}}}\limits (v) \subseteq \mathop {\mathrm {\mathcal {N\!G}}}\limits (v)\).
(f)
For two vertices \(a,b \in V\), a (directed) trail from a to b is an alternating sequence of vertices, \(v_i \in V\), and forward edges, \(e_i \in E\),
$$\begin{aligned} p = (v_0 = a, e_1, v_1, e_2, \dots , e_{k}, v_k=b) \end{aligned}$$
(2.10)
such that \(e_i = (v_{i-1}, v_i)\). We write that p is a trail \(a\leadsto b\). In a simple digraph, the \(e_i\) uniquely determine the \(v_i\) (and vice versa) so we occasionally omit one from the notation.
(g)
A trail with no self-intersections is a path, i.e. \(v_i = v_j \implies i = j\).
(h)
An undirected circuit is an alternating sequence of vertices, \(v_i \in V\), and edges, \(e_i \in E\),
$$\begin{aligned} p = (v_0 = a, e_1, v_1, e_2, \dots , e_{k}, v_k=a) \end{aligned}$$
(2.11)
such that \(\{\mathop {\textrm{st}}\limits (e_i), \mathop {\textrm{fn}}\limits (e_i)\} = \{v_{i-1}, v_i\}\) and \(v_0 = v_k\). Unlike trails, we also require \(k > 1\) to ensure the circuit contains at least one edge.
(i)
Given an undirected circuit p, as above, if \(v_0, \dots , v_{k-1}\) are all distinct then we say p is simple.
Remark 2.9
We allow trails and paths to contains zero edges, i.e. \(p=(a)\) is a trail \(a\leadsto a\) but not an undirected circuit.
by removing \(f(e_i)\) and \(f(v_i)\) from the sequence if \(f(e_i)\) is a self-loop, i.e. \(f(v_{i-1}) = f(v_i)\).
Remark 2.14
Suppose \(f:G \rightarrow H\) is a digraph map such that \(w(H)(f(e)) \le w(G)(e)\) for every edge \(e\in E(G)\) with \(f(e) \not \in \Delta _{V(H)}\). Then for any path \(p:i \leadsto j\) in G, f(p) is a path \(f(i)\leadsto f(j)\) in H and \(\mathop {\textrm{len}}\limits (f(p))\le \mathop {\textrm{len}}\limits (p)\). Therefore, f is a contraction. Whilst this is a sufficient condition to form a contraction, it is not a necessary condition; the shortest path joining \(f(i)\leadsto f(j)\) in H need not be the image of the shortest path joining \(i \leadsto j\).
Given these ways of mapping between (weighted) digraphs, a number of categories naturally arise.
Definition 2.15
(a)
We denote the category of simple digraphs, directed acyclic graphs and oriented graphs, where the morphisms are all digraph maps, by \({\varvec{\textrm{Dgr}}}\), \({\varvec{\textrm{Dag}}}\) and \({\varvec{\textrm{Dor}}}\) respectively.
(b)
We use the prefix \({\varvec{\textrm{W}}}\) to denote the corresponding categories of weighted digraphs where a morphism is any digraph map of the underlying, unweighted digraphs. For example, \(\varvec{\textrm{WDgr}}\) is a category of weighted simple digraphs.
(c)
For a category of weighted digraphs, we use the prefix \({\varvec{\textrm{Cont}}}\) to denote the subcategory, containing all objects, with the additional restriction that morphisms must be contractions.
(d)
For a category of weighted or unweighted digraphs, we use the prefix \({\varvec{\textrm{Incl}}}\) to denote the wide subcategory, containing all objects, with the additional restriction that morphisms must be inclusions.
Finally, in order to align our terminology with that of [13] we make the following definition.
Definition 2.16
Given a finite set V, a directed network is a non-negative function \(A: V \times V \rightarrow [0, \infty )\) such that \(A(v_1, v_2)=0 \iff v_1 = v_2\).
2.3 Path Homology
Path homology is a homology theory for directed graph, which was first introduced by Grigor’yan et al. [24]. Subsequent papers prove Künneth theorems for Cartesian products and joins [25], and invariance under an appropriate notation of digraph homotopy [23]. A directed network gives rise to a natural filtration of digraphs which leads to a stable theory of persistent path homology [13]. (Persistent) path homology has also been extended to vertex-weighted digraphs [28]. Path homology can be defined for an arbitrary path complex; here we present the definition for a digraph.
Fix a ring \(R\) and a simple directed graph \(G=(V, E)\).
Definition 2.17
The following definitions classify sequences of vertices in V:
(a)
An elementary p-path is any sequence \(v_0 \dots v_p\) of \((p+1)\) vertices, \(v_i \in V\).
(b)
An elementary p-path, \(v_0 \dots v_p\), is regular if \(v_i \ne v_{i+1}\) for every i. Otherwise, we say it is non-regular.
(c)
An elementary p-path, \(v_0 \dots v_p\), is allowed if \((v_i, v_{i+1})\in E\) for every i.
Definition 2.18
We freely generate \(R\)-modules from these sequences of vertices, for each \(p\ge 0\).
For \(p=-1\), we let \(\Lambda _{-1}{:}{=}\mathcal {R}_{-1}{:}{=}\mathcal {A}_{-1}{:}{=}R\).
Definition 2.19
Given \(p\ge 0\), the non-regular boundary map is given on the standard basis by
(2.21)
where \(v_0 \dots \hat{v_i} \dots v_p\) is the \((p-1)\)-path obtained by removing \(v_i\) from \(v_0 \dots v_p\).
Definition 2.20
Since \(\Lambda _p = \mathcal {R}_p \oplus \mathcal {I}_p\), let \(\pi :\Lambda _p \rightarrow \mathcal {R}_p\) denote the projection onto \(\mathcal {R}_p\) along \(\mathcal {I}_p\) and let \(\iota : \mathcal {R}_p \rightarrow \Lambda _p\) denote the natural inclusion. The regular boundary map is given by
(2.22)
To justify the name, a standard check confirms that indeed [24, Lemma 2.9]. However, the \(\left\{ \mathcal {R}_p\right\} \) are invariant to the edge set and although \(\mathcal {A}_p\) is a subspace of each \(\mathcal {R}_p\), the boundary map does not pass down to a boundary operator. Hence, we must define the following sub-modules.
Definition 2.21
The space of -invariant p-paths is
(2.23)
Elements of this space are called -invariant p-paths.
Note that restricts to a homomorphism and hence forms a chain complex.
Definition 2.22
The regular path (chain) complex is The homology of the regular path complex is the regular path homology of G, the \(k^{th}\) homology group is
(2.25)
The \(k^{th}\) Betti number is .
×
Remark 2.23
Note that the chain groups of this complex are not \(\mathcal {R}_p\), we use the adjective regular because is constructed via the regular boundary map.
Remark 2.24
Note that when \(R=\mathbb {Z}\) or \(\mathbb {R}\), coincides with the circuit rank of \(\mathcal {U}(G)\) — those unfamiliar with this notion can take this as the definition.
Definition 2.25
Given a digraph map \(f: G \rightarrow H\), the induced map \({f}_\#:\mathcal {R}_p(G) \rightarrow \mathcal {R}_p(H)\) is given on the standard basis by
Given a digraph map \(f: G \rightarrow H\), we denote the induced map on homology by \({f}_* {:}{=}H_k({f}_\#): H_k(G) \rightarrow H_k(H)\) for each \(k\in \mathbb {N}\).
Lemma 2.27
([23, Theorem 2.10], [13, Proposition A.2]) The induced maps restrict to maps which commute with \(\partial _p\) and hence form chain maps between the regular path complexes. Moreover these chain maps are functorial, i.e. \({(f\circ g)}_\# = {f}_\# \circ {g}_\#\) and \({(\textrm{id})}_\#=\textrm{id}\). Hence is a functor \(\varvec{\textrm{Dgr}}\rightarrow \varvec{\textrm{Ch}}\).
We will primarily be interested in the \(1^{st}\) homology group . Hence the following characterisation of the low-dimensional chain groups will be of use.
Proposition 2.28
([24, § 3.3]) For any simple digraph \(G=(V, E)\), is isomorphic to the \(R\)-module freely generated by the vertices and is isomorphic to the \(R\)-module freely generated by the edges, i.e.
(2.27)
Notation 2.29
Note that an edge \(e=(a, b)\in E\) gives rise to an allowed 1-path \(ab\in \mathcal {A}_1(G)\). Moreover, . For ease of notation, given an edge \(e\in E\), we will also use to refer to the generator in .
Since is generated by edges in G, any trail p has a representative.
Notation 2.30
Given a directed trail \(p=(e_1, \dots , e_k)\) in a digraph G, the representative of p is
(2.28)
Likewise, there is a representative for any undirected circuit.
Notation 2.31
Given an undirected circuit \(p=(v_0=a, e_1, v_1, \dots , e_k, v_k=a)\) in a digraph G, the representative of p is
(2.29)
where \(\alpha _i = 1\) if \(e_i = (v_{i-1}, v_i)\), else \(\alpha _i = -1\).
Remark 2.32
The representative of a circuit p, does not depend on the starting point, but if \(p'\) traverses the circuit in the opposite direction then \(\mathfrak {R}({p'}) = - \mathfrak {R}({p})\).
×
Proposition 2.33
([23, Proposition 2.9], [17, Theorem 3]) Let G be a finite, simple digraph and \(R=\mathbb {R}\text { or }\mathbb {Z}\). Any can be written as a linear combination of -invariant 2-paths of the following three types (represented in Fig. 2):
(a)
(iji) where \(i\rightarrow j \rightarrow i\) (double edge);
(b)
(ijk) where \(i \rightarrow j \rightarrow k\), \(i\rightarrow k\) and \(i\ne k\) (directed triangle); and
(c)
\((ijk - imk)\) where \(i\rightarrow j \rightarrow k\), \(i \rightarrow m \rightarrow k\), \(i\not \rightarrow k\) and \(i\ne k\) (long square).
First note that all of the elements identified in Proposition 2.33 are elements of and hence they form a generating set. However, the generators corresponding to long squares are not necessarily linearly independent. For example, in Fig. 2d, we see
Removing some long squares to account for these linear relations, we can obtain a basis of .
2.4 Path Homotopy
Path homotopy, also introduced by Grigor’yan et al. [23], is a homotopy theory under which path homology is invariant. We employ this theory in order to build interleavings between the grounded persistent path homology of two related weighted digraphs. Below, we present the basic definitions and results of path homotopy theory, extending it to a relative version in which we fix a subset of the vertices throughout the homotopy.
Definition 2.34
(a)
A line digraph of length n is a digraph on \(0, 1, \dots , n\) with exactly one of \((i, i+1)\), \((i+1, i)\) for each i and no other edges.
(b)
Denote the two length 1 line digraph by \(I_+\) for \(0 \longrightarrow 1\) and \(I_-\) for \(0\longleftarrow 1\).
(c)
Given two digraphs G, H, their box product is the digraph \(G\boxdot H\) where
(d)
Given two digraph maps \(f, g: G \rightarrow H\), a homotopy between f and g is a digraph map \(F: G \boxdot I \rightarrow H\), where I is a line digraph of length n
(2.31)
(e)
If I is length 1 in the above, we say F is a one-step homotopy. Otherwise, we say F is a multi-step homotopy.
(f)
Suppose F is a one-step homotopy. If, furthermore, \(I=I_+\) we say F is a from f to g else \(I=I_-\) and we say F is from g to f.
(g)
If there is a homotopy in either direction, we say f and g are homotopic and write \(f \simeq g\).
(h)
Given \(A\subseteq V(G)\), we say F is a homotopy relative A if for each \(a\in A\), the partially applied function \(F(a, \bullet ):V(I)\rightarrow V(H)\) is constant.
Given an arbitrary homotopy \(F:G\boxdot I \rightarrow H\) between f and g where I is of length n, let \(f_i: G \rightarrow H\) denote the restriction of F to \(G\times \{ i \}\) and let \(I_i\) denote the induced subgraph of I on \(\{ i-1, i\}\) (a single edge). By restricting F to each \(G \boxdot I_i\), we obtain none-step homotopies \(F_i: G \boxdot I_i \rightarrow H\) for \(i=1, \dots , n\). Each homotopy \(F_i\) is between \(f_{i-1}\) and \(f_{i}\), with the direction dependent on the direction of the edge \(I_i\). Moreover, if F is a relative \(A\subseteq V(G)\) then so too is each \(F_i\).
Grigor’yan et al. [23] showed that path homology is invariant with respect to path homotopy. In [23, Theorem 3.3] the chain homotopy which achieves this invariance is only made explicit in the one-step homotopy case. In order to describe the behaviour of relative path homotopies, we make the construction explicit for multi-step homotopies.
Theorem 2.35
If \(f\simeq g\) then there is an induced chain homotopy, i.e a sequence of morphisms , such that for all ,
(2.32)
Proof
Suppose F is a one-step homotopy from f to g, then in the proof of [23, Theorem 3.3] it was shown that there exists a chain homotopy L satisfying Eq. (2.32). Suppose instead that F is an arbitrary homotopy between f and g. Then, as described above, there is a sequence of digraph maps
$$\begin{aligned} f=f_0 =, f_1, \dots , f_n = g \end{aligned}$$
(2.33)
and a sequence of one-step homotopies \(F_1, \dots F_n\) such that either \(F_i\) is from \(f_{i-1}\) to \(f_i\) or is from \(f_i\) to \(f_{i-1}\). In the first case, we define \(\sigma _i {:}{=}1\) and in the latter we define \(\sigma _i {:}{=}-1\). Each of the one-step homotopies induces a chain homotopy which we denote \(L^i\) and the signs ensure that, for each i,
(2.34)
Finally, we linearly combine these homotopies in each degree \(L {:}{=}\sum _{i=1}^n \sigma _i L^i\) to obtain the required chain homotopy. This L satisfies Eq. (2.32) thanks to a telescoping sum. \(\square \)
Corollary 2.36
([23, Theorem 3.3]) If \(f, g: G \rightarrow H\) are homotopic digraph maps then they induce identical maps on homology \({f}_*, {g}_*: H_k(G) \rightarrow H_k(H)\) for every \(k\in \mathbb {N}\).
If \(f, g: G \rightarrow H\) are path homotopic relative \(A \subseteq V(G)\) then any supported entirely on vertices of A has the same image under \({f}_\#\) and \({g}_\#\). As one might hope, these -invariant paths are in the kernel of the induced chain homotopy. This behaviour will be crucial in the proof of the main stability theorem (Theorem 5.8).
Lemma 2.37
Suppose \(F: G \boxdot I \rightarrow H \) is a path homotopy relative \(A \subseteq V(G)\), let \(G_A\) denote the induced subgraph of G on A and let L denote the induced chain homotopy from Theorem 2.35. Then .
Proof
Due to the linear construction above, it suffices to show this result in the case where F is a one-step homotopy. We also assume \(I=I_+\); the \(I=I_-\) case admits a similar proof.
To ease notation, for each vertex \(v\in V(G)\), let v denote \((v, 0) \in V(G\boxdot I)\) and let \(v'\) denote \((v,1)\in V(G\boxdot I)\). Next, we can write any in terms of the standard basis of \(\mathcal {A}_p(G_A)\)
In each summand, we note \(F(v_k) = F(v_k')\) since F is relative A. Therefore each of the paths \(F(v_0)\dots F(v_k) F(v_k') \dots F(v_p')\) is an irregular path which is sent to 0 under \(\pi \). \(\square \)
2.5 Persistent Homology
Topological data analysis (TDA) is a field of applied mathematics which employs the powerful, discriminative tools of algebraic topology to study complex datasets. The cornerstone of the field is persistent homology (PH) which yields a stable, discrete, topological invariant, called a barcode (see [5, 9, 33] for an overview). The barcode summarises topological features in the data (e.g. connected components and loops) and measures the range of scales across which they persist.
Definition 2.38
(a)
A persistent chain complex is a functor \(\varvec{\textrm{R}}\rightarrow \varvec{\textrm{Ch}}\).
(b)
A persistent vector space is a functor \(\varvec{\textrm{R}}\rightarrow \varvec{\textrm{Vec}}\). We denote the category of such functors \({\varvec{\textrm{PersVec}}}{:}{=}{\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Vec}}{\textbf {]}}\).
(c)
A persistent vector space M is pointwise finite-dimensional (p.f.d) if M(t) is finite dimensional for all \(t\in \mathbb {R}\), that is \(M\in {\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{vec}}{\textbf {]}}\). We denote the category of p.f.d persistent vector spaces \({\varvec{\textrm{Persvec}}}{:}{=}{\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{vec}}{\textbf {]}}\).
(d)
A persistent vector space \(M:\varvec{\textrm{R}}\rightarrow \varvec{\textrm{Vec}}\) is tame [31] if
(i)
M(t) is finite dimensional for all \(t \in \mathbb {R}\), and
(ii)
there are finitely many \(t\in \mathbb {R}\) such that there is no \(\epsilon > 0\) such that \(M(t- \epsilon \le t+ \epsilon )\) is an isomorphism.
(e)
For an interval \(I\subseteq R\), we define the corresponding interval, \({P(I)}\in \varvec{\textrm{PersVec}}\), in which the vector spaces are
A morphism of persistent vector spaces is a morphism in the category \({\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Vec}}{\textbf {]}}\). That is, for \(M, N \in {\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Vec}}{\textbf {]}}\) a morphism \(\phi : M \rightarrow N\) is a family of linear maps \(\left\{ \phi _t: M(t) \rightarrow N(t) \right\} \) such that
whenever \(s\le t\). We say \(\phi \) is an isomorphism if each \(\phi _t\) is an isomorphism of vector spaces. If an isomorphism \(M \rightarrow N\) exists, we write \(M\cong N\).
A (p.f.d) persistent vector space can be decomposed as a direct sum of interval modules, the indecomposable persistent vector spaces. Moreover, this decomposition is unique, discrete and finite in all practical applications.
Theorem 2.39
(Structure Theorem for p.f.d persistent vector spaces, [16, Theorem 1.1], [10, Theorem 2.8]) Given \(M\in \varvec{\textrm{Persvec}}\), there is a multiset \(\mathcal {B}{M}\) of intervals of \(\mathbb {R}\) such that
$$\begin{aligned} M \cong \bigoplus _{I\in \mathcal {B}{M}} P(I) \end{aligned}$$
(2.41)
and any such decomposition is unique, up to reordering. We call \(\mathcal {B}{M}\) the barcode of M.
Definition 2.40
(a)
A multiset of intervals of \(\mathbb {R}\) is called a barcode.
(b)
We call an interval I in a barcode a feature. If I starts at a and ends at b, we say the feature is born at time a and dies at time b.
(c)
Given a barcode \(\mathcal {B}\), the diagram of \(\mathcal {B}\) is the multiset of endpoints
(2.42)
(d)
Given \(M\in \varvec{\textrm{Persvec}}\), the persistence diagram of M is \({{{\,\textrm{Dgm}\,}}(M)}{:}{=}{{\,\textrm{Dgm}\,}}(\mathcal {B}{M})\).
The barcode can be used as a summary of the persistent vector space. When arising as the homology of a filtration of topological spaces, this summary captures how topological features are born and killed throughout the filtration. In order to use this summary for further statistics, it is desirable that this summary is stable to noise and perturbations in the input data. To quantify this stability, we require metrics on persistent vector spaces and the resulting barcodes.
Definition 2.41
([18]) Given two multisets \(D_1, D_2 \subseteq \mathbb {R}^2\), the bottleneck distance is
where \(\gamma \) is over all multi-bijections \(D_1 \cup \Delta _\mathbb {R}\rightarrow D_2 \cup \Delta _\mathbb {R}\) and is the diagonal with multiplicity 1. Given barcodes \(\mathcal {B}_1\), \(\mathcal {B}_2\), we define the bottleneck distance between them to be the bottleneck distance between their diagrams
where \(\gamma \) is over all multi-bijections \(D_1 \cup \Delta _\mathbb {R}\rightarrow D_2 \cup \Delta _\mathbb {R}\) and is the diagonal with multiplicity 1. The p-Wasserstein between two barcodes \(\mathcal {B}_1\), \(\mathcal {B}_2\), is
([1]) Given a category \(\varvec{\mathcal {C}}\), fix \(M, N\in {\textbf {[}}\varvec{\textrm{R}},\, \varvec{\mathcal {C}}{\textbf {]}}\) and \(\delta \ge 0\).
(a)
The \(\delta \)-shift of M is \(M [ \delta ]\in {\textbf {[}}\varvec{\textrm{R}},\, \varvec{\mathcal {C}}{\textbf {]}}\) where
$$\begin{aligned} M [ \delta ](t) {:}{=}M(t + \delta ) \quad \text { and }\quad M [ \delta ](s \le t) {:}{=}M(s + \delta \le t + \delta ). \end{aligned}$$
(2.47)
(b)
Given a morphism \(f\in \textrm{Mor}_{{\textbf {[}}\varvec{\textrm{R}},\, \varvec{\mathcal {C}}{\textbf {]}}}(M, N)\) the \(\delta \)-shift of f is \(f [ \delta ]:M [ \delta ]\rightarrow N [ \delta ]\) in which \(f_t {:}{=}f_{t+\delta }\). When clear from context we often denote \(f [ \delta ]=f\).
(c)
The \(\delta \)-transition morphism is a morphism \({ \mathcal {T}(M, \delta )}:M\rightarrow M [ \delta ]\) which at \(t\ge 0\) is given by \(M(t \le t+\delta )\).
(d)
A \(\delta \)-interleaving is a pair of morphisms \(\phi : M \rightarrow N [ \delta ]\) and \(\psi : N \rightarrow M [ \delta ]\) such that
Recall that, given a \(\delta \)-interleaving \(\phi \) and \(\psi \), in order to constitute morphisms \(M\rightarrow N [ \delta ]\) and \(N\rightarrow M [ \delta ]\), they must satisfy relations
Now that we have metrics on persistent vector spaces and their barcode summaries, we can state the isometry theorem. This guarantees that the barcode is a stable summary of the input persistent vector space.
Finally, when a persistent vector space is tame, there are finitely many critical values \(t_0< \dots < t_k\) such that if \(t_{i-1}< s \le t < t_i\) then \(M(s \le t)\) is an isomorphism [10]. Hence, all information of the persistent vector space is contained within the maps \(M(t_{i-1} \le t_i)\) for \(i=1, \dots , k\). In particular, any interval in the barcode must have its endpoints at one of the critical values (or \(\pm \infty \)). In these scenarios, it suffices to consider M as a functor \({\textbf {[}}\varvec{k} {\textbf {]}}_{\varvec{\le }} \rightarrow \varvec{\textrm{Vec}}\), where \({\textbf {[}}\varvec{k} {\textbf {]}}_{\varvec{\le }}\) is the sub-poset of \(\varvec{\textrm{R}}\) consisting of the integers \(0, \dots , k\) [31].
3 Motivation and Definition of GrPPH
Firstly, in Sect. 3.1, we describe a standard pipeline for extracting a topological summary from a weighted digraph, and illustrate a number of issues that naturally arise. Motivated by this in Sect. 3.2, we alter the standard pipeline in order to define a ‘grounded pipeline’. We prove that this new pipeline is functorial in an appropriate sense, which we will later exploit for stability results. The pipeline is parameterised by two choices; in Sect. 3.3 we fix these choices in order to define our proposed descriptor.
3.1 Standard Pipeline
A typical TDA pipeline for weighted digraphs consists of three ingredients:
1.
a map \(F:\textrm{Obj}(\varvec{\textrm{WDgr}})\rightarrow \textrm{Obj}({\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Dgr}}{\textbf {]}})\), which assigns a filtration of digraphs to every weighted digraph;
2.
a chain complex functor \(C:\varvec{\textrm{Dgr}}\rightarrow \varvec{\textrm{Ch}}\) which maps each digraph to a chain complex and induces chain map for every digraph map; and finally
3.
a choice of homology functor \(H_k: \varvec{\textrm{Ch}}\rightarrow \varvec{\textrm{Vec}}\) in some degree k.
These components can then be combined into the following pipeline.
×
Examples of this pipeline include persistent homology of the directed flag complex [29] and persistent path homology [13]. However, it is possible to use these pipelines with filtrations of digraphs that do not arise from an initial weighted digraph.
We obtain a map \(\mathcal {H}_k: \textrm{Obj}(\varvec{\textrm{WDgr}}) \rightarrow \textrm{Obj}({\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Vec}}{\textbf {]}})\) given by \(\mathcal {H}_k {:}{=}{\textbf {[}}\varvec{\textrm{R}},\, H_k{\textbf {]}} \circ {\textbf {[}}\varvec{\textrm{R}},\, C{\textbf {]}} \circ F\). Since F is not a priori functorial, neither is \(\mathcal {H}_k\). Under mild assumptions on F and C, it is possible to define a subcategory of \(\varvec{\textrm{WDgr}}\) which makes this pipeline functorial.
Notation 3.1
Given \(F:\textrm{Obj}(\varvec{\textrm{WDgr}})\rightarrow \textrm{Obj}({\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Dgr}}{\textbf {]}})\), \(G \in \varvec{\textrm{WDgr}}\) and \(s\le t\) we write
(a)
\({F^t G}{:}{=}F(G)(t)\), the image of t under the functor F(G); and
(b)
\({\iota (s , t)}{:}{=}F(G)(s\le t)\), the image of \(s\le t\) under the functor F(G).
Definition 3.2
(a)
A filtration map is any map \(F:\textrm{Obj}(\varvec{\textrm{WDgr}}) \rightarrow \textrm{Obj}({\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Incl}}\varvec{\textrm{Dgr}}{\textbf {]}})\) such that \(V(F^t G) \subseteq V(G)\) for all \(t\in \mathbb {R}\). In particular, \(\iota (s , t)\) must always be an inclusion.
(b)
Given a filtration map F, a morphism of weighted digraphs \(f\in \textrm{Mor}_{\varvec{\textrm{WDgr}}}(G, H)\) is called F-compatible if for every \(t\in \mathbb {R}\) the underlying vertex map \(f: V(G) \rightarrow V(H)\) restricts to a vertex map \(V(F^t G) \rightarrow V(F^t H)\) which in turn yields a digraph map \(F^t G \rightarrow F^t H\).
(c)
Given a filtration map F, the F-compatible category of weighted digraphs, \(\varvec{\textrm{WDgr}}_F\), is the subcategory of \(\varvec{\textrm{WDgr}}\) such that \(\textrm{Obj}(\varvec{\textrm{WDgr}}_F)=\textrm{Obj}(\varvec{\textrm{WDgr}})\) and
(3.1)
Lemma 3.3
Any filtration map \(F: \textrm{Obj}(\varvec{\textrm{WDgr}})\rightarrow \textrm{Obj}({\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Incl}}\varvec{\textrm{Dgr}}{\textbf {]}})\) induces a functor \(F: \varvec{\textrm{WDgr}}_F\rightarrow {\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Dgr}}{\textbf {]}}\), which we call a filtration functor.
Proof
Given \(f\in \textrm{Mor}_{\varvec{\textrm{WDgr}}_F}(G, H)\), since f is F-compatible, the underlying vertex map induces digraph maps \(f:F^t G \rightarrow F^t H\) for every \(t \in \mathbb {R}\). Given \(s\le t\), both \(F(G)(s\le t)\) and \(F(H)(s\le t)\) are digraph maps induced by the inclusion vertex map. Hence the following square of morphisms in \(\varvec{\textrm{Dgr}}\) commutes.
×
Hence f induces a natural transformation between F(G) and F(H). Moreover, since each \(f:F^t G \rightarrow F^t H\) is fully determined by the underlying vertex map \(V(G) \rightarrow V(H)\), this construction is certainly functorial. \(\square \)
Remark 3.4
Since any filtration map induces a filtration functor, it suffices to define a filtration functor only as a map on objects.
When F is a filtration map, it induces a functor \(\varvec{\textrm{WDgr}}_F\rightarrow {\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Dgr}}{\textbf {]}}\) and hence \(\mathcal {H}_k\) is a functor \(\varvec{\textrm{WDgr}}_F\rightarrow {\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Vec}}{\textbf {]}}\), as desired. We will now consider an illustrative example of this pipeline. Assuming the weight of an edge corresponds to a distance between its endpoints (e.g. the time it takes for a particle to flow down the edge), a natural choice of filtration functor is the following.
Definition 3.5
The shortest-path filtration is a map \(F_d:\varvec{\textrm{WDgr}}\rightarrow {\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Dgr}}{\textbf {]}}\). For \(G=(V, E, w)\in \varvec{\textrm{WDgr}}\) and \(t\in \mathbb {R}\), we define
(3.2)
and d is the shortest-path quasimetric on G. For \(s\le t\), the digraph map \(G^s \rightarrow G^t\) is induced by the identity vertex map \(\textrm{id}_V\).
×
Example 3.6
Choosing \(F=F_d\) as above and C to be the regular path complex, we obtain a functor \(\mathcal {H}_k:\varvec{\textrm{WDgr}}_F\rightarrow {\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Vec}}{\textbf {]}}\). The shortest-path quasimetric of a weighted digraph is a directed network and this pipeline measures the persistent path homology of that network. This pipeline was first considered in [13] for cycle networks, alongside a stability analysis of persistent path homology for arbitrary directed networks.
In Fig. 3 we apply this pipeline (with \(k=1\)) to a small bifurcating network \(G_1\), a toy model for vasculature networks, in which each edge is given unit weight. The resulting barcode has a single feature with lifetime [1, 2). The second network, \(G_2\), is obtained by subdividing each edge in \(G_1\), giving all edges weight 0.5. The resulting barcode is \(\left\{ \!\!\left\{ [0.5, 1), [0.5, 1), [0.5, 2) \right\} \!\!\right\} \) which has three features.
This example highlights three key issues with this pipeline:
(a)
the number of features in the barcode changes upon subdivision;
(b)
loops bounded by triangles or long squares are ‘killed’ as soon as they are born; and
(c)
the birth-time of each feature is an artefact of the ‘resolution’ of the weighted digraph.
A subtler issue arises when we attempt to interpret the diagram. A feature born at time t is supported on edges of the digraph \(G^t\), which may not be edges in the original weighted digraph G. This makes interpretation of features more challenging. Note, this issues does not arise for either \(G_i\) shown in Fig. 3 since, in each case, there is some T such that \(G_i^T = G_i\) and moreover \(E(G_i^t) = \emptyset \) for all \(t<T\).
3.2 Grounded Pipeline
We now describe an alteration to the standard pipeline which alleviates these issues by including the underlying digraph G in degree 1 for all \(t\in \mathbb {R}\). The main distinction is that we do not factor through a filtration functor F. Instead, we use F and \(C\in {\textbf {[}}\varvec{\textrm{Dgr}},\, \varvec{\textrm{Ch}}{\textbf {]}}\) to construct a new functor \( ^{g}{C}_F:\varvec{\textrm{WDgr}}_F\rightarrow {\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Ch}}{\textbf {]}}\).
×
In order to define the map on objects, we only need a weaker condition on C.
Definition 3.7
Given a filtration functor \(F:\varvec{\textrm{WDgr}}_F\rightarrow {\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Dgr}}{\textbf {]}}\), a functor \(C:\varvec{\textrm{Incl}}\varvec{\textrm{Dgr}}\rightarrow \varvec{\textrm{Ch}}\), \(G\in \varvec{\textrm{WDgr}}\) and \(t\in \mathbb {R}\), the chain complex \( ^{g}{C}_\bullet (G, t; F)\) is the top row of the following diagram.
×
In the above, \(\iota :F^t G \hookrightarrow G\cup F^t G\) is the inclusion digraph map, induced by the inclusion vertex map. Then \({\iota }_\#=C(\iota )\) is the image of this map under the functor C. The boundary maps are derived either from the chain complex \(C_\bullet (F^t G)\) or \(C_\bullet (G \cup F^t G)\).
We denote the chain groups as \( ^{g}{C}_k(G, t; F)\) and the boundary maps as . When F and t are clear from context, we omit them from notation
We use the prescript \( ^{g}{\square }\) to denote that this chain complex is grounded; as we will show in Lemma 4.2, after appropriate choices of F and C, all degree 1 homology classes have representatives in the underlying digraph.
Lemma 3.8
For each \(t\in \mathbb {R}\) and \(G\in \varvec{\textrm{WDgr}}\), is a chain complex.
Proof
Since \({\iota }_\#\) is a chain map \(C(F^t G)\rightarrow C(G\cup F^t G)\) we have
(3.3)
and hence \(C_\bullet (G,t;F)\) defines a chain complex. \(\square \)
Lemma 3.9
Fix a filtration functor \(F:\varvec{\textrm{WDgr}}_F\rightarrow {\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Dgr}}{\textbf {]}}\) and a functor \(C:\varvec{\textrm{Incl}}\varvec{\textrm{Dgr}}\rightarrow \varvec{\textrm{Ch}}\). Given \(G, H\in \varvec{\textrm{WDgr}}\) and a vertex map \(f:V(G) \rightarrow V(H)\), suppose that f induces digraph maps \(f:F^s G \rightarrow F^t H\) and \(f: G \cup F^s G \rightarrow H \cup F^t H\) for some \(s \le t\). Then there is a chain map
Finally, for any \(t\in \mathbb {R}\), \(J(\textrm{id}_{V(G)}, t\le t)\) is the identity chain map.
Proof
To distinguish the two digraph maps induced by f, we denote them
$$\begin{aligned} f: F^s G \rightarrow F^t H \quad \text {and}\quad f': G \cup F^s G \rightarrow H \cup F^t H. \end{aligned}$$
(3.6)
First note that the two digraph maps induced by f commute with the relevant inclusions so that the following square commutes.
×
Applying the functor C to the digraph maps f and \(f'\) we obtain two chain maps \({f}_\#:C(F^s G)\rightarrow C(F^t H)\) and \({f'}_\#:C(G\cup F^s G)\rightarrow C(H\cup F^t H)\). These chain maps can be combined as in the following diagram.
×
Squares and and commute because all vertical maps are components of the same chain map. Square commutes because it is the image of the initial square of commuting digraph maps under the functor C, restricted to degree 1. Therefore, the vertical solid arrows constitute a chain map between the required chain complexes, which we denote \(J(f, s\le t)\). The remaining functorial relations follow automatically because in each degree \(J(f, s\le t)\) coincides with either \({f}_\#\) or \({f'}_\#\), which each satisfy the relations. \(\square \)
Lemma 3.10
Given a filtration functor \(F:\varvec{\textrm{WDgr}}_F\rightarrow {\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Dgr}}{\textbf {]}}\) and a functor \(C:\varvec{\textrm{Incl}}\varvec{\textrm{Dgr}}\rightarrow \varvec{\textrm{Ch}}\), the chain complex \( ^{g}{C}_\bullet (G, t; F)\) is functorial in t.
Proof
For each \(s\le t\) we require chain maps which satisfy the usual functorial axioms in t. First note that that the identity map \(\textrm{id}: V(G) \rightarrow V(G)\) induces a digraph map \(\textrm{id}: G \rightarrow G\). Moreover, for each \(s\le t\) the identity induces digraph maps \(\iota (s , t): F^s G \rightarrow F^t G\) since F is a filtration functor. Hence, the identity also induces digraph maps \(G \cup F^s G \rightarrow G \cup F^t G\). Applying Lemma 3.9, we obtain a chain map \(J(\textrm{id}, s\le t): ^{g}{C}_\bullet (G, s) \rightarrow ^{g}{C}_\bullet (G, t)\). Now, given \(s \le t \le r\), Lemma 3.9 implies that
and moreover \(J(\textrm{id}, t\le t)\) is the identity chain map on \( ^{g}{C}(G, t)\). Therefore, we can take . \(\square \)
Definition 3.11
Given a filtration functor \(F:\varvec{\textrm{WDgr}}_F\rightarrow {\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Dgr}}{\textbf {]}}\) and a functor \(C:\varvec{\textrm{Incl}}\varvec{\textrm{Dgr}}\rightarrow \varvec{\textrm{Ch}}\), the map \({ ^{g}{C}_F}: \textrm{Obj}(\varvec{\textrm{WDgr}}) \rightarrow \textrm{Obj}({\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Ch}}{\textbf {]}})\) is given on objects by
(3.8)
and the morphism \( ^{g}{C}_F(G)(s\le t)\) is as constructed in the proof of Lemma 3.10.
Notation 3.12
(a)
We denote the induced map on chain complexes by .
(b)
In each homology degree k, we denote the induced map on homology by .
Thanks to Lemma 3.10, \( ^{g}{C}_F\) gives us a map \(\textrm{Obj}(\varvec{\textrm{WDgr}})\rightarrow \textrm{Obj}({\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Ch}}{\textbf {]}})\), which we can compose with homology to obtain a persistent vector space. Under additional functorial assumptions on C, when we restrict to the appropriate category \( ^{g}{C}_F\) becomes a functor.
Theorem 3.13
Given a filtration functor \(F:\varvec{\textrm{WDgr}}_F\rightarrow {\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Dgr}}{\textbf {]}}\) and a functor \(C:\varvec{\textrm{Dgr}}\rightarrow \varvec{\textrm{Ch}}\), \( ^{g}{C}_F\) is a functor \( ^{g}{C}_F:\varvec{\textrm{WDgr}}_F\rightarrow {\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Ch}}{\textbf {]}}\).
Proof
Given \(f\in \textrm{Mor}_{\varvec{\textrm{WDgr}}_F}(G, H)\) and \(t\in \mathbb {R}\), we need a chain map . Moreover, f must satisfy the usual functorial axioms, and given \(s\le t\), commute with the chain maps .
Note that f is given by a vertex map \(f: V(G) \rightarrow V(H)\) which induces a digraph map \(G \rightarrow H\). Moreover, since f is F-compatible, it induces digraph maps \(F^t G \rightarrow F^t H\). Therefore f must also induce digraph maps \(G \cup F^t G \rightarrow H \cup F^t H\). Hence, using Lemma 3.9, we can take . The functorial axioms follow immediately from the functorial relations shown in Lemma 3.9. Moreover, since is also constructed via Lemma 3.9, Eq. (3.5) implies
(3.9)
and hence constitutes a morphism \( ^{g}{C}_F(G)\rightarrow ^{g}{C}_F(H)\). \(\square \)
Notation 3.14
Given a morphism \(f\in \textrm{Mor}(\varvec{\textrm{WDgr}}_F)\), we denote the induced map on chain complexes, constructed above, by and the induced map on homology in degree k by .
3.3 Definition of GrPPH
In order to investigate properties and stability of the grounded pipeline, we make choice for both F and C. As we have already discussed, since we interpret edge-weights as a measure of distance, a natural choice for F is the shortest-path filtration. Choices for C include the regular path complex, non-regular path complex and the directed flag complex. However, the latter two constructions are not functors \(\varvec{\textrm{Dgr}}\rightarrow \varvec{\textrm{Ch}}\). Henceforth, for the rest of the paper, we fixFto be the shortest-path filtration andCto be the regular path complex,
$$\begin{aligned} F = F_d \quad \text { and }\quad C=\Omega . \end{aligned}$$
(3.10)
Since F is fixed, we will largely remove it from notation. We also use C instead of .
Lemma 3.15
The \(F_d\)-compatible category of weighted digraphs is the contraction category of weighted digraphs (see Definition 2.15), \({\varvec{\textrm{WDgr}}}_{F_d}=\varvec{\textrm{Cont}}\varvec{\textrm{WDgr}}\).
Proof
First note \(f\in \textrm{Mor}(\varvec{\textrm{WDgr}})\) is precisely a vertex map \(f:V(G) \rightarrow V(H)\) which induces a digraph map \(G\rightarrow H\). A vertex map \(f:V(G) \rightarrow V(H)\) induces a digraph map \(G^t \rightarrow H^t\) for every \(t\in \mathbb {R}\) if and only if
for every \(i, j \in V(G)\). Hence, \(f\in \textrm{Mor}(\varvec{\textrm{WDgr}})\) is \(F_d\)-compatible if and only if it is a contraction map. So a morphism \(f\in \textrm{Mor}({\varvec{\textrm{WDgr}}}_{F_d})\) is precisely a contraction digraph map \(G \rightarrow H\). \(\square \)
Now that we understand the category \({\varvec{\textrm{WDgr}}}_{F_d}\), applying Theorem 3.13 yields a functor \( ^{g}{C}:\varvec{\textrm{Cont}}\varvec{\textrm{WDgr}}\rightarrow {\textbf {[}}\varvec{\textrm{R}},\, \varvec{\textrm{Ch}}{\textbf {]}}\). Taking the first homology yields a persistent vector space in a functorial way; this functor is our proposed invariant for weighted digraphs.
Definition/Theorem 3.16
Grounded persistent path homology (GrPPH) is the functor
from the contraction category of weighted digraphs (see Definition 2.15) to the category of persistent vector spaces (see Definition 2.38).
Notation 3.17
Given \(G\in \varvec{\textrm{WDgr}}\), in degree k at filtration step t, we denote
(3.13)
(3.14)
(3.15)
Remark 3.18
Note that for any \(t\in \mathbb {R}\),
$$\begin{aligned} k > 1&\implies ^{g}{H}_k(G, t) \cong H_k(G^t), \\ k < 1&\implies ^{g}{H}_k(G, t) \cong H_k(G \cup G^t). \end{aligned}$$
Therefore, the only new homology occurs in degree \(k=1\), since it compares 1-cycles in \(G\cup F^t G\) with 1-boundaries from \(G^t\). This justifies our focus on degree 1 homology in Definition/Theorem 3.16.
×
Example 3.19
In Fig. 4, we consider again the bifurcating network example of Fig. 3, in which all edges have weight 1. We see the barcodes of the grounded persistent homology are both
In \(G_1\) the two [0, 1) features correspond to the smaller 4-node circuits in the centre of the network, coloured in blue. These circuits birth homological cycles in \(G\cup G^t\) at \(t=0\), which are then killed by long squares when the edges appear in \(G^t\) at \(t=1\).
The [0, 2) feature corresponds to the large inner circuit (coloured in green). Again this circuit births a homological cycle at \(t=0\) which then becomes null-homologous at \(t=2\) when shortcut edges give rise to a new long square.
The outer red cycle is a linear combination of the inner green and blue cycles, hence it does not give rise to a fourth feature in the barcode. Moreover, at \(t=1\) the red and green cycles becomes homologous.
In \(G_2\) the features correspond to the same circuits (once subdivided).
4 Interpretation of GrPPH
4.1 Decreasing Betti Curves
In the first example we considered (Example 3.19), we saw that all features were born at \(t=0\). Indeed, this is always the case and \( ^{g}{H}_1(G, 0) \) is in fact the cycle space of \(\mathcal {U}(G)\) (with \(\mathbb {R}\)-valued coefficients).
Lemma 4.1
Given a digraph \(G=(V, E, w)\), two distinct nodes \(a, b \in V\) and a trail \(p:a\leadsto b\), then for all \(t \ge \mathop {\textrm{len}}\limits (p)\)
Fix arbitrary \(t\ge \mathop {\textrm{len}}\limits (p)\) and denote the vertices of the path as \(a=v_0, \dots , v_m = b\). Whenever \(i<j\) we can truncate p to obtain a path \(v_i\leadsto v_j\) of length at most t and so \((v_i, v_j) \in E(G^t)\). Hence, whenever \(i< j < k\), there is a directed triangle \(v_i v_j v_k \in C_2(G^t)\) and hence \(v_i v_k = v_i v_j + v_j v_k \pmod { ^{g}{B}_1(G, t) }\). Therefore, inductively we can write
Fix a weighted digraph (G, w) and \(t\ge 0\). For any cycle \(v\in ^{g}{Z}_1(G, t) \), there is an initial cycle \(v'\in ^{g}{Z}_1(G, 0) \), supported on the edges of G, such that v is homologous to .
Proof
Given any edge \(\tau =(a, b)\in E(G^t)\), there is a path \(p:a \leadsto b\) in G of length at most t. Denoting the edges of p by \((\tau _1, \dots , \tau _m)\), Lemma 4.1 tells us, \(\tau = \sum _{i=1}^m \tau _i \pmod { ^{g}{B}_1(G, t) }\). Now, since \(\tau _i \in E(G)\), we see for each edge in p. \(\square \)
Corollary 4.3
Given \(G\in \varvec{\textrm{WDgr}}\),
(a)
any interval in \(\mathcal {B}{ ^{g}{\mathcal {H}}_1(G) }\) has birth time 0;
(b)
\( ^{g}{H}_1(G, 0) \) has a basis of simple undirected circuits; and
(c)
\(\# {{\,\textrm{Dgm}\,}}( ^{g}{\mathcal {H}}_1(G))\) coincides with the circuit rank of the underlying undirected graph.
Proof
The first point follows immediately from Proposition 4.2. To see the final two points, consider the chain complex, \( ^{g}{C}_\bullet (G, 0)\) at the start of the filtration. Since \(G^0\) has no edges, the chain complex is simply
×
Since G is an orientation of \(\mathcal {U}(G)\), the first homology of this chain complex is precisely the real cycle space of \(\mathcal {U}(G)\). Fix an arbitrary spanning forest T of \(\mathcal {U}(G)\), and label the remaining edges \(e_1, \dots , e_k\). Note k is the circuit rank of \(\mathcal {U}(G)\). Let \(p_i\) denote the simple undirected circuit in G which traverses \(e_i\) and then returns to \(\mathop {\textrm{st}}\limits (e_i)\) through the unique path in T. Then \(\{ \mathfrak {R}({p_1}), \dots , \mathfrak {R}({p_k}) \}\) is a basis for \( ^{g}{H}_1(G, 0) \). \(\square \)
4.2 Circuit Lifetimes
While all features in the barcode (and hence all cycles) are born at time \(t=0\), their death times generally differ. We can assign a death-time to any cycle \(v\in ^{g}{Z}_1(G, 0) \), as the first time v becomes null-homologous.
Definition 4.4
Given \(v\in ^{g}{Z}_1(G, 0) \), the death-time of v is
(4.3)
where we let \(\mathop {\mathrm {\mathcal {D}}}\limits (v){:}{=}\infty \) if there is no such t. The lifetime of v is the interval \({\mathop {\mathrm {\mathcal {L}}}\limits (v)} {:}{=}[0, \mathop {\mathrm {\mathcal {D}}}\limits (v))\).
Remark 4.5
Let p be an undirected circuit in G and \(p'\) the same circuit, traversed in the opposite direction so that \(\mathfrak {R}({p'}) = - \mathfrak {R}({p})\). Since is linear, \(\mathop {\mathrm {\mathcal {D}}}\limits (\mathfrak {R}({p}))=\mathop {\mathrm {\mathcal {D}}}\limits (\mathfrak {R}({p'}))\). Also, since \(\mathfrak {R}({p})\) does not depend on the starting vertex of p, neither does \(\mathop {\mathrm {\mathcal {D}}}\limits (\mathfrak {R}({p}))\).
This pipeline gives us a method for associating a lifetime to an undirected circuit in G which is ‘geometric’ in the sense that it does not depend on the starting vertex or direction. The length of this lifetime gives us a ‘size’ to the circuit from the perspective of the filtration. Since we use the shortest-path filtration, we interpret this size as the time it takes for the flow to ‘fill in’ the circuit.
Lemma 4.6
Given \(G=(V, E, w)\in \varvec{\textrm{WDgr}}\) and two directed paths \(p_1, p_2: a \leadsto b\) between distinct vertices \(a, b \in V\), let \(p_c\) denote the undirected circuit which traverses \(p_1\) forwards and then \(p_2\) in reverse. For \(i= 1, 2\), define
Denote \(T{:}{=}\max (h_1, h_2)\). First assume that there are at least 2 edges in each \(p_i\). Then, by the definition of \(h_i\), there exists \(v_i\in V\setminus \left\{ a, b \right\} \) along each \(p_i\) such that \(d(a, v_i) \le T\) and \(d(v_i, b) \le T\). Hence there is a long square \(a v_1 b - a v_2 b \in ^{g}{C}_2(G, T)\). By Lemma 4.1,
(4.5)
Finally, if \(p_1\) contains one edge then \(h_1 = \mathop {\textrm{len}}\limits (p_1)\) so \(d(a, b) \le h_1 \le T\). The definition of \(h_2\) ensures there is \(v_2\in V\) along \(p_2\) such that \(d(a, v_2), d(v_2, b) \le T\). Hence there is a directed triangle \(a v_2 b \in ^{g}{C}_2(G, T)\). By Lemma 4.1,
(4.6)
which concludes the proof. \(\square \)
×
Example 4.7
Note that the bound of Lemma 4.6 is by no means sharp. Consider for example Fig. 5. Let \(p_1\) be the outer red path (a, b, d), \(p_2\) the lower red path (a, c, d) and \(p_c\) the undirected circuit which traverse \(p_1\) forward then \(p_2\) in reverse. Then \(h_1 = h_2 = 10\) but \(\mathop {\mathrm {\mathcal {D}}}\limits (\mathfrak {R}({p_c})) = 2\).
To see this is the correct death time, first note that \( ^{g}{H}_1(G, t) \) can only change at integer values. At \(t=1\), the only edges present in \(G^t\) are the black ones drawn in in Fig. 5. Hence, \(C_2(G, 1)\) is generated by the long square \(ebf - ecf\), whose boundary is not \(\mathfrak {R}({p_c})\).
However, at \(t=2\) the edges (a, b), (b, d), (a, c) and (c, d) also appear in \(G^t\). These generate additional long squares and directed triangles. In particular \(abd - acd \in C_2(G, 2)\) and
(4.7)
4.3 Representatives
After computing persistent homology, it is common to compute homological cycles which represent the intervals in the barcode. In general, these “representatives” are not unique and may be quite complicated. In practice, one can often compute representatives with integer (and even unit) coefficients [27] and these representatives are frequently used for interpreting features (e.g. [2, 34]).
Definition 4.8
A persistence basis for \( ^{g}{\mathcal {H}}_1(G) \) is a choice of initial cycles \(B=\left\{ b_i \in ^{g}{Z}_1(G, 0) \right\} \) such that for each \(t\ge 0\), the set yields a basis for \( ^{g}{H}_1(G, t) \). We call elements of a persistence basis representatives.
Lemma 4.9
Given any \(G\in \varvec{\textrm{WDgr}}\), a persistence basis for \( ^{g}{\mathcal {H}}_1(G) \) always exists.
Proof
This follows from the structure theorem (Theorem 2.39) and Corollary 4.3. \(\square \)
Obtaining a persistence basis \(B\subseteq ^{g}{Z}_1(G, 0) \) is desirable because the constituent cycles represent the features of the barcode, in the following sense. If the barcode is \(\mathcal {B}{ ^{g}{\mathcal {H}}_1(G) } = \left\{ \!\!\left\{ I_1, \dots , I_m \right\} \!\!\right\} \) then there is an ordering on the cycles \(B = \left\{ b_1, \dots , b_m \right\} \) such that \(\mathop {\mathrm {\mathcal {L}}}\limits (b_i) = I_i\) and
Moreover, the isomorphism \(\phi : \oplus _{i=1}^m P(I_i) \rightarrow ^{g}{\mathcal {H}}_1(G) \) is given by mapping
(4.9)
This mapping gives an isomorphism because is always a basis for \( ^{g}{H}_1(G, t) \). In this sense, the representatives in B generate \( ^{g}{\mathcal {H}}_1(G) \).
Representatives live in \( ^{g}{Z}_1(G, 0) \) so they are just \(\mathbb {R}\)-linear combinations of edges in G. However, a priori, the coefficients of these linear combinations may be arbitrarily complicated. The goal of this section is to show that grounded persistent homology always admits a geometrically interpretable persistence basis, in the following sense.
Theorem 4.10
Given \(G\in \varvec{\textrm{WDgr}}\) with circuit rank m there exist undirected circuits \(p_1, \dots , p_m\) in G, such that \(\{\mathfrak {R}({p_1}),\dots , \mathfrak {R}({p_m})\}\) is a persistence basis for \( ^{g}{\mathcal {H}}_1(G) \).
×
Example 4.11
First, we note that it does not suffice to chose any basis of undirected circuits for \( ^{g}{Z}_1(G, 0) \). For example, consider the two weighted digraphs pictured in Fig. 6. In both digraphs, ignoring choice of direction, there are three undirected simple circuits, whose representatives we denote
$$\begin{aligned} \gamma _1^i&{:}{=}ab + bc - ac, \end{aligned}$$
(4.10)
$$\begin{aligned} \gamma _2^i&{:}{=}bc + cd - bd, \end{aligned}$$
(4.11)
$$\begin{aligned} \gamma _3^i&{:}{=}ab + bd - cd - ac. \end{aligned}$$
(4.12)
where \(\gamma _j^i \in ^{g}{Z}_1(G_i, 0) \). Note \(\gamma _3^i = \gamma _1^i - \gamma _2^i\). The GrPPH of these two weighted digraphs is
A persistence basis for \( ^{g}{\mathcal {H}}_1(G_1) \) is \(\{\gamma _1^1, \gamma _2^1\}\) with \(\mathop {\mathrm {\mathcal {L}}}\limits (\gamma _1^1) = [0, 1)\) and \(\mathop {\mathrm {\mathcal {L}}}\limits (\gamma _2^1) = [0, 2)\). Note that \(\{\gamma _2^1, \gamma _3^1\}\) is not a persistence basis for \( ^{g}{\mathcal {H}}_1(G_1) \) because \(\mathop {\mathrm {\mathcal {L}}}\limits (\gamma _3^1) = [0, 2)\).
However \(\mathop {\mathrm {\mathcal {L}}}\limits (\gamma _1^2) = \mathop {\mathrm {\mathcal {L}}}\limits (\gamma _2^2) = [0, 2)\), hence \(\{\gamma _1^2, \gamma _2^2\}\) does not form a persistence basis for \( ^{g}{\mathcal {H}}_1(G_2) \). At \(t=2\), we see and hence . Instead, a persistence basis for \( ^{g}{\mathcal {H}}_1(G_2) \) is \(\{\gamma _2^2, \gamma _3^2\}\).
This illustrates that an arbitrary choice of undirected circuit basis for \( ^{g}{H}_1(G, 0) \) may not yield a persistence basis of \( ^{g}{\mathcal {H}}_1(G) \). Moreover, a correct choice of basis does not depend only on \(\mathcal {U}(G)\); we must incorporate information about how cycles in \( ^{g}{Z}_1(G, 0) \) die, in order to choose a persistence basis.
To begin tackling Theorem 4.10, since G is finite, we note there are finitely many critical values \(t_1=0, \dots , t_m\) where the chain complex \( ^{g}{C}_\bullet (G, t)\) changes. Therefore, it suffices to study the following finite persistent chain complex instead.
×
To the right of the chain complex we show the induced maps on homology . By Lemma 4.2, these maps on homology are always surjective. Our strategy is to find undirected circuit bases for the kernel of each of these maps; Lemma 4.12 achieves this and is the key result. We then collect these elements into a basis for . Together, these elements form representatives for the homology classes with finite lifetime. To obtain representatives for the infinite feature, we extend this to a basis for all of \( ^{g}{H}_1(G, 0) \) and show that we obtain a persistence basis.
Lemma 4.12
For each \(i=2, \dots , m\), there is a basis \(\{b_1, \dots , b_{k_i}\}\) of such that for some undirected circuit \(p_{i, j}\) in G.
Proof
For notational convenience, we define \(r{:}{=}t_{i-1}\) and \(s{:}{=}t_i\). When the filtration increases from \(t=r\) to \(t=s\), some number of edges are added to \(G^t\) which yield new generators for both \(C_1(G \cup G^t)\) and \(C_2(G^t)\). Our approach is to decompose into a sequence of chain maps. In the first, all the new generators of \(C_1(G \cup C^{s})\) are added, along with sufficient new generators in \(C_2(G^{s})\) to make the new edges homologous to a sum of edges already present in \(C_1(G\cup G^{r})\). Therefore, on homology this first map is an isomorphism. We then add the remaining generators of \(C_2(G^{s})\) one at a time in order to find a basis for . Since we are only interested in degree 1 homology, it suffices to restrict our attention to degrees 0, 1 and 2. \(\square \)\(\square \)
Denote the set of new edges \(E_{new} {:}{=}E(G\cup G^{s}) {\setminus } E(G \cup G^{r})\). Given an edge \(e=(a, b)\in E_{new}\), there is some directed path \(p:a \leadsto b\) in G, of length at most s. Moreover this path must have at least one vertex distinct from the endpoints of e, otherwise \(e \in G\). Choose arbitrary such \(v_e \in V(p)\). Then \((a, v_e, b)\) is a directed triangle in \(G^{s}\) so \(w_e {:}{=}av_e b\) is a new generator of \(C_2(G^{s})\). Repeating this for all new edges we obtain a set of generators
(4.14)
which were not present in \(C_2(G^{r})\) and are linearly independent. Define \(W_0 {:}{=}\left\langle U_{new} \right\rangle \) and let \(Q_0\) denote the degree 1 homology of the chain complex
×
which is a subcomplex of \( ^{g}{C}_\bullet (G, s)\).
Claim 4.13
The inclusion chain map
×
induces an isomorphism on degree 1 homology, \(q^0: ^{g}{H}_1(G, r) \rightarrow Q_0\).
Proof of Claim
We define a chain map \({c}_\#\) in the opposite direction and a homotopy \(P:C_1(G\cup G^s) \rightarrow C_2(G^r)\oplus W_0\) such that \(c_1 \circ q^0_1 = \textrm{id}\) while . Hence, on degree 1 homology, \({c}_\#\) induces an inverse to \(q^0\). The chain map in degrees \(k\ne 1\) is given by
where \(w_{(a, b)}\in U_{new}\). A standard check of the two cases verifies that \({c}_\#\) is a chain map and the relations \(c_1 \circ q_1^0 = \textrm{id}\) and hold.
Intuitively, \({c}_\#\) collapses a new edge \(e=(a, b)\in E_{new}\) onto the sum of edges \(av_e + v_eb\) in \(G \cup G^r\). The homotopy P shows the two elements are homologous thanks to the presence of the directed triangle \(w_e\in U_{new}\). \(\square \)
By Proposition 2.33, there exists \(u_1, \dots , u_n \in C_2( G^{s} )\) such that
where each \(u_i\) is amongst the generators identified in Proposition 2.33. Define \(W_i {:}{=}W_0 \oplus \left\langle u_1, \dots , u_i \right\rangle \). This gives a sequence of inclusion chain maps
×
where each rows is a subcomplex of \( ^{g}{C}_\bullet (G, s)\). We denote the degree 1 homology groups by \(Q_i\), with \(Q_n {:}{=} ^{g}{H}_1(G, S) \), and the induced homology maps by \(q^i: Q_{i-1} \rightarrow Q_i\). Together these chain maps decompose and hence the \(q^i\) decompose , as show in the following diagram.
×
Note that \(\dim Q_i\) drops by at most 1 at each step, since \(\dim W_{i+1} = \dim W_i + 1\). Let J denote the subset of indices where the dimension drops, i.e.
(4.19)
Then for each \(i\in J\), the map on homology \(q^i\) has nullity 1 and a basis for \(\ker q^i\) is \(\{[ b_i ]\}\) where .
Note that \([b_i] \in Q_0\) and \((q^{i-1} \circ \dots \circ q^1)[b_i] = [b_i]\) in \(Q_{i-1}\). Since each \([b_i]\) for \(i \in J\) dies in a different \(Q_i\), they must be linearly independent in \(Q_0\). Moreover, the nullity of \(q^n \circ \dots \circ q^1\) is \(\# J\) so gives a basis for \(\ker (q^n \circ \dots \circ q^1)\). Hence forms a basis for . It now remains to prove that each \((q^0)^{-1}[b_i]\) has a undirected circuit representative.
Claim 4.14
For each \(i\in J\), there exists an undirected circuit \(p_i\) in G such that .
Proof of Claim
First we recall that where \(u_i\) is either a double edge, a directed triangle or a long square in \(G^s\). Some of the boundary edges may be edges in \(G^r\) but at least one boundary edge is new in \(G^s\). By the previous claim, a representative for \((q^0)^{-1}[b_i]\) is \(c_1(b_i)\).
We can write \(b_i = \mathfrak {R}({p})\) where p is the undirected circuit which traces the outline of the generator \(u_i\). Now \(c_1\) maps edges in \(G^r\) to themselves and edges not in \(G^r\) to a sum of two edges in \(G^r\) with the same boundary. Therefore, no matter which type of generator \(u_i\) is, we can write \(c_1(b_i) = \mathfrak {R}({\widetilde{p}})\) for some undirected circuit \(\widetilde{p}\) through \(G^r\). Using Lemma 4.1, for each edge \(\tau \in E(\widetilde{p})\) in this circuit there is a directed path \(T_\tau \) through G of length at most r such that \(\tau = \mathfrak {R}({T_\tau }) \pmod { ^{g}{B}_1(G, r) }\). Concatenating the \(T_\tau \) we obtain an undirected circuit \(p_{i}\) through G such that \(\mathfrak {R}({p}) = \mathfrak {R}({p_i})\pmod { ^{g}{B}_1(G, r) }\). \(\square \)
This claim concludes the proof. \(\square \)
Remark 4.15
Note that the undirected circuits \(p_{i, j}\) may not be simple. We conjecture that it should be possible to choose every \(p_{i, j}\) to be simple but do not, as yet, have a proof.
×
Example 4.16
To illustrate how the generators are added in the proof of Lemma 4.12, consider Fig. 7. First note that there is a single feature with representative
$$\begin{aligned} \gamma {:}{=}(av_1 + v_1 v_2 + v_2 v_3 + v_3 b) - (a v_4 + v_4 v_5 + v_5 v_6 + v_6 b) \end{aligned}$$
(4.20)
which dies at \(t=3\). Further note that new edges appear at integer values in the shortest-path filtration and \(\mathcal {B}{ ^{g}{\mathcal {H}}_1(G) } = \left\{ \!\!\left\{ [0, 3) \right\} \!\!\right\} \).
The new edges which appear at \(t=3\) are highlighted in blue. The new generators of \(C_2(G^3)\) are the four blue directed triangles and the central green long square. The blue directed triangles form the elements of \(U_{new}\) and the central long square is the sole remaining generator \(u_1\). So a basis for \(\ker q^1\) is .
To find a basis for we must compute . Firstly . Then the chain map \(c_1\) maps each of the blue edges to a sum of edges in \(G^2\). In fact which is the representative of the sole simple undirected circuit in G.
The following Lemma follows by a standard linear algebra argument, since each is surjective.
Lemma 4.17
Given \(B_i\subseteq ^{g}{H}_1(G, 0) \) such that is a basis for , the union \(\cup _{i=1}^{m-1} B_i\) is a basis for .
Certainly a basis of undirected circuits for \( ^{g}{H}_1(G, 0) \) exists (by Corollary 4.3). Therefore, we can always extend linearly independent undirected circuits to a basis of such circuits for \( ^{g}{H}_1(G, 0) \).
Lemma 4.18
Given undirected circuits \(p_1, \dots , p_k\) such that \([\mathfrak {R}({p_1})], \dots , [\mathfrak {R}({p_k})]\) are linearly independent in \( ^{g}{H}_1(G, 0) \), there exists undirected circuits \(p_{k+1}, \dots , p_N\) such that \(\{[\mathfrak {R}({p_i})]\}_{i=1}^N\) is a basis for \( ^{g}{H}_1(G, 0) \).
We now have all the ingredients we need to prove the main theorem.
Proof of Theorem 4.10
Using Lemmas 4.12 and 4.17, we obtain undirected circuits \(p_1, \dots , p_l\) such that \(\left\{ \mathfrak {R}({p_1}), \dots , \mathfrak {R}({p_l}) \right\} \) forms a basis for . Using Lemma 4.18, we extend this to a basis \(\left\{ \mathfrak {R}({p_1}), \dots , \mathfrak {R}({p_l}), \mathfrak {R}({p_{l+1}}), \dots , \mathfrak {R}({p_N}) \right\} \) for \( ^{g}{H}_1(G, 0) \). This is a persistence basis for \( ^{g}{\mathcal {H}}_1(G) \). \(\square \)
Remark 4.19
While we are guaranteed a basis of undirected circuits, this choice of basis is by no means unique. As a simple example, consider again Example 4.11 and Fig. 6. Two possible persistence bases for \( ^{g}{\mathcal {H}}_1(G_2) \) are \(\left\{ \gamma ^2_1, \gamma ^2_3 \right\} \) and \(\left\{ \gamma ^2_2, \gamma ^2_3 \right\} \). The non-uniqueness of the basis arises in the proof of Lemma 4.12. Namely, there is a choice of \(v_e\) and \(w_e\) for each \(e\in E_{new}\), and a choice of order on the remaining \(u_i\).
×
4.4 Decomposition
In order to more easily compute GrPPH, it is desirable to understand how decompositions of the input weighted digraphs give rise to decompositions of the descriptor. The simplest such decomposition is a disjoint union; as one might expect, the descriptor decomposes as a direct sum (Fig. 8).
Theorem 4.20
Suppose \(G\in \varvec{\textrm{WDgr}}\) decomposes as a disjoint union, \(G=G_1\sqcup G_2\), then
For each degree \(k\ge 0\), if \(H = H_1 \sqcup H_2\) then \(C_k(H) = C_k(H_1)\oplus C_k(H_2)\). Therefore, for each \(k\ge 0\), \( ^{g}{C}_k(G, t)\) splits as direct sum \( ^{g}{C}_k(G_1, t)\oplus ^{g}{C}_k(G_2, t)\). The boundary operator respects this split, mapping \( ^{g}{C}_k(G_i, t) \rightarrow ^{g}{C}_{k-1}(G_i, t)\) and the maps also respect this split, mapping \( ^{g}{C}(G_i, s) \rightarrow ^{g}{C}(G_i, t)\). Taking homology in degree 1 maintains this direct sum decomposition. \(\square \)
Definition 4.21
(a)
Given a weighted digraph \(G=(V, E, w)\), a wedge vertex is a vertex \(\hat{v}\in V\) such that there is a decomposition
$$\begin{aligned} V = V_1 \cup V_2 \end{aligned}$$
(4.23)
with \(V_1 \cap V_2 = \{ \hat{v} \}\) such that \(E \subseteq (V_1 \times V_1) \cup (V_2 \times V_2)\).
(b)
Given a wedge vertex, \(\hat{v}\) as above the corresponding wedge decomposition of G is the pair \((G_1, G_2)\) where \(G_1\) and \(G_2\) are the induced subgraphs on \(V_1\) and \(V_2\) respectively. We write \({G=G_1\vee _{\hat{v}} G_2}\).
(c)
Given a wedge decomposition as above, a pair of vertices \(a, b \in V\) are called separated if they do not lie in a common \(V_i\).
Remark 4.22
Given a wedge decomposition \(G = G_1\vee _{\hat{v}} G_2\) note that \(G = G_1 \cup G_2\).
In the case of a wedge decomposition \(G=G_1 \vee _{\hat{v}} G_2\), since each simple circuit is contained either entirely in \(G_1\) or entirely in \(G_2\), one expects that GrPPH also decomposes. The proof is more complicated because, in general, \(G^t \ne G_1^t \vee _{\hat{v}} G_2^t\), since there may be paths between separated vertices, through \(\hat{v}\). However, using a chain homotopy, we can show that these edges do not affect the homology.
Theorem 4.23
For a weighted digraph \(G=(V, E, w)\) and a wedge decomposition \(G=G_1 \vee _{\hat{v}} G_2\),
There are natural inclusion digraph maps \(j_i: G_i \rightarrow G\) which are also contractions. Less obviously, there are contraction digraph maps \(f_i: G \rightarrow G_i\), where
Since these are all morphisms in \(\varvec{\textrm{Cont}}\varvec{\textrm{WDgr}}\), we obtain induced morphisms and . We combine these morphisms to get two morphisms as follows
(4.26)
(4.27)
Composing with homology in degree 1, denote \({J}_* {:}{=}{\textbf {[}}\varvec{\textrm{R}},\, H_1{\textbf {]}} \circ J\) and \({F}_* {:}{=}{\textbf {[}}\varvec{\textrm{R}},\, H_1{\textbf {]}} \circ F\). In the rest of the proof, we show that \({J}_*\) and \({F}_*\) are mutually inverse.
Claim 4.24
In degree 1, \(F\circ J = \textrm{id}\) is the identity map on \( ^{g}{C}(G_1)\oplus ^{g}{C}(G_2)\).
Proof of Claim
First note that, \(f_i \circ j_i\) is the identity digraph map \(\textrm{id}_i: G_i \rightarrow G_i\). However, \(f_{3-i}\circ j_i\) is the constant digraph map \(c_{i}: G_i \rightarrow G_{3-i}\) which maps all of \(G_i\) to the vertex \(\hat{v}\). Hence, in matrix form, we can write \(F\circ J\) as
(4.28)
Since the constant maps \(c_i\) send all vertices to a single vertex, maps every edge to 0. Hence, in degree 1, is the zero map. Whereas, in degree 1, is the identity map on \( ^{g}{C}_1(G_i, t)\) at each t. Therefore, on degree 1, \(F\circ J\) is the identity map on \( ^{g}{C}(G_1) \oplus ^{g}{C}(G_2)\). \(\square \)
Composing with homology in degree 1, we see \({F}_*\circ {J}_*=\textrm{id}\) is the identity map on \( ^{g}{\mathcal {H}}_1(G_1) \oplus ^{g}{\mathcal {H}}_1(G_2) \).
Claim 4.25
In degree 1, \({J}_* \circ {F}_* = \textrm{id}\) is the identity map on \( ^{g}{\mathcal {H}}_1(G) \).
Proof of Claim
First, we compute \(J\circ F\) in degree 1. Recall that \( ^{g}{C}_1(G, t)\) is freely generated by the edges in \(G \cup G^t\). Given an edge \(e=(a, b)\in E(G \cup G^t)\), if the vertices a, b lie in a common \(V_i\) then \((J \circ F)(e) = e\). However, if a, b are separated then \((J\circ F)(e) = a\hat{v} + \hat{v}b\). So we see, at the level of chains, \(J\circ F\) does not compose to the identity.
If \(e=(a, b)\in E(G\cup G^t)\) but the endpoints are separated then we must have \(e\in E(G^t)\). Hence, there is a path \(p: a\leadsto b\) in G of length at most t. Moreover, this path must traverse the vertex \(\hat{v}\). Hence, p decomposes into two paths \(a\leadsto \hat{v}\) and \(\hat{v} \leadsto b\), each of length at most t. Therefore, the directed triangle \(a\hat{v} b\) is present in \(G^t\) and is a generator of \( ^{g}{C}_2(G, t)\). Note that the boundary of \(a\hat{v}b\) is
(4.29)
This discussion show that we can define a map \(P: ^{g}{C}_1(G, t) \rightarrow ^{g}{C}_2(G, t)\) by
Then, as maps on \( ^{g}{C}_1(G)\). Composing with homology, we see \({J}_*\circ {F}_* = \textrm{id}\) is the identity on \( ^{g}{\mathcal {H}}_1(G) \). \(\square \)
Since \({J}_*\) and \({F}_*\) are mutually inverse, they induce isomorphisms of persistent vector spaces. \(\square \)
×
5 Stability Analysis of GrPPH
It is important that GrPPH is stable with respect to a reasonable noise model. Typically this is shown by proving that \( ^{g}{\mathcal {H}}_1\) is Lipschitz with respect to reasonable metrics on \(\varvec{\textrm{WDgr}}\) and the bottleneck distance on \(\varvec{\textrm{PersVec}}\). A common choice of metric on graphs is the graph edit distance. However, assigning costs to operations such as edge deletion or edge subdivision is somewhat arbitrary.
Therefore, in this section, we consider operations \(\mathbb {O}^{T}_{\theta } :\textrm{Obj}(\varvec{\textrm{WDgr}})\rightarrow \textrm{Obj}(\varvec{\textrm{WDgr}})\) for editing weighted digraphs, where T is the type of operation \(\theta \) is the parameter of the operation. For each type T, we derive bounds of the form
and then say GrPPH is stable with respect to operations of type T.
Often the operations only alter the graph at a subset of vertices or edges. We say that GrPPH is locally stable to the operation if we obtain a bound as in (5.1) and f depends only on the neighbourhood graph around the altered vertices/edges and \(\theta \). If we can show that no such local f exists (for general G and \(\theta \)) then we say GrPPH is locally unstable. If, in general, f depends on all of G then we say GrPPH is non-locally stable to the operation. Occasionally, some operations do not change the descriptor and we can find an isomorphism \( ^{g}{\mathcal {H}}_1(G) \cong ^{g}{\mathcal {H}}_1(\mathbb {O}^{T}_{\theta }{G}) \).
Figure 9 illustrates all of the operations we consider, the precise definitions of which are provided in the relevant subsection. Table 1 summaries our findings.
Table 1
Stability and instability theorems for \( ^{g}{\mathcal {H}}_1\), under various digraph operations. \(\blacklozenge \) Denotes a theorem which only applies to a subset of such operations
In order to prove stability, we will have to build interleaving chain maps. We construct these via maps of the underlying vertex sets.
Definition 5.1
For \(\delta \ge 0\), a \(\delta \)-shifting vertex map, between two weighted digraphs G and H, is a vertex map \(f:V(G) \rightarrow V(H)\) such that f induces digraph maps \(G^t \rightarrow H^{t+\delta }\) and \(G\cup G^t \rightarrow H \cup H^{t+\delta }\) for all \(t\ge 0\).
Remark 5.2
By Lemma 3.15, a 0-shifting vertex map is precisely a contraction digraph map.
Lemma 5.3
Any \(\delta \)-shifting vertex map \(f:V(G) \rightarrow V(H)\) induces a morphism
(5.2)
Given another \(\epsilon \)-shifting vertex map \(g:V(H) \rightarrow V(K)\),
(5.3)
Moreover, if f is 0-shifting then .
Proof
Since f is \(\delta \)-shifting, it induces the necessary digraph maps so that Lemma 3.9 yields a chain map \(J(f, t\le t+\delta ): ^{g}{C}_\bullet (G, t) \rightarrow ^{g}{C}_\bullet (H, t+\delta )\) for any \(t\ge 0\). Given \(s\le t\), the chain map is induced, through Lemma 3.9, by the identity vertex map. Hence, by Eq. (3.5), we see
(5.4)
Therefore we can combine the chain maps \(J(f, t\le t+\delta )\) into the required morphism .
Equation (5.3) follows immediately from Eq. (3.5). Finally, when \(\delta = 0\), we note that in fact \(f\in {\varvec{\textrm{WDgr}}}_{F_d}\) and the construction of is identical to the construction of , as done in Theorem 3.13. \(\square \)
Recall that \( \mathcal {T}( ^{g}{C}(G), \epsilon )\) at each \(t\ge 0\) is the chain map . Shifting this by \(\delta \), \( \mathcal {T}( ^{g}{C}(G), \epsilon ) [ \delta ]\) is given at \(t\ge 0\) by the chain map .
Lemma 5.4
A \(\delta \)-shifting vertex map is a \(\delta '\)-shifting vertex map for any \(\delta ' \ge \delta \) and
(5.5)
Proof
At each \(t\ge 0\), each of the morphisms in Eq. (5.5) are induced by Lemma 3.9. Namely, is given by \(J(f, t\le t+\delta ')\), \( \mathcal {T}( ^{g}{C}(H), \delta ' - \delta ) [ \delta ]\) is given by \(J(\textrm{id}_{V(H)}, t+\delta \le t+\delta ')\) and is given by \(J(f, t\le t+\delta )\). Since \(\textrm{id}_{V(H)} \circ f = f\) as vertex maps and \(t \le t+\delta \le t+\delta '\), Eq. (5.5) follows immediately from Eq. (3.5). \(\square \)
Definition 5.5
Fix weighted digraphs G, H and two vertex maps \(f: V(G) \rightarrow V(H)\) and \(g: V(H) \rightarrow V(G)\).
(a)
Denote
(5.6)
(5.7)
Also, denote the following unweighted digraph \({G_{diff}(g, f)} {:}{=}(V(G), E_{diff}(g, f))\).
(b)
If \(\textrm{id}: G_{diff}(g, f) \rightarrow G^{2\delta }\) and \(g\circ f: G_{diff}(g, f) \rightarrow G^{2\delta }\) are both digraph maps and furthermore are path homotopic relative \(V_{fix}(g, f)\) then we say the ordered pair (g, f) has grounded codistortion \(\le \delta \).
(c)
If f and g are both \(\delta \)-shifting vertex maps and the pairs (g, f) and (f, g) both have grounded codistortion \(\le \delta \) then we say they form a \(\delta \)-grounded interleaving.
Remark 5.6
(a)
Fix \(\delta ' \ge \delta \). Since \(t\mapsto G^t\) is an increasing filtration, if a pair of vertex maps (g, f) has grounded codistortion \(\le \delta \), then they certainly have grounded codistortion \(\le \delta '\).
(b)
If \(g \circ f = \textrm{id}\) as a vertex map then (g, f) has grounded codistortion \(\le 0\).
(c)
A 0-grounded interleaving is precisely a mutually inverse pair of isomorphisms of the underling digraphs which also induce isometries of the shortest-path quasimetrics.
Remark 5.7
For the interested reader, this definition is strongly inspired by the Kalton-Ostrovskii characterisation of network distance, which first appeared for directed networks in [12]. In particular, this characterisation was subsequently used to prove the stability of persistent path homology in [13].
Theorem 5.8
(Main stability theorem) Given two weighted digraphs G, H, if there is a \(\delta \)-grounded interleaving between them then \(d_B( \mathcal {B}{ ^{g}{\mathcal {H}}_1(G) }, \mathcal {B}{ ^{g}{\mathcal {H}}_1(H) } ) \le \delta \).
Proof
Denote the \(\delta \)-grounded interleaving by \(f: V(G) \rightarrow V(H)\) and \(g: V(H) \rightarrow V(G)\). Since the vertex maps are \(\delta \)-shifting they induce chain maps and . In the following we will show they induce a \(\delta \)-interleaving between \( ^{g}{\mathcal {H}}_1(G) \) and \( ^{g}{\mathcal {H}}_1(H) \). The result then follows by the isometry theorem.
Since (g, f) has grounded codistortion \(\le \delta \) there is a homotopy \(F: G_{diff}(g, f) \rightarrow G^{2\delta }\) between \(\textrm{id}\) and \(g \circ f\), relative \(V_{fix}(g, f)\). By Theorem 2.35, this induces a chain homotopy \(L:C_\bullet (G_{diff}(g, f)) \rightarrow C_{\bullet + 1}(G^{2\delta })\) between \({\textrm{id}}_\#\) and \({(g \circ f)}_\#\). Observe that
Hence, from the chain homotopy, we obtain the following diagram where the top row is \( ^{g}{C}_\bullet (G, 0)\) and the bottom row is \( ^{g}{C}_\bullet (G, 2\delta )\). To ease notation, we drop the ring \(R\) and we use blue to denote the domain of the \(L_i\) maps.
×
Now take a degree-1 cycle in the top row, i.e. some chain \(c\in C_1(G)\) such that . Thanks to Eq. (5.8), we can decompose \(c = c_1 + c_2\) where \(c_1 \in R\langle E_{diff}(g, f) \rangle \) and \(c_2 \in R\langle E_{fix}(g, f) \rangle \).
Since \(c_1\) is supported only on edges of \(G_{diff}(g, f)\), the chain homotopy equation for L yields
(5.11)
Since \(c_2\) is supported only on edges in \(E_{fix}(g, f)\), one can easily verify that \({\textrm{id}}_\# (c_2) = {(g\circ f)}_\#(c_2)\). Moreover is supported only on vertices in \(V_{fix}(g, f)\) and the homotopy is relative \(V_{fix}(g, f)\) so Lemma 2.37 ensures that . Piecing these together we obtain
Note that \({(g\circ f)}_\#\) is the degree-1 component of the chain map , at t=0. Similarly \({\textrm{id}}_\#\) is the degree-1 component of the chain map . Therefore on the level of homology, . Moreover, Proposition 4.2 implies that is a surjection for all \(t \ge 0\) from which it follows that for all \(t \ge 0\). The same argument holds with the opposite composition and hence we obtain a \(\delta \)-interleaving as required. \(\square \)
Remark 5.9
Given \(\delta \)-shifting vertex maps f and g, we obtain chain maps and . Rather than showing the pairs have grounded codistortion \(\le \delta \), it is possible to show they form an interleaving on homology by instead considering their action on a basis of undirected circuits for \( ^{g}{Z}_1(G, 0) \). One must show that for any undirected circuit c, there is some \(b\in C_2(G^{2\delta })\) such that
(5.12)
This approach was taken in a previous version of this manuscript, available on the arXiv. Note that if (g, f) has grounded codistortion \(\le \delta \) then one can take \(b = L_1(\mathfrak {R}({c}))\), where \(L_1\) is the degree-1 component of the induced chain homotopy between \(g\circ f\) and \(\textrm{id}\).
5.2 Weight Perturbation
The classical stability theorem of persistent homology (first shown in [14]) is that for two continuous tame function \(f, g: X \rightarrow \mathbb {R}\) of a triangulable topological space X, denoting the persistence barcode of their sub-level set filtration by \(\mathcal {B} _f\) and \(\mathcal {B} _g\) respectively,
In our setting, the closet analogy to changing the function is changing the weighting, as well as the corresponding effect that has on the shortest-path quasimetric. We find that GrPPH is stable to perturbations of the edge weights. Moreover, the stability is local since it depends only of the weights of the perturbed edges.
Definition 5.10
Given a weighted digraph \(G=(V, E, w)\) and a new weight function \(w': E(G) \rightarrow \mathbb {R}_{>0}\), we define \({\mathbb {O}^{p}_{w'}{G}}{:}{=}(V, E, w')\).
Theorem 5.11
Given a weighted digraph \(G=(V, E, w)\in \varvec{\textrm{WDgr}}\) and a new weighting function \(w':E(G) \rightarrow \mathbb {R}_{>0}\), let d and \(d'\) denote the shortest-path quasimetric on G and \(\mathbb {O}^{p}_{w'}{G}\) respectively. Then
For brevity, denote \(G'{:}{=}\mathbb {O}^{p}_{w'}{G}\) and \(\delta {:}{=}\max _{i, j\in V}\left| d(i, j) - d'(i, j)\right| \). First note that for any (i, j) and any path \(p\in \mathcal {P}({i}\rightarrow {j})\)
So the cost of p differs by at most \(W_1\). Minimising over \(\mathcal {P}({i}\rightarrow {j})\), we see \(\left| d(i, j) - d'(i, j)\right| \le W_1\).
Since \(V(G) = V(G')\), there are identity vertex maps \(i_1:V(G) \rightarrow V(G')\) and \(i_2:V(G')\rightarrow V(G)\). Now \(i_1\) defines a digraph map \(G\rightarrow G'\) since \(G=G'\) as digraphs. Moreover, given \((i, j)\in E(G^t)\), then \(d(i, j) \le t\) so \(d'(i, j) \le t + \delta \) and hence \((i, j) \in E({(G')}^{t+\delta })\). This shows \(i_1\) defines a digraph map \(G^t \rightarrow {(G')}^{t+\delta }\) for all \(t \ge 0\). Therefore \(i_1\) (and likewise \(i_2\)) is a \(\delta \)-shifting vertex map. Moreover, since composing \(i_1\) and \(i_2\) in either ordered yields the identity vertex map, these morphisms certainly constitute a \(\delta \)-grounded interleaving. The first inequality then follows by the main stability theorem (Theorem 5.8). \(\square \)
Remark 5.12
Continuing the analogy to the classical stability theorem, note that the sharper bound obtained by Theorem 5.11 is \(|\!|d-d'|\!|_\infty \) while the weaker bound is \(|\!|w-w'|\!|_1\).
5.3 Edge Subdivision
Weighted digraphs arising in applications are subject not only to numerical noise (i.e. weight perturbation) but also structural noise. For the remainder of this section, we investigate the effects of various structural perturbations.
First, we consider edge subdivision, in which one or more parent edge is split into multiple child edges with the weight distributed amongst them. Since we are interpreting edge weights as corresponding to a length, it is natural to require that the sum of the weights of the child edges equals the weight of the parent edge. In order to formalise how the weight of an edge is subdivided amongst its children, we use maps into the standard d-simplex, where d is the number of child edges.
Definition 5.13
Given a weighted digraph \(G=(V, E, w)\), a subdivision S of G is a choice of edges \(F \subseteq E\), along with a map \(S:F\rightarrow \sqcup _{d\in \mathbb {N}} {{\,\mathrm{\textrm{int}}\,}}(\Delta ^d)\) from edges in F to the formal disjoint union of the interiors of each standard d-simplices.
Intuitively, a subdivision gives us a recipe for subdividing the edges of F where \(S(e)_i\) describes the fraction of w(e) which the \(i^{th}\) child edge of e should receive..
Notation 5.14
Given a subdivision \(S:F\rightarrow \sqcup _{d\in \mathbb {N}}{{\,\mathrm{\textrm{int}}\,}}(\Delta ^d)\),
(a)
Let d(e) denote the simplex dimension such that \(S(e) \in {{\,\mathrm{\textrm{int}}\,}}(\Delta ^{d(e)})\). Note S(e) is a \((d(e)+1)\)-tuple whose components we denote \(S(e)=(S(e)_0, \dots S(e)_{d(e)})\).
(b)
Let CS(e) denote the \((d(e)+1)\)-tuple of cumulative sums, i.e. \(CS(e)_i {:}{=}\sum _{j=0}^i S(e)_j\).
Definition 5.15
Given a subdivision \(S:F\rightarrow \sqcup _{d\in \mathbb {N}}{{\,\mathrm{\textrm{int}}\,}}(\Delta ^d)\), define
where \(\tau _{e, i} = (v_{e, i}, v_{e, i+1})\) and we denote \(v_{e, 0} {:}{=}\mathop {\textrm{st}}\limits (e)\) and \(v_{e, d(e)+1} {:}{=}\mathop {\textrm{fn}}\limits (e)\). We then define \({\mathbb {O}^{s}_{S}{G}}{:}{=}(V_S, E_S, w_S)\).
We show that the descriptor is stable to arbitrary subdivisions of arbitrary subsets of edges. Moreover, this stability is local since the bound depends only on the weight of subdivided edges.
Theorem 5.16
Given a weighted digraph \(G=(V, E, w)\in \varvec{\textrm{WDgr}}\) and any subdivision \(S:F\rightarrow \sqcup _{d\in \mathbb {N}}{{\,\mathrm{\textrm{int}}\,}}(\Delta ^d)\),
To begin, denote \(\delta {:}{=}\max _{e \in F} w(G)(e)\). We will construct a \(\delta \)-grounded interleaving and then employ the main stability theorem (Theorem 5.8). First, we setup some notation. Denote \(G_S{:}{=}\mathbb {O}^{s}_{S}{G} = (V_{old} \sqcup V_{new}, E_{old}\sqcup E_{new}, w_S)\). We let d and \(d_S\) denote the shortest-path quasimetric on G and \(G_S\) respectively. Define the following vertex maps
For vertices \(i, j \in V_{old}\), there is a path \(i\leadsto j\) in G of length t if and only if there is one in \(G_S\).
Proof of Claim
This is clear to see, since the weight of an edge is shared amongst its child edges in the subdivision. \(\square \)
Claim 5.18
For any \(t\ge 0\), f defines a digraph map \(G^t \rightarrow G_S^{t+\delta }\) and \(G\cup G^t \rightarrow G_S \cup G_S^{t+\delta }\).
Proof of Claim
Since f is just the inclusion vertex map, Claim 5.17 shows that f defines a digraph map \(G^t \rightarrow G_S^t\) and so certainly \(G^t \rightarrow G_S^{t+\delta }\). For the second map, pick an edge \(e \in E\) and note \(f(e) = e\). If \(e\not \in F\) then it is undivided and \(e \in E(G_S)\). Otherwise \(e\in F\) and \(e\not \in E(G_S)\), however we note \(d_S(\mathop {\textrm{st}}\limits (e), \mathop {\textrm{fn}}\limits (e)) \le w(e) \le \delta \). Therefore, for any \(t\ge 0\), \(e \in G_S^{t+\delta }\) and hence f defines a digraph map \(G\cup G^t \rightarrow G_S \cup G_S^{t+\delta }\). \(\square \)
Claim 5.19
For any \(t\ge 0\), g defines a digraph map \(G_S^t \rightarrow G^{t+\delta }\) and \(G_S \cup G_S^t \rightarrow G \cup G^{t+\delta }\).
Proof of Claim
Given an edge \(\tau = (a, b)\in E(G_S^t)\) there is a path \(p:a\leadsto b\) in \(G_S\) of length at most t. We may assume that \(g(a)\ne g(b)\), else there is nothing to check for this edge. If \(a=v_{e, i}\) is a new vertex from subdividing an edge \(e\in F\) then g(a) is either \(\mathop {\textrm{st}}\limits (e)\) or \(\mathop {\textrm{fn}}\limits (e)\). Either by adding or removing relevant child edges of e to/from the start of p, we obtain a new path \(g(a)\leadsto b\) in \(G_S\). By construction, this will add at most \(w(e)/2 \le \delta /2\) to the length of p. Likewise we can alter the end of p to obtain a path \(g(a) \leadsto g(b)\) in \(G_S\) of length at most \(t+\delta \). By Claim 5.17, we see \(g(\tau )\in E(G^{t+\delta })\).
Finally, given an edge \(\tau \in G_S\) there are two cases. If \(\tau \in E_{old}\) then the edge is preserved under g. Else \(\tau = \tau _{e, i} \in E_{new}\) in which case either \(\tau \) is collapsed to one of the endpoints of e, or it is mapped to e. Hence g is digraph map \(G_S \rightarrow G\) and the final requirement follows. \(\square \)
These claims show that f and g are \(\delta \)-shifting vertex maps. Note that, as vertex maps, \(g\circ f = \textrm{id}_{V(G)}\). Therefore (g, f) certainly has grounded codistortion \(\le \delta \). Composing vertex maps in the opposite order, we do not obtain the identity. Note that \(E_{diff}(f, g) = E_{new}\) and \(V_{diff}(f, g) = V_{new}\).
For each divided edge \(e=(a, b)\in F\), we construct a homotopy \(F_e:\textrm{Ch}(e) \boxdot I \rightarrow {G_S}^{2\delta }\) where \(\textrm{Ch}(e)\) is the induced subgraph of \(G_S\) on the child edges of e, namely \(\{ \tau _{e, 0}, \tau _{e,1}, \dots , \tau _{e, d(e)} \}\). It is a two-step homotopy; in the first step all \(v_{e,i}\) with \(CS(e)_i < 1/2\) are mapped to a and the second step the remaining \(v_{e, i}\) are mapped to b. The homotopy for each edge is shown schematically in Fig. 11.
×
This is indeed a digraph map \(F_e:\textrm{Ch}(e)\boxdot I \rightarrow {G_S^{2 \delta }}\) because the original edge e had weight \(\le \delta \) and all edges point along the path \((a, v_{e,1}, \dots , b)\). Note that the top of the diagram is the map \(f \circ g\) and the bottom is the map \(\textrm{id}\), when restricted to \(\varvec{\textrm{Ch}}(e)\). Moreover, for each edge \(e\in F\), \(F_e\) fixes the vertices \(\mathop {\textrm{st}}\limits (e), \mathop {\textrm{fn}}\limits (e)\). Also note that \(G_{diff}(f, g) = \cup _{e \in F} \textrm{Ch}(e)\). Hence the \(F_e\) can be combined into a homotopy \(F:G_{diff}(f, g)\boxdot I \rightarrow {G_S^{2\delta }}\) by the formula
This is a homotopy between \(f\circ g\) and \(\textrm{id}\) on all of \(G_{diff}(f, g)\) and is relative \(V_{old}=V_{fix}(f, g)\). Hence we see that (f, g) has grounded codistortion \(\le \delta \), concluding the proof. \(\square \)
Remark 5.20
Since subdividing an edge does not effect circuit rank of \(\mathcal {U}(G)\), the number of features does not change upon subdivision (by Corollary 4.3).
Definition 5.21
Fix a weighted digraph \(G=(V, E, w)\in \varvec{\textrm{WDgr}}\).
(a)
The medial subdivision, \({S_{med}(G)}:E(G)\rightarrow \Delta ^2\), is given by \(S(e) = (1/2, 1/2)\) for every \(e \in E(G)\).
(b)
The \(n^{th}\) iterated medial subdivision of G, \(\textrm{IMS}_{n}(G)\), is defined iteratively as follows. Firstly, \(\textrm{IMS}_{0}(G){:}{=}G\) then for each n, we define \(\textrm{IMS}_{n}(G) {:}{=}\mathbb {O}^{s}_{S}{\textrm{IMS}_{n-1}(G)}\) where \(S=S_{med}(\textrm{IMS}_{n-1}(G))\).
Corollary 5.22
Given a weighted digraph \(G\in \varvec{\textrm{WDgr}}\), the sequence of barcodes \((\mathcal {B}{ ^{g}{\mathcal {H}}_1(\textrm{IMS}_{n}(G)) })_{n\in \mathbb {N}}\) converges under the bottleneck distance.
Hence, Theorem 5.16 implies that the sequence of barcodes is Cauchy. The space of persistence diagrams with the bottleneck distance is complete [11] and hence the sequence of barcodes converges. \(\square \)
For an example, we refer the reader forward to Proposition 6.1. While we have bottleneck stability, we do not have p-Wasserstein stability for any \(p\in [1, \infty )\).
Proposition 5.23
Fix \(1 \le p < \infty \). There exists no function \(f: \varvec{\textrm{WDgr}}\rightarrow \mathbb {R}\) such that for any weighted digraph \(G=(V, E, w)\in \varvec{\textrm{WDgr}}\) and any subdivision \(S: F \rightarrow {{\,\mathrm{\textrm{int}}\,}}(\Delta ^d)\) we have (Fig. 12).
Suppose such f exists and consider the following sequence of digraphs in which each edge has unit weight.
Intuitively, \(G_n\) is constructed by gluing n disjoint copies of \(G_1\) along the path \((v_0, v_1, v_2, v_3)\). Note that each copy of \(G_1\) introduces a feature which dies at \(t=3\) so
$$\begin{aligned} \mathcal {B}{ ^{g}{\mathcal {H}}_1(G_n) } = \left\{ \!\!\left\{ [0, 3)\text { with multiplicity } n \right\} \!\!\right\} . \end{aligned}$$
(5.22)
Upon subdividing the edge \(e=(v_1, v_2)\) via \(S(e)=(1/2, 1/2)\), each feature changes to [0, 2.5). Hence
which eventually exceeds the constant \(f(\mathop {\mathrm {\mathcal {N\!G}}}\limits (e; G))\). \(\square \)
5.4 Edge Collapse
Another potential structural perturbation is that of edge collapses, in which the two end points of an edge are identified and the edge deleted. In applications, this may happen particularly to low-weight edges, which cannot be discerned by the imaging method and hence collapsed to a vertex instead. Since we interpret edge-weights as corresponding to distance, we add half the weight of the collapsed edge to each of its neighbours so that the length of paths through the collapsed edge are not changed.
Definition 5.24
Given a weighted digraph \(G=(V, E, w)\) we say a subset of edges \(F \subseteq E\) is collapsible if the edge sets of the neighbourhoods \(\mathop {\mathrm {\mathcal {N\!G}}}\limits (e; G)\) for each \(e\in F\) are pairwise disjoint.
Definition 5.25
Given a weighted digraph \(G=(V, E, w)\) and \(F\subseteq E\) collapsible we define the edge collapse \({\mathbb {O}^{c}_{F}{G}}{:}{=}(V_F, E_F, w_F)\) as follows.
(a)
We define an equivalence relation \(\sim \) on V such that \(i \sim j \iff \)\(i=j\) or \((i, j)\in F\) or \((j, i)\in F\). The vertex set is the set of equivalence classes of this relation .
(b)
Given two distinct vertices \(I, J\in V_F\), we include an edge \((I, J)\in E_F\) if and only if there is some \(i \in I\) and \(j\in J\) such that there is an edge \(i\rightarrow j\) in G.
(c)
Finally, given an edge \((I, J) \in E_F\), we define the weight by
The minimum is required in Eq. (5.24) because there may be an edge \(i\rightarrow a\) and an edge \(i\rightarrow b\) with different weights.
Notation 5.27
Thanks to the collapsible condition on F, all equivalence classes in \(V_F\) are singletons except for those of the form \(\{ a, b\}\) for \(e=(a, b)\in F\). Therefore, we denote any singleton class \(\{i\}\) by its sole representative i and the class \(\{a, b\}\) by the symbol \(v_e\).
Arbitrary edge collapses can drastically change the connectivity of the digraph and in turn alter the shortest-path quasimetric. However, given some control on the local neighbourhood of collapsed edges, it is possible to bound these effects and hence get local stability.
Theorem 5.28
Given a weighted digraph, \(G=(V, E, w)\in \varvec{\textrm{WDgr}}\) and \(F\subseteq E\) collapsible, suppose that for each \(e\in F\), \(\mathcal {N}_{out}(\mathop {\textrm{st}}\limits (e)) = \{\mathop {\textrm{fn}}\limits (e)\}\) and \(\mathcal {N}_{in}(\mathop {\textrm{fn}}\limits (e)) = \{\mathop {\textrm{st}}\limits (e)\}\) (as in Fig. 13). Then, for each \(e\in F\), define
Denote \(G_F {:}{=}\mathbb {O}^{c}_{F}{G} = (V_F, E_F, w_F)\) and \(\delta {:}{=}\max _{e\in F}\delta _e\). Note that the condition on each \(e \in F\) ensures given any edge \((I, J)\in E_F\), there is exactly one \(i\in I\) and \(j\in J\) such that \(i \rightarrow j\).
We define two vertex maps. Firstly \(f: V \rightarrow V_F\) is given by \(a,b\mapsto v_e\) for each \(e=(a, b)\in F\) and \(v\mapsto v\) otherwise. Secondly, we define \(g:V_F \rightarrow V\) as follows. For most elements of \(V_e\) we choose the only representative of the equivalence class \(v\mapsto v\). The only classes containing more than one element are of the form \(v_e=\{\mathop {\textrm{st}}\limits (e), \mathop {\textrm{fn}}\limits (e) \}\) for some \(e \in F\). For this class, we choose \(g(v_e)=\mathop {\textrm{st}}\limits (e)\) if
else we choose \(g(v_e) = \mathop {\textrm{fn}}\limits (e)\). We show that f and g define a \(\delta \)-grounded interleaving.
Claim 5.29
f defines a digraph map \(G \rightarrow G_e\).
Proof of Claim
All edges of G are mapped to edges of \(G_e\), with the exception of \(e=(a, b)\). The two endpoints of e are mapped to the same point, \(v_e\). Therefore, f define a digraph map as required. \(\square \)
Given a path \(p:i \leadsto j\) in G, f(p) is a path \(f(i)\leadsto f(j)\) in \(G_e\). Suppose the edges of p are all contained in \(\mathop {\mathrm {\mathcal {N\!G}}}\limits (e; G)\) for some \(e=(a, b) \in F\). If \(i \in \mathcal {N}_{in}(a)\) and \(j\in \mathcal {N}_{out}(b)\) then length of f(p) equals that of p, thanks to the weight distribution. In any other case f(p) can increase in length by at most w(e)/2.
Claim 5.30
f defines a digraph map \(G^t\rightarrow G_F^{t+\delta }\) for all \(t\ge 0\).
Proof of Claim
Given a path \(p:i \leadsto j\) in G, f(p) is a path \(f(i)\leadsto f(j)\) in \(G_F\). Suppose the edges of p are all contained in \(\mathop {\mathrm {\mathcal {N\!G}}}\limits (e; G)\) for some \(e=(a, b) \in F\). If \(i \in \mathcal {N}_{in}(a)\) and \(j\in \mathcal {N}_{out}(b)\) then length of f(p) equals that of p, thanks to the weight distribution. Otherwise one of i or j must be a or b and in these cases f(p) can increase in length by at most \(w(e)/2\le \delta _e/2 \le \delta /2\).
If \((i, j)\in E(G^t)\), then there is a path p joining \(i\leadsto j\) of length at most t in G. We can decompose p into a sequence of maximal sub-paths \(p_1, \dots , p_m\) such that each \(p_k\) is either entirely contained in \(\mathop {\mathrm {\mathcal {N\!G}}}\limits (e; G)\) for some \(e\in F\) or contains no edge in any such neighbourhood. Sub-paths not contained in any \(\mathop {\mathrm {\mathcal {N\!G}}}\limits (e; G)\) for \(e \in F\) have the same length after mapping through f. Sub-paths \(p_i\) contained in some \(\mathop {\mathrm {\mathcal {N\!G}}}\limits (e; G)\) for \(e\in F\) where \(1< i < m\) must fully traverse from some in-neighbour of \(\mathop {\textrm{st}}\limits (e)\) to an out-neighbour of \(\mathop {\textrm{fn}}\limits (e)\) and thus \(f(p_i)\) has the same length as \(p_i\). The remaining \(p_1, p_m\) can increase in length by at most \(\delta / 2\) each and hence f(p) is a path \(f(i)\leadsto f(j)\) of length at most \(t+ \delta \). Therefore \((i, j) \in E(G_F^{t+\delta })\). \(\square \)
Claim 5.31
g defines a digraph map \(G_F^t \rightarrow G^{t+\delta }\).
Proof of Claim
Suppose \((i, j) \in E(G_F^t)\), then there is a path \(p:i\leadsto j\) in \(G_F\) of length at most t. We construct a path \(p'\) in G as follows: replace any singleton class \(i\in V_e\) with its single representative \(i\in V\) and replace any other class \(v_e\) for \(e=(a, b)\in F\) with the vertices a, b. We claim that \(p'\) is a path of length at most \(t+\delta \) that contains a sub-path \(g(i)\leadsto g(j)\).
As in the previous claim, we can decompose p into a sequence of maximal sub-paths \(p_1, \dots , p_m\) such that each \(p_k\) is either entirely contained in \(\mathop {\mathrm {\mathcal {N\!G}}}\limits (v_e; G_F)\) for some \(e\in F\) or contains no edge in any such neighbourhood. We let \(p_i'\) denote the corresponding sub-paths of \(p'\).
If \(p_i\) is not contained in any \(\mathop {\mathrm {\mathcal {N\!G}}}\limits (v_e; G_F)\) for \(e \in F\) then \(p_i'\) has the same length as \(p_i\). If \(p_i\) is contained in some \(\mathop {\mathrm {\mathcal {N\!G}}}\limits (v_e; G_F)\) for \(e\in F\) where \(1< i < m\), then \(p_i\) must fully traverse from some in-neighbour of \(v_e\) to an out-neighbour of \(v_e\). Then \(p_i'\) must also traverse from the same in-neighbour of \(\mathop {\textrm{st}}\limits (v_e)\) to the same out-neighbour of \(\mathop {\textrm{fn}}\limits (v_e)\). Since the weight of e is distributed evenly to the neighbouring edges, \(p_i'\) has the same length as \(p_i\). The remaining sub-paths \(p_1, p_m\) can increase in length by at most \(\delta / 2\) each and hence \(p'\) is a path of length at most \(t+ \delta \).
Finally, note that if p traverses \(v_e\) for some \(e \in F\) then both vertices \(\mathop {\textrm{st}}\limits (e), \mathop {\textrm{fn}}\limits (e)\) appear in the path \(p'\). Hence, no matter which vertex of e is chosen for \(g(v_e)\), the path \(p'\) will traverse it. Therefore, \(p'\) contains a sub-path \(g(i) \leadsto g(j)\). \(\square \)
Claim 5.32
g defines a digraph map \(G_F \cup G_F^t \rightarrow G \cup G^{t+\delta }\).
Proof of Claim
Thanks to the previous claim, we only need to check edges \((i, j)\in E(G_F)\). Any edge \((i, j)\in E(G_F)\) which is not incident to some \(v_e\) is mapped by g to itself \((i, j)\in E(G)\). For the remaining cases, thanks to the collapsibility condition on F, we can assume that exactly one of i, j is a vertex of the form \(v_e\in V_F\) for some \(e \in F\). Further assume that \(g(v_e)=\mathop {\textrm{st}}\limits (e)\); the case where \(g(v_e)=\mathop {\textrm{fn}}\limits (e)\) admits a similar proof.
If \(j=v_e\) then \(i\in V_e\) is a singleton and \(i \in \mathcal {N}_{in}(\mathop {\textrm{st}}\limits (e))\) and hence \((g(i), g(v_e)) = (i, \mathop {\textrm{st}}\limits (e))\in E(G)\). Else, if \(i = v_e\) then j is a singleton and \(j \in \mathcal {N}_{out}(\mathop {\textrm{fn}}\limits (e))\). Note that the path \((\mathop {\textrm{st}}\limits (e), \mathop {\textrm{fn}}\limits (e), j)\) in G is of length \(w(e)+w(b, j) \le \delta _e \le \delta \). Therefore \((g(v_e), g(j))=(\mathop {\textrm{st}}\limits (e), j)\in E(G^{t+\delta })\) for all \(t\ge 0\). \(\square \)
×
As vertex maps we note that \(f\circ g = \textrm{id}_{V_e}\) and hence it only remains to show that (g, f) has grounded codistortion \(\le \delta \). First, we note that the vertices \(v \in V_{diff}(g, f)\) are precisely the endpoints of each \(e \in F\) such that \(g(v_e)\ne v\). Hence, \(G_{diff}(g, f)\) is just the union of the closed stars of such v and moreover the edge sets of these neighbourhoods are disjoint thanks to the collapsibility condition. For a fixed line digraph I and each \(v\in V_{diff}(g, f)\) we will define a path homotopy \(F_v: \mathop {\mathrm {\textrm{Star}}}\limits (v; G)\boxdot I \rightarrow G^{2\delta }\) that is relative all the neighbours of v. We can then combine these homotopies according to the formula
to obtain the desired homotopy \(F: G_{diff}(g, f)\rightarrow G^{2\delta }\).
The homotopies \(F_v\) and line digraph I are shown schematically in Fig. 14 for a vertex v which is the endpoint of some collapsed edge \(e=(a, b)\in F\). All vertices are fixed through the homotopy except for v, whose image varies with \(j\in I\) as shown in the schematic. There are two cases, depending on whether \(g(v_e)=a\) or \(g(v_e)=b\). Focusing on the first case, the vertex \(v=b\) has a single in-neighbour a and a finite number of out-neighbours \(b_1, b_2, \dots , b_k\). The fact that \(g(v_e) = a\) implies that
and hence \((a, b, b_i)\) is always a path of length at most \(\delta \). Therefore, all of the edges (a, b), \((a, b_i)\) and \((b, b_i)\) are contained within \(G^{2\delta }\). A similar proof works in the case \(g(v_e)=b\). \(\square \)
Remark 5.33
Suppose G is a DAG and contains an edge \(e=(a,b)\) that satisfies the condition of Theorem 5.28. Then one can number the vertices of G as \(v_1, \dots , v_n\) such that \(v_i \rightarrow v_j \implies i < j\) and moreover a and b are adjacent. However, note that this is not a sufficient condition for local stability to edge collapse. For an example, we refer the reader to Fig. 21 and the subsequent discussion in Sect. 6.2. In the digraph \(G_2\), note that a, d, c, b is a valid ordering of the vertices but \( ^{g}{\mathcal {H}}_1(G_2) = \left\{ \!\!\left\{ [0, \infty ) \right\} \!\!\right\} \) whilst \( ^{g}{\mathcal {H}}_1(\mathbb {O}^{c}_{(d, c)}{G_2}) = \left\{ \!\!\left\{ [0, 1.5) \right\} \!\!\right\} \).
Remark 5.34
Collapses of the sort described in Theorem 5.28 remove exactly \(\# F\) vertices and \(\#F\) edges, and do not change the number of weakly connected components. Therefore, the circuit rank of \(\mathcal {U}(G)\) does not change and hence, by Corollary 4.3 we have \(\# \mathcal {B}{ ^{g}{\mathcal {H}}_1(G) }= \# \mathcal {B}{ ^{g}{\mathcal {H}}_1(\mathbb {O}^{c}_{e}{G}) }\).
Given a collapse of the sort required by Theorem 5.28 and two vertices v, w that are not adjacent to a collapsed edge, there is a path \(v\leadsto w\) in G if and only if there is such a path in \(\mathbb {O}^{c}_{F}{G}\). In general this is not the case (see Fig. 15); indeed this leads to local instabilities.
Theorem 5.35
There exists no function \(f: \varvec{\textrm{WDgr}}\rightarrow \mathbb {R}\) such that for any weighted digraph \(G=(V, E, w)\in \varvec{\textrm{WDgr}}\) and any edge \(e\in E\) therein we have
Suppose such f exists then consider the weighted digraphs illustrated in Fig. 15. First, denote the edges \(e_1 {:}{=}(v_0, v_5)\) and \(e_2 {:}{=}(v_2, v_3)\) in all three graphs (wherein those edges exist). Note that \(G'\) is independent of the weight W; hence we can safely define \(W {:}{=}2 f ( \mathop {\mathrm {\mathcal {N\!G}}}\limits (e_1; G' ) ) + 4\) to be the weight of \(e_2\) in G. Observe that \(\mathop {\mathrm {\mathcal {N\!G}}}\limits (e_1; G) = \mathop {\mathrm {\mathcal {N\!G}}}\limits (e_1; G')\) and hence
$$\begin{aligned} W = 2 f ( \mathop {\mathrm {\mathcal {N\!G}}}\limits (e_1; G') ) +4 = 2 f ( \mathop {\mathrm {\mathcal {N\!G}}}\limits (e_1; G) ) + 4. \end{aligned}$$
(5.30)
×
Initially, \(\mathcal {B}{ ^{g}{\mathcal {H}}_1(G) }\) has 3 features, which die at 2, 2 and W. After collapsing \(e_1\), \(\mathcal {B}{ ^{g}{\mathcal {H}}_1(\mathbb {O}^{c}_{e_1}{G}) }\) has 3 features, which die at 2.5, 2.5 and 3. The longer feature, supported on the red edge \((v_2, v_3)\), has a reduced death-time in \(\mathbb {O}^{c}_{e_1}{G}\) because there is a shortcut \((v_2, v_{e_1}, v_3)\), of length 3. Any bijection between these features (and the diagonals) must have bottleneck cost at least \(\min (W/2, W-3) > f(\mathop {\mathrm {\mathcal {N\!G}}}\limits ({e_1}; G))\). \(\square \)
While this seems like a serious problem for our descriptor, note that the collapse in Fig. 15 makes significant changes to the topology of the underlying digraph. Originally, G was a DAG with source \(v_0\) and sink \(v_5\); the edge collapse identified these two nodes and introduced directed cycles. Moreover, in G the only path \(v_2\leadsto v_3\) was via the costly red edge but in \(\mathbb {O}^{d}_{e}{G}\) there is a shortcut via \(v_e\). Therefore, since the profile of paths has changed drastically, it is arguably desirable that our descriptor changes too.
5.5 Edge Deletion
5.5.1 General Case
Another important class of structural perturbations is edge deletion. Intuitively, as with edge collapse, some edge deletions can have drastic impact on the descriptor whereas some deletions are minor events.
Definition 5.36
Given a weighted digraph \(G=(V, E, w)\) and a subset of edges \(F \subseteq E\), we define \({\mathbb {O}^{d}_{F}{G}}{:}{=}(V, E{\setminus } F, w_F)\) where \(w_F\) is obtained by restricting w to \(E{\setminus } F\). If \(F = \{ e \}\) is a single edge, we denote this \(\mathbb {O}^{d}_{e}{G}\).
Remark 5.37
We do not distribute the weight of e to neighbouring edges. In applications, we anticipate these errors may appear due to, for example, imaging errors, in which case the neighbouring edges would not change weight.
We find that our descriptor is stable to deletions but the bound depends on the minimum length of a possible diversion. In general, this diversion cost may be infinite.
Theorem 5.38
Given a weighted digraph \(G=(V, E, w)\in \varvec{\textrm{WDgr}}\) and a subset of edges \(F \subseteq E\), let d and \(d_F\) denote the shortest-path quasimetric for G and \(\mathbb {O}^{d}_{F}{G}\) respectively. Then
and \(\delta {:}{=}\max (\delta _1, \delta _2)\). We claim that \(\textrm{id}_{V}\) constitutes a \(\delta \)-shifting vertex map \(G\rightarrow G_F\) and \(G_F \rightarrow G\). Then, since the vertex maps are both the identity, they constitute a \(\delta \)-grounded interleaving and thus we obtain the result via the main stability theorem.
The shortest-path distance between any two nodes in G increases by at most \(\delta _1\) upon deleting the edges of F. Hence, since \(\delta \ge \delta _1\), \(\textrm{id}_V\) defines a digraph map \(G^t \rightarrow G_F^{t+\delta }\) and \(G_F^t \rightarrow G^{t+\delta }\) for all \(t\ge 0\). Then \(E\setminus F \subseteq E\), so \(\textrm{id}_V\) certainly defines a digraph map \(G_F \rightarrow G\) and thus \(G_F \cup G_F^t \rightarrow G \cup G^{t+\delta }\) for all \(t\ge 0\).
In the other direction, \(\textrm{id}_V\) does not define a digraph map \(G \rightarrow G_F\). However, given any deleted edge \(e \in F\), \(e\in E(G_F^{t})\) for all \(t\ge \delta _2\). Hence \(\textrm{id}_V\)does define a digraph map \(G\cup G^t \rightarrow G_F \cup G_F^{t+\delta }\) for all \(t\ge 0\). \(\square \)
Remark 5.39
The above bound is infinite if and only if there is some deleted edge \(e=(i, j) \in F\) such that the only path \(i\leadsto j\) in G is through e.
Corollary 5.40
Given a weighted digraph \(G=(V, E, w)\in \varvec{\textrm{WDgr}}\) and an edge \(e=(a, b) \in E\) let d and \(d_e\) denote the shortest-path quasimetric for G and \(\mathbb {O}^{d}_{e}{G}\) respectively. Then
The result then follows by Theorem 5.38. To see this inequality, note that for arbitrary \(i, j \in V\) we have \(d_e(i, j) \ge d(i, j)\) since any path \(i\leadsto j\) in \(G_e\) is also a path \(i\leadsto j\) in G of the same length. Moreover, there is path \(p_e:a\leadsto b\) in \(G_e\) of length at most \(\delta \). Then, given a path \(p:i\leadsto j\) in G of length t, the path contains e at most once. We can replace e with \(p_e\) to obtain a new path \(i\leadsto j\) in \(G_e\) of length at most \(t+(\delta - w(e))\le t+\delta \). Therefore, \(d_e(i, j) \le d(i, j) + \delta \). \(\square \)
In general, Theorem 5.38 is a non-local bound, but if an edge has an alternative route in its local neighbourhood then the bound becomes local.
Corollary 5.41
Given a weighted digraph \(G=(V, E, w)\) and an edge \(e=(a, b)\in E\) such that there exists a vertex \(v\in V\) such that \(a \rightarrow v \rightarrow b\) then
In general, the shortest-path distance between the endpoints of an edge, upon its deletion, can depend on all remaining edges in the graph. Therefore, the bound of Theorem 5.38 is non-local and indeed no generic, local stability theorem is possible.
Theorem 5.42
There exists no function \(f:\varvec{\textrm{WDgr}}\rightarrow \mathbb {R}\) such that for any digraph \(G=(V, E, w)\in \varvec{\textrm{WDgr}}\) and any edge \(e\in E\) therein we have
Suppose such f exists then consider the following weighted digraph G, where \(e{:}{=}(v_1, v_3)\) and \(W{:}{=}2f(\mathop {\mathrm {\mathcal {N\!G}}}\limits (e; G))+2\) (Fig. 16).
×
Note, G has a single feature which dies at time W, whereas \(\mathbb {O}^{d}_{e}{G}\) has no features. Therefore, the bottleneck distance is \(W/2 > f(\mathop {\mathrm {\mathcal {N\!G}}}\limits (e; G))\). \(\square \)
5.5.2 Separating Edges
Since all features are born at \(t=0\) and \( ^{g}{Z}_1(G, 0) \) has a basis of simple undirected circuits, one might expect that edges never involved in such circuits can be safely deleted without changing the descriptor. Indeed, this is the case and is a direct consequence of the wedge decomposition theorem.
Definition 5.43
In a weighted digraph \(G=(V, E, w)\), an edge \(e=(a, b)\in E\) is called a separating edge if a and b are not weakly connected in \(\mathbb {O}^{d}_{e}{G}\).
Remark 5.44
An edge is separating if and only if there are no simple undirected circuits containing it.
Corollary 5.45
Given a weighted digraph \(G=(V, E, w)\) and a separating edge \(e=(a, b)\in E\),
First note that a and b are both wedge vertices. Let \(V_1\) denote the vertices in the weak connected component of a in \(\mathbb {O}^{d}_{e}{G}\). Define \(V_2 {:}{=}\{ a, b \}\). Finally, define \(V_3 {:}{=}(V(G)\setminus V_1)\). Let \(G_1\), \(G_2\) and \(G_3\) denote the induced subgraph of G on \(V_1\), \(V_2\) and \(V_3\) respectively.
Then a wedge decomposition of G is \(G = (G_1 \vee _a G_2) \vee _b G_3\) and a disjoint union decomposition of \(\mathbb {O}^{d}_{e}{G}\) is \(\mathbb {O}^{d}_{e}{G} = G_1 \sqcup G_3\). Note that \(G_2\) is just a single edge connecting two vertices so \( ^{g}{\mathcal {H}}_1(G_2) \) is the trivial persistent vector space. Using Theorems 4.20 and 4.23, we see
Theorem 5.38 tells us that we are stable to deleting edges which have fast diversions. That is, if there is a path \(p:i\leadsto j\), not involving the edge \(e{:}{=}(i, j)\), of length \(\delta \), then removing e changes the barcode by at most \(\delta \) in bottleneck distance. Note, this bound is independent of the weight of the deleted edge w(e).
To illustrate this point, consider \(G_1\) in Fig. 17. Removing \((v_2, v_3)\) incurs a bottleneck cost of at most 2, since there is a diversion of length 2. Likewise, despite being a highly-weighted edge, we can also remove \((v_0, v_1)\) for a bottleneck cost of at most 2.
On the other hand, consider now \(G_2\) in Fig. 17. The barcode contains a single feature, \(\mathcal {B}{ ^{g}{\mathcal {H}}_1(G_2) } = \left\{ \!\!\left\{ [0, 10) \right\} \!\!\right\} \). The edge \((w_0, w_1)\) has a high weight and the only diversion is via the black edges, of length 10. Deleting the edge \((w_0, w_1)\) removes the single feature, which can be matched with [5, 5) on the diagonal and thus incurs a bottleneck cost of 5. Moreover, deleting one of the smaller edges (for example \((w_2, w_3)\)) also incurs a bottleneck cost of 5 since it deletes the same feature.
5.6 Vertex Deletion
Definition 5.46
Given a weighted digraph \(G=(V, E, w)\) and a vertex \(v\in V\), we define \({\mathbb {O}^{d}_{v}{G}}{:}{=}(V\setminus \{v\}, E_v, w_v)\) where \(E_v {:}{=}E \cap (V{\setminus } \{v\})\times (V{\setminus } \{v\})\) and \(w_v\) is obtained by restricting w to \(E_v\). Given a subset of vertices \(W \subseteq V\), we define \({\mathbb {O}^{d}_{W}{G}}\) iteratively by choosing an arbitrary \(w\in W\) then setting \(\mathbb {O}^{d}_{W}{G} {:}{=}\mathbb {O}^{d}_{W\setminus \{w\}}{\mathbb {O}^{d}_{w}{G}}\).
Remark 5.47
As with edge deletion, we do not distribute the weight of deleted edges because we anticipate these errors occurring due to imaging, in which case the neighbouring edges would not change weight.
Since a single vertex graph has trivial GrPPH the disjoint union decomposition theorem (Theorem 4.20) allows us to delete isolated vertices.
Corollary 5.48
Given a weighted digraph \(G=(V, E, w)\) and an isolated vertex \(v_i\in V\) (i.e. \(\mathcal {N}(v)=\emptyset \)), then
However, in general, deleting a vertex from a digraph can drastically change its topology. This follows immediately from Theorems 5.16 and 5.42 since a local vertex deletion stability theorem would imply a local edge deletion stability theorem.
Corollary 5.49
There exists no function \(f:\varvec{\textrm{WDgr}}\rightarrow \mathbb {R}\) such that for any digraph \(G=(V, E, w)\) and any vertex \(v\in V\) therein we have
As with edge deletion, it is possible to bound the effect of a deleting a vertex \(v_0\), but the bound depends on the choice of an alternative vertex \(v_1\) and the length of a number of diversions. In general, this is a non-local bound.
Theorem 5.50
Given a weighted digraph \(G=(V, E, w)\) and a vertex \(v_0\in V\), let d denote the shortest-path quasimetric. Suppose that for some \(\delta > 0\) there is some vertex \(v_1\in V \setminus \{v_0\}\) such that, for all \(a\in \mathcal {N}_{in}(v_0)\) and \(b\in \mathcal {N}_{out}(v_0)\)
(a)
\(d(a, v_0)\le 2\delta \) and \(d(v_0, b)\le 2\delta \);
(b)
there are paths \(p_a:a\leadsto v_1\) and \(q_b:v_1\leadsto b\) each of length at most \(\delta \) and not traversing \(v_0\);
(c)
there is a path \(p_{a, b}:a\leadsto b\) of length at most \(w(a, v_0)+w(v_0, b)+\delta \) and not traversing \(v_0\);
(d)
there is a path q connecting \(v_0 \leadsto v_1\) or \(v_1\leadsto v_0\) of length at most \(2\delta \).
Note there is an inclusion map \(g: V(\mathbb {O}^{d}_{v_0}{G}) \rightarrow V(G)\) and define a map in the opposite direction \(f:V(G) \rightarrow V(\mathbb {O}^{d}_{v_0}{G})\) by
Denote the altered digraph by \(G_d{:}{=}\mathbb {O}^{d}_{v_0}{G}\). We claim that, under the conditions of the theorem, f and g constitute a \(\delta \)-grounded interleaving.
First note that g is a digraph map \(G_d \rightarrow G\) because \(G_d\) is a subgraph. Moreover, g induces a contraction on the shortest-path quasimetric and hence g is certainly a \(\delta \)-shifting vertex map.
Claim 5.51
f induces digraph maps \(G^t \rightarrow G_d^{t+\delta }\).
Proof of Claim
Suppose \((i, j) \in E(G^t)\); hence there is a path \(p: i \leadsto j\) of length at most t in G. We require a path \(f(i)\leadsto f(j)\) of length at most \(t+\delta \) in \(G_d\).
Assume that \(i, j \ne v_0\) so that \(f(i)=i\) and \(f(j)=j\). If p does not traverse \(v_0\) then p is a path in \(G_d\) and we are done. Else p contains a sequence of vertices \((a, v_0, b)\) for some \(a\in \mathcal {N}_{in}(v_0)\) and \(b\in \mathcal {N}_{out}(v_0)\). Thanks to condition (3) of the theorem, we can replace this sequence with the path \(p_{a, b}\) to obtain a new path \(i\leadsto j\) in \(G_d\), of length at most \(t+\delta \).
Next, suppose \(j=v_0\) but \(i\ne v_0\). Then p must finish with the sequence \((a, v_0)\) for some \(a\in \mathcal {N}_{in}(v_0)\). We can replace this sequence with \(p_a\) to obtain a new path \(i\leadsto v_1=f(v_0)\) in \(G_d\) of length at most \(\delta \). A similar proof works for the final case \(i=v_0\), \(j\ne v_0\). \(\square \)
Thanks to the previous claim we only need to consider edges in E(G). Given an edge \((i, j)\in E(G)\) suppose that \(i, j\ne v_0\). Then \(f(i) = i\) and \(f(j)=j\) and (i, j) is still an edge in \(G_d\). Suppose \((a, v_0)\in E(G)\), then \(f(a) = a\) and \(f(v_0)=v_1\); the existence of the path \(p_a\) shows that \((a, v_1)\in E(G_d^{t+\delta })\) for all \(t\ge 0\). Likewise, if \((v_0, b)\in E(G)\) then the existence of \(q_b\) shows that \((v_1, b)\in E(G_d^{t+\delta })\) for all \(t\ge 0\). \(\square \)
Now note that \(f\circ g = \textrm{id}\) is the identity on \(V(G_d)\) and hence (f, g) has grounded codistortion \(\le \delta \).
×
Claim 5.53
(g, f) has grounded codistortion \(\le \delta \).
Proof of Claim
First note that \(G_{diff}(g, f) = \mathop {\mathrm {\textrm{Star}}}\limits (v_0; G)\) so we require a homotopy \(F: \mathop {\mathrm {\textrm{Star}}}\limits (v_0)\boxdot I \rightarrow G^{2\delta }\) for some line digraph I. Note further that \(g\circ f\) fixes all vertices except \(v_0\) which gets mapped to \(v_1\). Assume that q is a path \(v_0 \leadsto v_1\) of length at most \(2\delta \), then set \(I=I_+\) and define the homotopy via
which is shown in Fig. 18. Each of the edges in the lower layer belong to \(G^{2\delta }\) thanks to condition (1) of the theorem. Each of the edges in the upper layer belong to \(G^{2\delta }\) thanks to condition (2) of the theorem. The only non-trivial vertical edge is \(v_0 \rightarrow v_1\) which exists in \(G^{2\delta }\) thanks to our assumption on q. Moreover, F is relative all vertices except \(v_0\), as required. In the case where q is a path \(v_1 \leadsto v_0\) we follow the same construction and proof, except we take \(I=I_-\). \(\square \)
The bound on bottleneck distance now follows from the main stability theorem. \(\square \)
This theorem can be extended to the case of deleting multiple vertices, \(W \subseteq V\). In order for the proof above to easily extend, each of the neighbourhoods must be sufficiently isolated, i.e. the subgraphs \(\mathop {\mathrm {\textrm{Star}}}\limits (w; G)\) for \(w \in W\) should be pairwise edge-disjoint. This allows us to join the homotopies around each \(w \in W\) together, as in the proof of Theorem 5.16.
Next, one must choose a replacement vertex for each of the deleted vertices, i.e. a map \(f: W \rightarrow V \setminus W\). The resultant bound on the bottleneck distance will then depend on a number of factors, such as:
1.
the extent to which f changes the quasi-metric, i.e. the minimum \(\delta \) such that f induces a digraph map \(G^t \rightarrow {(\mathbb {O}^{d}_{W}{G})}^{t+\delta }\) for all \(t\ge 0\);
2.
the maximum distance in \(\mathbb {O}^{d}_{W}{G}\) amongst d(a, f(v)) and d(f(v), b) over \(v \in W\), \(a \in \mathcal {N}_{in}(v; G)\), \(b\in \mathcal {N}_{out}(v; G)\);
3.
the maximum distance in G amongst d(a, v) and d(v, b) over \(v \in W\), \(a \in \mathcal {N}_{in}(v; G)\), \(b\in \mathcal {N}_{out}(v; G)\);
4.
the maximum distance between v and f(v) (in either direction) in \(\mathbb {O}^{d}_{W}{G}\), over \(v \in W\).
Once these factors are controlled, by the existence of sufficiently short paths in G and \(\mathbb {O}^{d}_{W}{G}\), then a similar proof yields a bound on the bottleneck distance. However, one may have to replace F with a two-step homotopy to allow for the possibility that there are only short paths \(w_1 \leadsto f(w_1)\) and \(f(w_2) \leadsto w_2\) for some \(w_1, w_2 \in W\).
5.7 Combining Operations
Given \(G_1, G_2, G_3\in \varvec{\textrm{WDgr}}\), suppose that the previous stability theorems give
One immediately obtains the bound \(d_B(\mathcal {B}{ ^{g}{\mathcal {H}}_1(G_1) }, \mathcal {B}{ ^{g}{\mathcal {H}}_1(G_3) }) \le \delta _1 + \delta _2\). A natural question arises as to when this bound can be improved upon. Suppose that \(\{f, g\}\) is a \(\delta _1\)-grounded interleaving and \(\{p, q\}\) is a \(\delta _2\)-grounded interleaving arranged as follows:
×
It is easy to see that both \(p \circ f\) and \(g\circ q\) form \((\delta _1 + \delta _2)\)-shifting vertex maps. However, it is not clear, a priori, that either of the ordered pairs have grounded codistortion \(\le (\delta _1 + \delta _2)\). Nevertheless, depending on the combination of operations, it is often possible to show that \(\{p\circ f, g\circ q\}\) is a \(\delta \)-grounded interleaving for some \(\delta \le \delta _1 + \delta _2\).
As an illustrative example, given \(G\in \varvec{\textrm{WDgr}}\), suppose an edge \(e\in E(G)\) is subdivided according to some subdivision S, to obtain \(G_S {:}{=}\mathbb {O}^{s}_{S}{G}\). Then suppose one of the child edges \(\tau {:}{=}\tau _{e, i}\) is deleted (as shown in Fig. 19), to obtain \(\mathbb {O}^{d}_{\tau }{G_S}\). In the final digraph, \(\mathbb {O}^{d}_{\tau }{G_S}\), there is no alternative path between the endpoints of \(\tau \) so Theorem 5.38 would yield an infinite bound
However, the remaining child edges and vertices from the subdivision can be further deleted from \(\mathbb {O}^{d}_{\tau }{G_S}\) to obtain \(G_e {:}{=}\mathbb {O}^{d}_{e}{G}\). Then, Corollaries 5.45 and 5.48 imply that \( ^{g}{\mathcal {H}}_1(\mathbb {O}^{d}_{\tau }{G_S}) \cong ^{g}{\mathcal {H}}_1(G_e) \). Combining Theorems 5.16 and 5.38, we can bound
where \(d_e\) is the shortest-path quasimetric in \(G_e\).
Recall that the grounded interleaving used to prove edge deletion stability (Theorem 5.38) was a pair of identity vertex maps. Therefore, in the following, we show that the grounded interleaving used to prove subdivision stability directly yields a sharper bound for this combination of operations.
×
Theorem 5.54
Given a weighted digraph \(G=(V, E, w)\in \varvec{\textrm{WDgr}}\), an edge \(e=(a, b)\in E\) and a subdivision \(S: \{e \} \rightarrow {{\,\mathrm{\textrm{int}}\,}}(\Delta ^d)\), denote \(G_S {:}{=}\mathbb {O}^{s}_{S}{G}\) and \(G_e{:}{=}\mathbb {O}^{d}_{e}{G}\). Let \(d_e\) denote the shortest-path quasimetric in \(G_e\). Choose any of the child edges \(\tau {:}{=}\tau _{e, i} \in E(G_S)\), then.
We note that f induces a contraction digraph map \(G_e \rightarrow G_S\) and is hence a \(\delta \)-shifting vertex map.
Claim 5.55
g induces a digraph map \(G_S^{t} \rightarrow G_e^{t+\delta }\), for every \(t\ge 0\).
Proof of Claim
Given \((i, j)\in E(G_S^t)\), there is a path \(p:i\leadsto j\) in \(G_S\) of length at most t. Since \(d_e(a, b)\le \delta \) there is a path \(p_e: a\leadsto b\) in \(G_e\) of length at most \(\delta \). We construct a new trail \(p'\) in \(G_e\) as follows.
If the entire sequence of child edges \((\tau _{e, 1}, \dots , \tau _{e, d(e)})\) appears in p then we replace that sequence with \(p_e\). If \(i \in V_{new}\) and \(g(i)=a\) then we replace the initial sequence of child edges with \(p_e\). If \(i \in V_{new}\) and \(g(i) = b\) then we simply remove the initial sequence of child edges. Likewise, if \(j\in V_{new}\) and \(g(j) = b\) then we replace the final sequence of child edges with \(p_e\). If \(j \in V_{new}\) and \(g(j) = a\) then we simply remove the final sequence of child edges. This yields a trail \(p': g(i) \leadsto g(j)\). Since p cannot repeat edges, this construction inserts \(p_e\) at most once and hence the length of \(p'\) is at most \(t+\delta \). Therefore \((i, j) \in E(G_e^{t+\delta })\). \(\square \)
Claim 5.56
g induces a digraph map \(G_S\cup G_S^{t} \rightarrow G_e \cup G_e^{t+\delta }\), for every \(t\ge 0\).
Proof of Claim
It remains to check the image of edge \(e\in E(G_S)\). Any un-subdivided edge \(e\in E_{old}\) is preserved under g. Given an edge \(\tau _{e, i}=(x, y)\in E_{new}\) then \(\tau =(v_{e, i-1}, v_{e, i})\) for some i and there are three cases
Hence either \(g(x) = g(y)\) or \((g(x), g(y)) = e\). The edge e does not appear in \(G_e\) but it does appear in \(G_e^{t+\delta }\) for all \(t\ge 0\). Therefore g defines a digraph map as required. \(\square \)
First observe \(g \circ f = \textrm{id}_{V(G)}\) and hence (g, f) has grounded codistortion \(\le 0\). The homotopy used to show (f, g) has grounded codistortion \(\le \delta \) is identical to the corresponding homotopy in the proof of Theorem 5.16. However, note that we require \(2\delta \ge w(e)\) so that all edges in the image of the homotopy are edges of \(G_S^{2\delta }\). \(\square \)
6 Examples
6.1 Iterated Medial Subdivision
In order to develop intuition for how the descriptor behaves under iterative subdivision, we explicitly derive the limiting diagram for a DAG with exactly one loop (shown in Fig. 20). Certainly, the diagram contains exactly one feature which is born at \(t=0\). Intuitively, the death time corresponds to the earliest time that a long square can appear between the source and sink nodes, filling in the central hole.
×
Proposition 6.1
Suppose \(G\in \varvec{\textrm{WDag}}\) is the union two directed paths \(p_1, p_2\) from a source to a sink, with lengths \(l_1 \ge l_2\) respectively. Recall the definition of iterated medial subdivision (Definition 5.21). Then
For brevity we denote \(G_n {:}{=}\textrm{IMS}_{n}(G)\). By Corollary 4.3, the barcode \(\mathcal {B}{ ^{g}{\mathcal {H}}_1(G_n) }\) has exactly one feature. Let \(p_i^{(n)}\) denote the path \(a\leadsto b\) in \(G_n\) arising from subdividing the edges of \(p_i\). For each n, let \(c^{(n)}\) denote the simple undirected circuit in \(G_n\) which follows \(p_1^{(n)}\) and then \(p_2^{(n)}\) in reverse. Clearly \(\{\mathfrak {R}({c^{(n)}})\}\) is a persistence basis for \( ^{g}{\mathcal {H}}_1(G_n) \). Therefore, it suffices to show \(\mathop {\mathrm {\mathcal {D}}}\limits (\mathfrak {R}({c^{(n)}})) \rightarrow \frac{1}{2}l_1\) as \(n\rightarrow \infty \). Fix some natural n.
Using Lemma 4.6 we see \(\mathop {\mathrm {\mathcal {D}}}\limits (\mathfrak {R}({c^{(n)}})) \le \max (h_{1}^{(n)}, h_{2}^{(n)})\) where
(6.2)
Note that \(h_1^{(n)} \rightarrow \frac{1}{2}l_1\) and \(h_2^{(n)} \rightarrow \frac{1}{2}l_2\) and hence \(\max (h_1^{(n)}, h_2^{(n)}) \rightarrow \frac{1}{2}l_1\) as \(n\rightarrow \infty \).
Next, we wish to show \(\mathop {\mathrm {\mathcal {D}}}\limits (\mathfrak {R}({c^{(n)}})) \ge \frac{1}{2}l_1\). Choose arbitrary \(t_2 < \frac{1}{2} l_1\), then it suffices to show that \(\dim ^{g}{H}_1(G_n, t_2) \ne 0\). In order to do so, we claim the inclusion chain map
×
induces an isomorphism on homology in degree 1. It then follows that \(\dim ^{g}{H}_1(G_n, t_2) =1\) because the first homology of top row is the real cycle space of \(G_n\).
We define a chain map q in the opposite direction to j. In degree 2, \(q_2\) is the zero map and in degree 0, \(q_0\) is the identity map. Finally in degree 1, given \((i, j)\in C_1(G_n \cup G_n^{t_2})\), if \((i, j) = (a, b)\) then let \(p_{i, j}{:}{=}p_2\), otherwise let \(p_{i, j}\) denote the unique path \(i\leadsto j\) in \(G_n\) Then \(q_1\) is given by \(q_1(ij) {:}{=}\mathfrak {R}({p_{i,j}})\). This is a chain map because there is no 2-path avb where v is somewhere along \(p_1^{(n)}\).
It is certainly the case that, at the level of chain maps, \(q_1 j_1 = \textrm{id}\). Choose arbitrary \((i, j) \in C_1(G_n \cup G_n^{t_2})\) and note that \(j_1 q_1 (ij) - (ij) = \mathfrak {R}({p_{i, j}}) - ij\). By Lemma 4.1, there is some \(u_{i, j} \in C_2(G_n^{t_2})\) such that . Define \(P:C_1(G_n \cup G_n^{t_2}) \rightarrow C_2(G_n^{t_2})\) by \(ij \mapsto u_{i, j}\). Then, by construction, we see . Hence, j and q are mutually inverse on homology in degree 1.
Taking the limit \(n\rightarrow \infty \) finishes the proof. \(\square \)
Note that description of Proposition 6.1 is not unique to this descriptor, indeed the same result holds for the standard pipeline. As discussed in Sect. 3.1, for the standard pipeline, as the weighted digraph is subdivided, the birth times of all features tend to 0. When the digraph is sufficiently subdivided, all edges enter the filtration very early on and the effect of adding the edges from G at \(t=0\) has negligible effect. Hence, in the subdivision limit, the diagrams obtained from the two pipelines coincide.
Theorem 6.2
Given \(G\in \varvec{\textrm{WDgr}}\), let \(\mathcal {H}_1\) denote the ‘standard pipeline’ with \(C=\Omega \) and \(F=F_d\), as used in Example 3.6. Then
For brevity, we denote \(G_n {:}{=}\textrm{IMS}_{n}(G)\). Fix \(\epsilon >0\) and choose N sufficiently large that for any \(n\ge N\) we have \(w(e) < \epsilon \) for all \(e\in E(G_n)\). For any \(t\ge 0\), define
where each \(i_k\) is taken from the chain map induced by the relevant inclusion of digraphs. It can be easily checked that and hence \(i_1\) induces a map on homology \(i_*: H_1(G_n^t) \rightarrow ^{g}{H}_1(G_n,{t+\epsilon })\). Similarly, for any \(t\ge 0\) define
where each \(j_k\) is likewise taken from the chain map induced by the relevant inclusion of digraphs. Note, in particular, given an edge \(e\in E(G_n)\), we know \(d(\mathop {\textrm{st}}\limits (e), \mathop {\textrm{fn}}\limits (e)) < \epsilon \) and hence \(e \in E(G_n^{t+\epsilon })\). Again and hence \(j_1\) induces a map on homology \(j_*: ^{g}{H}_1(G,t) \rightarrow H_1(G^{t+\epsilon })\).
Clearly and \(j_*\circ i_*= {\iota (t , t+2\epsilon )}_*\). Therefore, by the algebraic stability theorem, we see
Further to the interpretation developed in Proposition 6.1, consider the four weighted digraphs in Fig. 21. All edges are given unit weight and the barcodes are indicated under each digraph. Homology representatives for each of the features are given by
$$\begin{aligned}&ab + bd - cd - ac; \\&ab - db + dc - ac; \\&ab + be - ce - ac \quad , \quad db + be - ce - dc; \\&ab - eb + ec - ac \quad , \quad db - eb + ec - dc. \end{aligned}$$
In \(G_1\), note that the flow starting at a recombines at d after flowing for \(t=2\) seconds. In contrast, the flow in \(G_2\) splits from the sources and then never recombines. This is reflected in the lifetime of the feature changing from [0, 1) to \([0, \infty )\).
If we add additional edges to \(G_2\) to recombine the flow at a new vertex (as in \(G_3\)), we add an additional feature but all features now have finite lifetime. Finally, reversing these additional edges (as in \(G_4\)) prevents the flow from recombining again and the features return to lifetime \([0, \infty )\).
This further emphasises the interpretation that features arise when flow is split between two paths and the lifetime of the feature is related to the time it takes for the flow to recombine.
×
6.3 Multiple Paths
Example 6.4
Consider Fig. 22, in which there is a single source and a single sink but multiple paths between. Define \(\alpha _i {:}{=}\max (a_i, b_i)\) and assume that \( \alpha _1 \le \alpha _2 \le \dots \alpha _{n-1} \le \alpha _n. \) Then, the barcode is
A persistence basis for \( ^{g}{\mathcal {H}}_1(G) \) is \(\{ c_2, \dots , c_n \}\) where \( c_i {:}{=}{a v_1} + {v_1 b} - {v_i b} - {a v_i} \) and \(\mathop {\mathrm {\mathcal {D}}}\limits (c_i) = \alpha _i\).
×
6.4 Digraphs with Identical Quasimetrics
Example 6.5
Finally, consider the two weighted digraphs illustrated in Fig. 23. Since they both have the same shortest-path quasimetric, they yield the same barcode under the standard pipeline. More formally, \(F_d(G_1) = F_d(G_2)\) and hence \(\mathcal {H}_1(G_1) = \mathcal {H}_1(G_2)\). Moreover, \(\mathcal {B}{\mathcal {H}_1(G_1)}\) is empty because the circuit \((v_0, v_1, v_3, v_2)\) is filled-in with a long square as soon as it appears in the filtration. In contrast,
A persistence basis for \( ^{g}{\mathcal {H}}_1(G_1) \) is \( \{v_0 v_1 + v_1 v_3 - ( v_0 v_2 + v_2 v_3 ) \} \) while a persistence basis for \( ^{g}{\mathcal {H}}_1(G_2) \) is \(\{ v_0 v_1 + v_1 v_3 - (v_0 v_2 + v_2 v_3) \quad , \quad v_0 v_1 + v_1 v_3 - v_0 v_3. \}\) Note that at \(t=1\) the two triangular cycles becomes homologous in \( ^{g}{\mathcal {H}}_1(G_2) \) but are still non-trivial, until they die at \(t=2\).
×
6.5 Finite (quasi)metric Space
Given a finite quasimetric space \(d: X \times X \rightarrow [0, \infty )\), one can construct a complete digraph, weighted by d as \(MG(X) {:}{=}(X, X\times X {\setminus } \Delta _X, w)\) where
$$\begin{aligned} w((v_i, v_j)) {:}{=}d(v_i, v_j) \quad \text { for all }(v_i, v_j)\in X \times X \setminus \Delta _X. \end{aligned}$$
(6.8)
Naturally one can ask what \(\mathcal {B}{ ^{g}{\mathcal {H}}_1(MG(X)) }\) measures, and how it compares to persistent path homology.
Firstly, since the input digraph has \(n(n-1)\) edges, n nodes and 1 weakly connected component, the resulting barcode contains
features. Also given any three distinct vertices \(v_0, v_1, v_2 \in X\), all of the edges \((v_0, v_1)\), \((v_1, v_2)\) and \((v_0, v_2)\) are present in G and thus form a circuit, c, with death time
Indeed, any cycle \(c'\) supported on an edge \((v_i, v_j)\) must have death time \(\mathop {\mathrm {\mathcal {D}}}\limits (\mathfrak {R}({c'})) \ge d(v_i, v_j)\).
Defining, the diameter of X via \(\textrm{diam}(X) {:}{=}\max _{v_i, v_j \in X} d(v_i, v_j)\), we note that \(MG(X)^t\) is the complete graph for \(t\ge \textrm{diam}(X)\). Since any complete graph has trivial path homology [23, Example 3.11], all features must have death time at most \(\textrm{diam}(X)\). Therefore, the largest feature in \(\mathcal {B}{ ^{g}{\mathcal {H}}_1(MG(X)) }\) must have death time equal to \(\textrm{diam}(X)\).
In Fig. 24, we have sampled 100 points on the unit circle \(X \subseteq S^1\) and computed \(\mathcal {B}{ ^{g}{\mathcal {H}}_1(MG(X)) }\), using the ambient Euclidean metric. We see a large quantity of features with death time of approximately 2, the diameter of the circle. A representative of the largest feature is shown in the second panel, which is indeed a directed triangle supported on an edge which spans almost the full diameter of the circle.
The distribution of death times is well-aligned with the distribution of the top \((100-1)^2 = 9801\) pairwise distances, as illustrated by the quantile-quantile plot in Fig. 24. The main deviation between the two distribution occurs at the small scales whilst the larger quantiles are almost identical. This illustrates that GrPPH has similar descriptive power to the collection of pairwise distances. That is, not much information is lost, but consequently the output is not very interpretable and provides little additional insight.
In contrast, computing standard PPH, the degree 1-barcode contains a single feature with lifetime approximately [0.291, 1.418). The birth time corresponds to the length of the largest gap in the point sample (at the top of the circle). Meanwhile, the death time is approximately \(\sqrt{2}\), the side length of a square inscribing the circle.
Grounded persistence was designed as a descriptor of sparse weighted digraphs where the existence of an edge is an important signal from the input data and the circuits are of interest. We enrich the circuits already present in G with an intrinsic, directed notion of scale, using the weighting and an appropriate filtration. In the finite (quasi)metric space scenario, all possible edges are present in MG(d) and thus the circuit representatives are not as interpretable as in the sparse case. As such, our recommendation would be to use non-grounded persistent path homology [13] (viewing the data as a directed network) or traditional, symmetric TDA methods.
Acknowledgements
The first author would like to thank H. Byrne, A. Goriely, A. Ó hEachteirn and T. Thompson for valuable discussions which motivated and aided the early stages of this work. The authors are thankful for the detailed and helpful comments of the reviewers of this manuscript. In particular, the definition of a \(\delta \)-grounded interleaving in Sect. 5.1 was greatly motivated by a suggestion to adapt the Kalton-Ostrovskii characterisation of network distance. HAH gratefully acknowledges funding from a Royal Society University Research Fellowship. The authors are members of the Centre for Topological Data Analysis, which is funded by the EPSRC grant ‘New Approaches to Data Science: Application Driven Topological Data Analysis’ EP/R018472/1. For the purpose of Open Access, the authors have applied a CC BY public copyright licence to any Author Accepted Manuscript (AAM) version arising from this submission.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.