An annotated hypergraph a hypergraph accompanied by additional metadata.
It is difficult to find a modeling justification for hypergraphs with role-degenerate edges, but degenerate edges may have modeling applications. For example, in email networks, it may be useful to log instances in which a sender also copies themsleves into the recipients list. In our experiments below, however, we work only on data with neither form of degeneracy. Throughout the remainder of the paper, we restrict \(\mathbb {H}\) to the set of nondegenerate annotated hypergraphs.
A configuration model for annotated hypergraphs
We now define a configuration null model on nondegenerate, annotated hypergraphs. As has recently been emphasized by
Fosdick et al. (2018), there are several distinct models that are often called “configuration models,” and care is required in order to define and sample from them.
To do so, we define two sets of vectors that summarize the incidence array
T. For each
x, define the vector
dx of nonnegative integers entrywise as
$$\begin{array}{*{20}l} d_{v}^{x} = \sum_{e \in E} t_{vex}\;. \end{array} $$
(2)
Similarly, define the vector
k of nonnegative integers entrywise as
$$\begin{array}{*{20}l} k_{e}^{x} = \sum_{v \in V} t_{vex}\;. \end{array} $$
(3)
The vector dx counts the number of times that each node plays role x in \(\mathcal {H}\), while the vector kx counts the number of nodes with role x in each edge. Both nodes of degree zero (which participate in no edges) and edges of dimension zero (which contain no nodes) are permitted. However, because we will later sample from a probability distribution via edge swaps, these nodes and edges play no role in sampling and can be ignored.
Let D be the matrix whose xth column is dx, and K the matrix whose xth column is kx. In a slight abuse of notation, we also regard D and K as functions of \(\mathcal {H}\).
Throughout the remainder of this paper, we assume that
D and
K are configurable. Note that this is always the case when
D and
K are extracted from an empirical data set. More general problems concerning the configurability of arbitrary
D and
K may also be considered. By viewing the annotated bipartite graph
\(\mathcal {B}(\mathcal {H})\) as a union
\(\cup _{x \in \mathcal {X}} \mathcal {B}^{x}(\mathcal {H})\) of single-role bipartite graphs, one for each role
x, we can reduce the problem to determining the configurability of the individual pairs
dx and
kx. An elegant necessary and sufficient condition for the configurability of two integer sequences is provided in early work by Ryser (
1960;
2009);
Gale (1957).
In
\(\mathbb {C}_{\mathbf {D}, \mathbf {K}}\), each node “remembers” how many times it played role each
x and each edge remembers how many nodes playing role
x were contained in it. Summing over
x, we see that nodes remember their degrees and edges their dimensions. The null space
\(\mathbb {C}_{\mathbf {D}, \mathbf {K}}\) thus generalizes the hypergraph configuration null space of
Chodrow (2019a).
A natural approach to defining a null model is to define the uniform measure on
\(\mathbb {C}(\mathcal {H}_{0})\). Such a definition is attractive from a theoretical standpoint, since this measure is also the entropy-maximizing measure on
\(\mathbb {H}\) subject to the degree and dimension constraints. However, methods for sampling from uniform models suffer from computational issues related to counting edge multiplicities and rejection probabilities, often resulting in slow sampling (
Fosdick et al. 2018;
Chodrow 2019b). We therefore instead follow the traditional path of
Bollobás (1980);
Molloy and Reed (1998), and others in defining a configuration model as the output of a stub-matching algorithm.
A
stub is an indexed pair (
v,
x,
i) of a node and a role; the index
i serves simply to distinguish stubs. For each such pair,
\(d_{v}^{x}\) counts the number of times that node
v has role
x in
\(\mathcal {H}_{0}\). We collect all stubs a single set:
$$\begin{array}{*{20}l} \Sigma = \bigcup_{x \in \mathcal{X}} \bigcup_{v \in V} \underbrace{\left\{(v,x, 1),\ldots,(v,x, d_{v}^{x})\right\}}_{d_{v}^{x} \text{ copies}}\;. \end{array} $$
Stub-matching forms hyperedges by selecting sets of stubs from the set
Σ. For each edge
e:
(1)For each role x, uniformly sample \(k_{e}^{x}\) stubs with role x from Σ, without replacement, and combine them via multiset union. Add the result to the edge set \(\mathcal {E}\).
The algorithm terminates when it is impossible to form the next edge. When D and K are configurable, stub-matching terminates when Σ=∅ and produces a partition generating a stub-labeled hypergraph with specified degree and dimension sequences.
By construction, the output of stub-matching is distributed according to the uniform measure μ0 on the set ΣD,K of partitions of indexed stubs into subsets with fixed numbers of stubs per node and elements per subset. Let \(g:\Sigma _{\mathbf {D},\mathbf {K}} \rightarrow \mathbb {C}_{\mathbf {D},\mathbf {K}}\) be the map that sends to each such partition its associated hypergraph.
The configuration model μ weights elements of \(\mathbb {C}_{\mathbf {D},\mathbf {K}}\) according to their likelihood of being realized via stub-matching, conditional on nondegeneracy. In this, it differs from the uniform distribution on \(\mathbb {C}_{\mathbf {D},\mathbf {K}}\), since the same annotated hypergraph can be realized through multiple configurations. Indeed, any permutation of stubs of the form (v,x,1),(v,x,2) does not alter the image under g. Because of this, the configuration model tends to give greater probabilistic weight to annotated hypergraphs in which there are many parallel edges.
By definition, we can in principle sample from
μ by repeatedly performing stub-matching until a nondegenerate configuration is obtained, and then applying the function
g. This approach is usually impractical, as the probability of realizing a nondegenerate configuration is typically very low. The generalization of limit laws such as those provided by
Angel et al. (2016) for a dyadic configuration model governing the probability of nondegeneracy would be a welcome development beyond our present scope.
An alternative approach is to perform stub-matching and then simply discard degenerate edges. This approach was pioneered by
Molloy and Reed (1995). The model of
Allard et al. (2015) can be used to sample annotated hypergraphs using this method. The theoretical justification of this approach is that, though the probability of at least one degeneracy may be high, the expected
number of degeneracies is low (see again
Angel et al. (2016)). One might therefore suppose that these can be removed without doing significant violence to the topological structure of the realized network. Since our present interest lies in null random graph hypothesis-testing, it is desirable to avoid the issue of nondegeneracy entirely. We do so by developing a simple extension of the standard edge-swap Markov chain, and show that this chain is sufficient to perform approximate sampling from
μd,k directly.
Edge-swap Markov chains
Edge-swap Markov Chain Monte Carlo provides an alternative approach to sampling from μD,K. The benefit of this method of sampling is that, as long as it is initialized with a non-degenerate hypergraph, all samples produced are guaranteed to be nondegenerate.
It is convenient to define edge swaps on the annotated bipartite graph
\(\mathcal {B}\). There is considerable literature on edge-swap Markov chains for bipartite graphs (
Kannan et al. 1999;
Erdős and Gallai 1960;
Ryser 1960). The edge-swap Markov chain we develop here may be viewed as a superposition of standard chains, one for each role label. This is in principle sufficient to imply irreducibility and aperiodicity; we provide a full proof in order to make our exposition self-contained.
We can also regard a role-preserving edge-swap of bipartite edges f1 and f2 as a map \(\pi :\mathbb {B}\rightarrow \mathbb {B}\) that generates a new bipartite graph \(\mathcal {B}^{\prime }\). In this case we write \(\mathcal {B}^{\prime } = \pi (\mathcal {B}|f_{1}, f_{2})\). Note that it is possible that \(\mathcal {B} = \pi (\mathcal {B}|f_{1}, f_{2})\); this occurs when f1=(v,e1;x) and f2=(v,e2;x) for some v or f1=(v1,e;x) and f2=(v2,e;x) for some e. Nondegeneracy rules out the case that f1=f2=(v,e;x) for distinct f1 and f2.
Let
\(\mathcal {B}^{x}\) denote the edges of
\(\mathcal {B}\) with role label
x. Then, we can construct a Markov chain
\(\mathcal {B}_{t}\in \mathbb {B}\) by repeated role-preserving double edge swaps. Some care is needed to ensure that each state of this chain is nondegenerate. The full algorithm is formalized in Algorithm 1. The Markov chain
\(\mathcal {B}_{t}\) of hypergraphs induces a chain
\(\mathcal {H}_{t} \in \mathbb {H}\) of annotated hypergraphs. Theorem 1 ensures that samples from this chain at sufficiently long intervals will be approximately i.i.d. according to
μ.
Proof
It is convenient to first view stub-matching as an algorithm for generating stub-labeled bipartite graphs. To do so, we observe that the output partition of stub-matching defines a bipartite graph similar to \(\mathcal {B}\), except that to each edge (v,e;x) is associated an integer between 1 and \(d_{v}^{x}\). Let \(\bar {\mathcal {B}}\) denote this stub-labeled bipartite graph, and \(\bar {\mathbb {B}}\) the set of such graphs. We recover a standard bipartite graph \(\mathcal {B}\) from \(\bar {\mathcal {B}}\) by erasing the stub-labels.
We now observe that a bipartite edge-swap
\(\bar {\mathcal {B}}_{t+1} = \pi (\bar {\mathcal {B}}_{t}|f_{1},f_{2})\) always produces a new element of
\(\mathbb {B}\), since each bipartite edge has a distinct stub-label. For the same reason, if
\(\bar {\mathcal {B}}_{t+1} = \pi (\bar {\mathcal {B}}_{t}|f_{1},f_{2})\), then
f1 and
f2 are the only bipartite edges for which this relation holds. In particular, the transition kernel of the edge-swap Markov chain may be written
$$\begin{array}{*{20}l} P(\bar{\mathcal{B}}^{\prime}|\bar{\mathcal{B}}) = \left\{\begin{array}{ll} r(f_{1}, f_{2}|\bar{\mathcal{B}}) &\quad \exists f_{1}, f_{2} : \bar{\mathcal{B}}^{\prime} = \pi(\bar{\mathcal{B}}|f_{1},f_{2}) \\ 0 &\quad \text{otherwise,} \end{array}\right. \end{array} $$
where \(r(f_{1}, f_{2}|\bar {\mathcal {B}})\) is the probability that bipartite edges f1 and f2 are sampled and that x1=x2. It follows that P will be reversible with respect to the uniform measure on \(\mathbb {B}\) provided that \(r(f_{1},f_{2}|\bar {\mathcal {B}}) = r(f_{1}^{\prime },f_{2}^{\prime }|\bar {\mathcal {B}}^{\prime })\) whenever \(\bar {\mathcal {B}}^{\prime } = \pi (\bar {\mathcal {B}}|f_{1},f_{2})\) and \(\bar {\mathcal {B}} = \pi (\bar {\mathcal {B}}^{\prime }|f_{1}^{\prime },f_{2}^{\prime })\). To see why this is the case, note that \(r(f_{1},f_{2}|\bar {\mathcal {B}})\) depends on \(\bar {\mathcal {B}}\) only through the sizes of edges and the role distributions within each edge. These quantities are preserved under double edge swaps. We thus conclude that \(\mathcal {B}_{t}\) is reversible with respect to μ0, and therefore \(\mathcal {H}_{t}\) is reversible with respect to μ.
It remains to show irreducibility. We will construct a supported path in state space from \(\bar {\mathcal {B}}\) to \(\bar {\mathcal {B}}^{\prime }\), where these are stub-labeled bipartite graphs with fixed marginals. Choose x such that \(\bar {\mathcal {B}}^{x} \setminus \bar {\mathcal {B}}^{\prime x}\) is nonempty, and let \((v_{1},e_{1};x,i_{1}) \in \bar {\mathcal {B}}^{\prime x} \setminus \bar {\mathcal {B}}^{x}\). Then, there must exist edges of the form (v1,e2;x,i1) and (v2,e1;x,i2) in \(\bar {\mathcal {B}}^{x}\setminus \bar {\mathcal {B}}^{\prime x}\), since node v1 must be connected to some edge with role x, and similarly edge e1 must be connected to some node with role x in \(\bar {\mathcal {B}}\). In particular, the swap (v1,e2;x,i1),(v2,e1;x,i2)↦(v1,e1;x,i1),(v2,e2;x,i2) reduces the size of the set \(\bar {\mathcal {B}}^{x} \setminus \bar {\mathcal {B}}^{\prime x}\) by at least one. This is because we have generated the edge (v1,e1;x,i1), which was in \(\bar {\mathcal {B}}^{\prime x}\) by hypothesis, while neither of the removed edges (v1,e2;x,i1) or (v2,e1;x,i2) were in \(\bar {\mathcal {B}}^{\prime x}\). Repeating this procedure allows us to reduce the size of \(\bar {\mathcal {B}}^{x} \setminus \bar {\mathcal {B}}^{\prime x}\) indefinitely Iterating over all labels x is then sufficient to generate a supported path between any two stub-labeled bipartite graphs \(\mathcal {B}, \mathcal {B}^{\prime } \in \mathbb {B}\). Dropping the stub-labels produces a supported path between any two elements of \(\mathcal {B}\), and therefore any two elements of \(\mathcal {H}\), as was to be shown. □
For the feature of guaranteed nondegeneracy, we pay a computational cost in mixing times. While some results are known for mixing times of bipartite edge-swap Markov chains (
Kannan et al. 1999;
Erdös et al. 2010), the conditions under which these results are obtained are restrictive and often do not hold in empirical data sets. Results available for non-bipartite graphs (
Greenhill 2014;
2011) scale poorly with the node degrees and total number of edges, suggesting that the computational burden may be significant. Despite these considerations, edge-swap Markov chains can be nevertheless be practically deployed in a variety of practical settings, see
Fosdick et al. (2018) for a review.