Topical Review | Open access

Revealing networks from dynamics: an introduction

Marc Timme and Jose Casadiego

Published 11 August 2014 © 2014 IOP Publishing Ltd
Citation: Marc Timme and Jose Casadiego 2014 J. Phys. A: Math. Theor. 47 343001
DOI 10.1088/1751-8113/47/34/343001

Abstract

What can we learn from the collective dynamics of a complex network about its interaction topology? Taking the perspective from nonlinear dynamics, we briefly review recent progress on how to infer structural connectivity (direct interactions) from accessing the dynamics of the units. Potential applications range from interaction networks in physics, to chemical and metabolic reactions, protein and gene regulatory networks as well as neural circuits in biology and electric power grids or wireless sensor networks in engineering. Moreover, we briefly mention some standard ways of inferring effective or functional connectivity.


Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Where are you linked?

1.1. Relating connection topology of a network to its dynamics

Networks are everywhere. And most of them are dynamic. From networks of biochemical reactions that regulate the metabolism in the cells of our bodies to the neuronal circuits in our brains, from social ties forming networks of our friendships and collaborators to the power grids and internet that provide huge amounts of electric energy and information every second. All of these systems form networks of units that interact to yield complex, collective forms of functions—and all are crucial to our everyday life.

The interaction topology of complex networks strongly impacts their collective dynamics and thus the function of the entire system. For many network dynamical systems, for instance in physics and biology, the dynamics of the individual units becomes more and more accessible, whereas their intricate web of interactions remains uncertain or even largely unknown. As an example, many constituents of protein and gene interaction networks are well characterized, but how they interact and which pathways are relevant for proper functioning is not well understood [1, 2]. In neuroscience, the number of units from which one can simultaneously measure neuronal activity is increasing rapidly from a few units to hundreds of them [3]. Still, identifying the synaptic connections of a neuronal circuit by anatomical methods is mostly restricted to individual synapses, and computer-aided reconstruction based on optical methods for more than two cells has become available only very recently [4–6]. In social networks, even in simple settings such as basic games, pairwise interactions are roughly understood, but often both the (temporally varying) interaction network and its collective consequences remain a riddle. Thus, reconstructing the structure of interaction networks from (only) the collection of local dynamical data constitutes a current open challenge, with applications across the natural and social sciences as well as engineering.

Network dynamics: forward versus inverse problem. Yet, the vast majority of research on network dynamics has focused on the 'forward direction' of modeling and asked what types of collective dynamics emerge from a network of given topology. Researchers from the natural sciences and engineering now systematically address the reverse questions, and at a rapidly increasing pace: how to control a network or, more generally, how to design networks for a desired dynamics (and thus function), and how to infer topology from dynamics. In particular in engineering, the design of systems for a specific function has always been core [7], and as systems become more complex, considering recurrently interacting networks becomes indispensable. Conversely, complex systems in physics and biology require a view on networked systems to understand how complex emergent phenomena contribute to (possibly optimal) system dynamics or function [8–12]. Finally, how one could perhaps redesign the collective dynamics of networks, e.g., gene and protein networks [13, 14], also poses challenges of frontier research.

The inverse problem of how to infer interaction topology from network dynamics constitutes the main topic of this review.

1.2. Aims and options of network inference

What do we aim to infer? Before addressing any inference problem, we have to clarify what we actually want to find out about the networked system and which level of detail we are interested in, see table 1. For instance, we may want to know effective or functional connectivity, not caring about individual interactions per se but only about the statistical dependences that the entire set of interactions yields between pairs of units through the collective network dynamics. Inter-unit correlations and various information-theoretic measures have been devised to solve such problems. These often neglect the temporal dynamics as they use temporal averages or statistical distributions of observables. Moreover, effective connectivity may depend on the current collective state and function of a given system, and thus the same physical network may display different effective topologies for different functions or in different states. We may alternatively want to know structural connectivity, i.e. which unit directly interacts with which other units (and how).

Table 1. Levels of interest. Which properties of the connections are we interested in? Each entry lists a distinction and examples thereof.

  • Structural versus effective: direct interaction or statistical dependence? Example: EEG versus connectomics for neural 'connectivity'.
  • Pure existence: presence or absence of links? Example: does one gene directly influence another?
  • Sign: positive, negative, or mixed-sign interaction? Examples: phase-advancing or phase-retarding (for coupled oscillators); activating or inhibiting (for gene and protein interactions).
  • Directedness: directed or undirected (bi-directional) interactions? Example: chemical versus electrical neuronal synapses.
  • Type of interaction: continuous time or discrete time; linear versus nonlinear? Examples: chemical versus electrical synapses; diffusive versus nonlinear coupling.
  • Time scales: instantaneous, temporally localized or extended? Example: slot communication (in mobile phones) versus genetic interactions.
  • Spatial scales: local, global, non-local, bounded? Example: message broadcasting, from wireless networks to cell tissue.

In this review, we focus on these direct physical interactions and address various levels of detail. We also briefly present basic methods to infer different types of effective connectivity. By construction, such a review cannot be complete, also because the field is currently developing at a breathtaking pace. We therefore select specific reconstruction methods from those that are commonly used, appear promising for the future of the field, or have been recently developed and form the basis of current research.

What can we learn about the connections of a network from accessing the units' dynamics? Mathematically, inferring the connectivity constitutes a high-dimensional inverse problem, and various methods have been devised to address this question. Every inference method starts from a different level of pre-knowledge about the system and has its own aims as to what to reconstruct, cf table 1. We may be interested in whether the interactions are directed or undirected, in whether or not a link exists, in the sign of the interaction dynamics, in the type of links (e.g., electric versus chemical synapses, diffusive versus nonlinear coupling), and in the strengths and the temporal and spatial scales of interactions.

Structural connectivity may be very different from effective or functional connectivity (figure 1). On the one hand, high correlations may exist between two units that are not directly connected but only influenced by each other via a third unit they both directly interact with. In general, such indirect interactions may be induced not only by one third node, but equally by the entire collective dynamics of a network. On the other hand, even a strong direct interaction between two units does not necessarily mean that their dynamics is highly correlated; correlations could be submerged, e.g., by external noise or recurrent inputs the two units receive, e.g., from two distinct other parts of the network, or even from outside of it.


Figure 1. Diverse dynamic impact of structural links on effective connectivity. (a) Structural links may not be detectable by certain correlation measures due to strong independent driving signals (e.g. noise). For example, strong inputs along two links (from within or outside the network, marked by arrows) may decorrelate the dynamics of the two nodes (gray disks) although they directly interact. (b) Common input may create an effective link without a structural link being present. For example, common input from a third node (open circle) may create effective connectivity (dashed line) between two nodes (gray disks) that are not directly connected. In both examples, (a) and (b), it may depend on the entire collective state of the network (and the external inputs it may receive) whether or not an effective connection is detected.


In general, effective and structural connectivity are related in a highly non-trivial way. In fact, a number of counter-intuitive phenomena have been observed in various systems. For instance, recent work on coupled oscillator networks highlights that under certain conditions noise may aid the reconstruction of structural connectivity from correlation-based effective connectivity [15]; and one may detect small-world features in the functional connectivity even if it is derived from randomly connected dynamical systems without any specific small-world features [16].

Where do we start from? It is important to clarify which knowledge about the networked system we presuppose. Is the collective dynamics known to be simple, such as converging to a fixed point of concentrations in biochemical networks or to a limit cycle in coupled oscillator networks? Or do we expect more complex, irregular, and perhaps unpredictable, chaotic or random types of dynamics? Can we change the dynamics of units by interfering with the system, or can we just observe? Does the research question require a global analysis in state space, or do we focus on a specific dynamical state where local analysis may suffice? We should answer these questions, among others, before using or developing any inference method, to achieve reconstruction at the level we need with the best quality and minimal effort, both experimentally and computationally.

Here we review recently developed approaches to inferring structural connectivity of a network from accessing its collective dynamics. The presented approaches assume various levels of pre-knowledge about the system and may or may not require the observer to interfere with the system. The article is structured as follows. We first clarify in section 2 what we mean by a network dynamical system, taking the view of continuous-time dynamics described by coupled ordinary differential equations. In sections 3–5, we explain three principally different classes of methods to obtain information about the structure of the network topology from dynamical quantities.

Section 3 illustrates basic theoretical approaches based on measuring and evaluating the response of a network to external perturbations or driving. As the response depends on both the external driving signal and the interaction topology of the network, sufficiently many driving-response experiments yield information about the entire network topology. A second class of methods sets up a model copy of the original system (section 4) and adapts the coupling matrix of the model so as to synchronize its dynamics with the original. If synchronization is achieved, the topology obtained for the model is taken as a proxy for the original. Finally, a set of direct methods (section 5) relies on measuring time series (or features thereof), evaluating temporal derivatives, and exploiting smoothness assumptions to find solutions to an optimization problem given by the restrictions imposed by the data.

We briefly comment on technical issues (section 6) and mention basic core approaches for identifying effective connectivity (section 7). These approaches rely on simple linear correlation, maximum entropy principles and related statistical inference methods. Finally, we provide an outlook (section 8) where we highlight current challenges, point out aspects sometimes overlooked and show potential research paths towards uncovering more of the topology of the networks that surround us.

This review is deliberately short but self-contained. It is intended for researchers mainly in the physical and biological sciences and engineering, and assumes basic knowledge of dynamical systems and probability theory. Let us start with the details.

2. Nonlinear dynamics of networks

2.1. Systems of ordinary differential equations describing networks

Throughout the main part of this review, we consider networks of units assumed to be described by systems of ordinary differential equations. Discrete-time maps coupled on a network are not discussed, but approaches similar to those presented here are typically viable in slightly modified forms. We discuss specific issues for systems of pulse-coupled units, such as spiking neurons, in section 5. These are formally hybrid systems, i.e. mixtures of continuous-time and discrete-time systems.

Assuming that interactions occur between pairs of coupled units, a generic network dynamical system is given by

$\dot{\boldsymbol{x}}_{i}(t)=\boldsymbol{f}_{i}(\boldsymbol{x}_{i}(t))+\sum _{j=1}^{N}J_{ij}\,\boldsymbol{g}_{ij}(\boldsymbol{x}_{i}(t),\boldsymbol{x}_{j}(t))+\boldsymbol{I}_{i}(t)+\boldsymbol{\xi }_{i}(t)$   Equation (1)

where i, j ∈ {1, 2, ..., N}, $\boldsymbol{x}_{i}(t)=[x_{i}^{(1)}(t),x_{i}^{(2)}(t),\ldots ,x_{i}^{(D)}(t)]^{\mathsf {T}}\in \mathbb {R}^{D}$ describes the state of the ith unit at time $t\in \mathbb {R}$, and the functions $\boldsymbol{f}_{i}:\,\mathbb {R}^{D}\rightarrow \mathbb {R}^{D}$ and $\boldsymbol{g}_{ij}:\,\mathbb {R}^{D}\times \mathbb {R}^{D}\rightarrow \mathbb {R}^{D}$ mediate the intrinsic and interaction dynamics of the D-dimensional units, respectively. The term Ii(t) represents a vector of external driving signals (possibly random) and ξi(t) symbolically represents external noise. Finally, the Jij define the interaction topology, in the simplest setting in terms of the adjacency matrix A such that Jij = Aij = 1 if there is a direct physical interaction from unit j to i and Jij = Aij = 0 otherwise. In general, the units' interactions may be of higher order, e.g. requiring terms like hijk(xi, xj, xk) added to the right-hand side of (1). For instance, in gene and protein interaction networks, a protein (a so-called transcription factor, say unit k) directly influences the rate of transcription of a gene (say, unit j) by binding to a DNA segment (say, unit i). We do not treat such terms here explicitly. Their relevance for network dynamical systems is discussed in [17].

For some methods to infer effective connectivity, the functional form of (1) does not directly enter the inference argument, other methods can be extended to include higher order terms explicitly. We comment on higher order terms where appropriate (cf also section 5).

2.2. Rescaling, simplifications, and common interactions

Some a priori technical issues. Considering (1) as our basic level of description, in case the dimension Di of the local dynamical system i depends on the unit i, we would just consider the maximal occurring dimension $D=\max _{i\in \lbrace 1,\ldots ,N\rbrace }D_{i}$ and for each unit ignore the $D-D_{i}$ dummy variables. This is done purely for notational simplification. Note further that in general, the quantity Jij is a D × D matrix of coupling strengths $J_{ij}^{dd^{\prime }}$, but that for many paradigmatic model systems, only one of these elements is non-zero, i.e. $J_{ij}^{dd^{\prime }}=J_{ij}^{dd^{\prime }}\delta _{d,d_{1}}\delta _{d^{\prime },d_{2}}$ such that $J_{ij}=J_{ij}^{d_{1}d_{2}}$ is sufficient to describe the influence of unit j on i.

Given a dynamical system of the form (1), the right-hand side is determined up to N × D additive constants $C_{i}^{d}$ and one overall multiplicative constant C0. Shifting the state variables $x_{i}^{d}$ to co-moving frames and rescaling time enables us to set $C_{i}^{d}=0$ for all i and all d and C0 = 1 without loss of generality.

For simplicity of presentation, we furthermore describe the methods below as if they were for networks of coupled one-dimensional units only. Often, different dimensions may be treated independently during reconstruction.

We thus take the coupled equations

$\dot{x}_{i}(t)=f_{i}(x_{i}(t))+\sum _{j=1}^{N}J_{ij}\,g_{ij}(x_{i}(t),x_{j}(t))+I_{i}(t)+\xi _{i}(t)$   Equation (2)

as our basic network characterization we start from, where now the variables xi and xj and the functions fi and gij are treated as real scalars.

Common interaction functions. Only non-trivial coupling terms, i.e. those that are not identically zero,

$g_{ij}(x_{i},x_{j})\not\equiv 0$   Equation (3)

for the relevant domain of arguments actually contribute to interactions, so we assume all coupling functions gij for which Jij ≠ 0 to be non-trivial in this sense. A broad range of systems exhibits diffusive coupling such that

$g_{ij}(x_{i},x_{j})=x_{j}-x_{i}$   Equation (4)

which is a special case of coupling functions

$g_{ij}(x_{i},x_{j})=\hat{g}_{ij}(x_{j}-x_{i})$   Equation (5)

that depend on state differences (e.g. phase differences for coupled oscillators) only. The simplest non-trivial form of interaction is linear coupling,

$g_{ij}(x_{i},x_{j})=x_{j}$   Equation (6)

which does not depend on the dynamical variable of the unit it influences.
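To make the setting concrete, here is a minimal numerical sketch (not from the review) of the scalar model (2) with linear local dynamics and diffusive coupling (4); the network size, coupling ensemble and integrator tolerances are illustrative choices only.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Toy instantiation of the scalar network model (2): local dynamics f_i(x) = -x,
# diffusive coupling g_ij(x_i, x_j) = x_j - x_i as in (4), no driving or noise.

rng = np.random.default_rng(0)
N = 5
J = (rng.random((N, N)) < 0.4).astype(float)   # random 0/1 coupling strengths J_ij
np.fill_diagonal(J, 0.0)                        # no self-coupling

def rhs(t, x):
    # element (i, j) of the bracket below is g_ij evaluated at the current state
    return -x + (J * (x[None, :] - x[:, None])).sum(axis=1)

sol = solve_ivp(rhs, (0.0, 10.0), rng.standard_normal(N), rtol=1e-8)
print(sol.y[:, -1])   # the states relax towards the globally stable fixed point x = 0
```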

We have now set the stage to dive into specific inference approaches.

3. Driving-response experiments

One idea for inferring network topology is to measure the collective response of a network dynamical system to driving by external signals. For instance, if a system exhibits a stable collective state (e.g. fixed point or periodic orbit, cf figure 2), it will relax back to that state after a transient input (pulse), if the latter is not too strong (so as not to kick the system out of the basin of attraction of the stable state) [18–22]. Here, the input signal (driving) effectively changes the initial condition of the system, leaving the system features (given by the local and interaction functions and their parameters) the same. Similarly, a stable state of the system will generically move in state space (and keep qualitatively the same stability properties) in response to sufficiently weak, temporally constant external perturbations. These kinds of perturbations effectively create a non-identical but similar system with different parameters determined by the driving signal.


Figure 2. Stable state approaches to network inference. (a) A constant external driving signal (parameter change) moves the stable fixed point (gray) in state space to another position (blue). The difference vector v (red) depends on both the driving signal and the network topology such that several measurements of v under different driving conditions yield information about the topology. (b) Same as (a) but for a transient perturbative signal: after the driving signal is switched off, the dynamics relaxes back (dashed blue trajectory) towards the original fixed point (gray). Now, the relaxation trajectory contains information about the topology.


Both the relaxation dynamics and the shift in state space in general depend not only on the external signal (which unit is perturbed, how and how strongly, i.e. known quantities), but also on the (unknown) interaction topology of the network. Each collective response of the system to an external perturbation yields a restriction on the network topology such that sufficiently many driving-response experiments may reveal the entire topology. In this section, we present the main ideas underlying several related driving-response approaches.

3.1. Restrictions from local fixed point analysis

Recent efforts to develop methods for identifying a network's topology have emerged from the need to understand biological, in particular gene regulatory, networks [1, 23–25]. Such networks consist of genes and proteins that interact with each other within a cell [26, 27]. These interactions in particular control (indirectly) the rates at which genes are transcribed into mRNA, and the regulatory features emerge via the interactions and generally do not follow from the single-gene level [28]. Gene regulatory networks and other reaction networks, e.g., in chemistry or population dynamics, are often modeled by nonlinear differential equations

$\dot{z}_{i}(t)=f_{i}(z_{1}(t),z_{2}(t),\ldots ,z_{N}(t);\boldsymbol{p},\tilde{\boldsymbol{I}})$   Equation (7)

describing the rate of change of the expected numbers (or concentrations) zi(t) of entities (e.g. genes, atoms and molecules or individual organisms) at time t in terms of their dependence on the numbers of other entities [27]. Here p is a set of parameters and $\tilde{\boldsymbol{I}}$ represents a set of external perturbations directed to the entities. The zi are typically positive real numbers, but mathematically there is no restriction for them to also be negative. Under the assumption that such a system is close to a steady state $\boldsymbol{z}^{*}=(z_{1}^{*},z_{2}^{*},\ldots ,z_{N}^{*})$, a stable fixed point where $f_{i}(z_{1},z_{2},\ldots ,z_{N};\boldsymbol{p},\tilde{\boldsymbol{I}})=0$ for all i, the dynamics for perturbations $x_{i}(t)=z_{i}(t)-z_{i}^{*}$ from the steady state concentrations of such nonlinear models may be approximated to first order in the xi by

$\dot{x}_{i}(t)=\sum _{j=1}^{N}J_{ij}\,x_{j}(t)+I_{i}(t)$   Equation (8)

where the local Jacobian Jij = (∂fi/∂zj)(z*) is the effective interaction matrix given the steady state and Ii(t) is assumed to be an external perturbation linearly coupling to deviations of variables xi. We remark that when deriving (8) from (7) we implicitly use the relation

$f_{i}(\boldsymbol{z}^{*}+\boldsymbol{x})=f_{i}(\boldsymbol{z}^{*})+\sum _{j=1}^{N}\frac{\partial f_{i}}{\partial z_{j}}(\boldsymbol{z}^{*})\,x_{j}+\mathcal{O}(\Vert \boldsymbol{x}\Vert ^{2})$   Equation (9)

ignoring the second and higher order terms.

3.1.1. Driving the system constantly to move a stable state (changing parameters)

The method presented now effectively moves the parameters and thus in particular the fixed points of a system. With applications to gene and protein interaction networks originally in mind, Gardner and coworkers [1, 29] have explicated that such reconstruction is possible, at least for small networks. We here discuss two approaches to using the general form (8) to reconstruct the interaction network, i.e. the Jij. The first is based on driving the system (8) by sufficiently small, temporally constant driving forces Ii(t) = Ii, m to new fixed points $x_{i,m}^{*}\ne 0$ close to the original one $x_{i}^{*}=0$. As the original fixed point was structurally stable, the new fixed point will generically exist and have qualitatively the same stability properties if the Ii, m are small enough (figure 2(a)). This is because solutions of (7), in particular fixed point solutions, typically vary continuously with changing parameters (here: changing Ii, m from zero) and there is no bifurcation close to the parameters yielding a generic stable fixed point $y_{i}^{*}$.

The values observed at each new steady state together with the (known) driving signals provide information about the interaction topology Jij. Performing several perturbation experiments m ∈ {1, ..., M} yields N × M equations

$0=\sum _{j=1}^{N}J_{ij}\,x_{j,m}^{*}+I_{i,m}$   Equation (10)

one for each experiment m and for each unit i.

After an arbitrary number M of experiments, equation (10) in matrix form becomes

$JX=Y$   Equation (11)

where $J\,{\in}\, \mathbb {R}^{N\times N}$ represents the connectivity among the units, $X\in \mathbb {R}^{N\times M}$ the steady state values with $X_{i,m}=x_{i,m}^{*}$, and $Y{\in \mathbb {R}}^{N\times M}$ the perturbations Yi, m = −Ii, m that we assume to be known. This matrix equation restricts the connectivity J given the measured data X and the input perturbations I. The matrix equations constraining the full network topology J can be split into N equations

$\boldsymbol{J}_{i}X=\boldsymbol{Y}_{i}$   Equation (12)

one for each input connectivity $\boldsymbol{J}_{i}:=(J_{i,1},\ldots ,J_{i,N})\in \mathbb {R}^{1\times N}$ of a unit i. Thus, the same set of data X restricts all the sets of units providing interactions to i ∈ {1, ..., N}, but the data $\boldsymbol{Y}{}_{i}=(Y_{i,1},\ldots ,Y_{i,M})^{\sf {T}}$ are unit dependent. This reduction to N individual equations also allows the computational effort for solving them to be split. The problem becomes trivially parallelizable because for different i, the restrictions (12) are independent in the sense that the reconstruction of the input coupling strengths to each unit i can be performed without taking care of the input coupling strengths of other units k ≠ i.

3.1.2. Observe relaxation to stable state after transient driving (changing initial conditions)

A second approach assumes that the quantities yi (and thus the xi) are perturbed such that at time t0 we have $x_{i}(t_{0})=x_{i}^{(0)}$ and the transient dynamics yi(t) of relaxation back to the original fixed point $y_{i}^{*}$ (and thus $x_{i}^{*}=0$) is observed at a sequence of times tm > t0, m ∈ {1, ..., M}. This yields the same type of equation (11), but now with $Y_{i,m}=\hat{\dot{x}}_{i,m}$ being estimates of the derivatives $\dot{x}{}_{i}(t_{m})$. We remark that these derivatives may be estimated in various ways, each of them requiring a resolution of the measured data on sufficiently small time scales, cf section 5.1. In this second approach, the different times at which the transient dynamics is measured replace the different driving experiments of the first approach. Finally, both approaches can of course be combined, with several experiments evaluated at several time points, again yielding the same form of restrictions (11).

3.2. Solving the restricting equations

How can we finally obtain the coupling elements Jij and thus the interaction network? In principle, solving the matrix equation (11) yields the interaction matrix J as a function of the known data X and Y. One may naively assume that it is directly solvable once the number of experiments equals the number of units in the network, M = N. However, this problem can be numerically ill-conditioned [30] for large N, such that the result is not reliable. In addition, as also stressed in [1], the results may be sensitive to noise in the measured data.

A way to overcome this problem is by performing (many) more experiments than there are nodes, M ≫ N, thus over-determining (11). In general, due to noise and measurement inaccuracies, this renders the system (12) inconsistent, such that there is no vector Ji that satisfies all constraints. It will still be possible to find a robust approximation $\boldsymbol{\hat{J}}_{i}$ that minimizes the error between the predicted dynamics JiX and the actual dynamics Yi for a given node. Specifically, this error function may be modeled as

$E_{i}(\boldsymbol{\hat{J}}_{i})=d(\boldsymbol{\hat{J}}_{i}X,\boldsymbol{Y}_{i})$   Equation (13)

where the distance measure $d(\boldsymbol{v},\boldsymbol{w})=\Vert {\boldsymbol v}-{\boldsymbol w}\Vert _{p}^{p}$ between two vectors $\boldsymbol{v},\boldsymbol{w}\in {\mathbb R}^M$ is commonly defined in terms of the pth power

$\Vert \boldsymbol{v}-\boldsymbol{w}\Vert _{p}^{p}=\sum _{m=1}^{M}|v_{m}-w_{m}|^{p}$   Equation (14)

of an Lp-norm with p ⩾ 1, due to its convexity properties. This guarantees that any local minimum of Ei is also a global minimum [31]. Particularly, the L2-minimization criterion,

$E_{i}(\boldsymbol{\hat{J}}_{i})=\Vert \boldsymbol{\hat{J}}_{i}X-\boldsymbol{Y}_{i}\Vert _{2}^{2}$   Equation (15)

is of great importance because it has an analytical solution for its extremum. Equating to zero the derivatives of the error function with respect to the matrix elements, $\frac{\partial }{\partial J_{ik}}E_{i}(\boldsymbol{\hat{J}}{}_{i})\stackrel{!}{=}0$, yields an analytical solution (see appendix A) to L2 error-minimization given by

$\boldsymbol{\hat{J}}_{i}=\boldsymbol{Y}_{i}X^{\mathsf {T}}(XX^{\mathsf {T}})^{-1}$   Equation (16)

Evaluating such equations for all i ∈ {1, ..., N} yields the complete reconstructed network $\hat{J}$. This mathematical form of the minimum L2-norm solution is implemented in many mathematical packages (e.g. as the mrdivide function in Matlab [32] or the LeastSquares function in Mathematica [33]). We explicate that the obtained off-diagonal terms $\hat{J}_{ij}$ serve as the best estimates (in the procedural sense using the L2-minimization above) for the coupling constants Jij; at the same time, the diagonal elements Jii are not relevant for the network topology because the influence of these terms on the dynamics of unit i is physically indistinguishable from an intrinsic drive to i included in the local dynamics specified by fi(xi) in (2), cf equation (3).
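As an illustration, the following hedged Python sketch applies the L2 solution (16) to noise-free steady-state data of a toy linear system (8); all sizes, the random ensemble and the variable names are our own choices, not those of [1, 29].

```python
import numpy as np

# Sketch: reconstruct J from the restrictions (11), J X = Y, via equation (16).
rng = np.random.default_rng(0)
N, M = 10, 40                                  # units and driving experiments, M >> N

# random sparse interaction matrix; the -identity keeps the fixed points stable
J = -np.eye(N) + 0.3 * rng.standard_normal((N, N)) * (rng.random((N, N)) < 0.2)

I = rng.standard_normal((N, M))                # known constant driving signals I_{i,m}
X = np.linalg.solve(J, -I)                     # new steady states from 0 = J x* + I, cf (10)
Y = -I                                         # so that J X = Y as in (11)

J_hat = Y @ X.T @ np.linalg.inv(X @ X.T)       # equation (16)
J_lsq = np.linalg.lstsq(X.T, Y.T, rcond=None)[0].T   # equivalent, numerically safer
print(np.max(np.abs(J_hat - J)))               # essentially zero for noise-free data
```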

Experimentally, it is in principle possible to over-determine a system of equations by performing repeated measurements on the network until the condition M ≫ N is achieved. Nevertheless, this may often be impractical for large networks due to the large number of experiments that would be required.

So, if the size of the network is an issue or the number of available measurements is insufficient to over-determine the system, we have M < N. Assuming that the network is sparse (i.e. that each unit is connected with a small number of others and thus many connection strengths are Jij = 0) may still allow recovering all network links. This assumption implies that several Jij are effectively set to zero, thereby decreasing the number of unknown coefficients to be solved for. It leaves us with the problem of finding which links are actually present and which are not. We present two related options to do so.

3.2.1. Sparse solution with a bounded connectivity per unit

Suppose only M < N experiments are available but we have a rough idea of how many nodes (at most) are connected to a particular node; in particular, assume that the number of incoming connections for node i is given by at most Ki < M [1]. This assumption shifts the system (12) from having more unknowns than constraints, M < N, to having more constraints than unknowns, M > Ki, therefore implicitly over-determining the system. It means that out of the N nodes present in the network, only Ki of them are chosen to be part of the system of equations for node i. Such an assumption may be made when there is some a priori information about the network's connectivity and dynamics.

Specifically, the system of equations (12) may in principle be rewritten as

$\boldsymbol{B}_{i}Z_{i}=\boldsymbol{Y}_{i}$   Equation (17)

where ${\boldsymbol B}_{i}\in \mathbb {R}^{1\times K_{i}}$ is the reduced connectivity vector for node i that contains the coupling strengths for the selected nodes and $\boldsymbol{Z}_{i}\in \mathbb {R}^{K_{i}\times M}$ is a matrix that contains the states of such nodes. If we knew which Ki of the N − 1 possible connections actually contributed, we could use equation (16) to solve (17) using L2-minimization yielding

$\hat{\boldsymbol{B}}_{i}=\boldsymbol{Y}_{i}Z_{i}^{\mathsf {T}}(Z_{i}Z_{i}^{\mathsf {T}})^{-1}$   Equation (18)

where $\hat{\boldsymbol{B}}_{i}$ is the best approximation to Bi.

Yet, it so far remains unclear which of the ${{N}\choose{K_{i}}}=\frac{N!}{K_{i}!(N-K_{i})!}$ possible combinations of incoming connections is best suited for reproducing the dynamics of i. In principle, $\hat{\boldsymbol{B}}_{i}$ may be calculated for each combination of Ki candidate units (e.g. genes), and the combination that yields the smallest value of the L2 norm $\Vert \hat{\boldsymbol{B}}_{i}\Vert _{2}$ in (18) may be chosen as the best estimate for Ji. The efficiency of such a procedure relies on the number Ki of interactions per node. Hence, choosing a proper Ki, aiming to recover the largest number of real interactions with the smallest number of false positives, is a key factor for achieving a successful topological reconstruction.
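A brute-force version of this combinatorial search may be sketched as follows; the selection by the smallest L2 norm of the estimate follows the criterion stated above, and all names are illustrative rather than taken from [1].

```python
import itertools
import numpy as np

def reconstruct_row(X, Y_i, K_i):
    """Bounded in-degree search (section 3.2.1). X: N x M states, Y_i: length-M
    data for node i, K_i: assumed maximal number of incoming links."""
    N, M = X.shape
    best_B, best_norm, best_support = None, np.inf, None
    for support in itertools.combinations(range(N), K_i):
        Z_i = X[list(support), :]                          # K_i x M reduced states, cf (17)
        B_i, *_ = np.linalg.lstsq(Z_i.T, Y_i, rcond=None)  # L2 solution of B_i Z_i = Y_i, cf (18)
        if np.linalg.norm(B_i) < best_norm:                # smallest ||B_i||_2 wins
            best_B, best_norm, best_support = B_i, np.linalg.norm(B_i), support
    J_i = np.zeros(N)
    J_i[list(best_support)] = best_B                       # zeros elsewhere: absent links
    return J_i
```

The number of candidate supports grows as the binomial coefficient above, so this exhaustive variant is only feasible for small Ki.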

3.2.2. Maximizing the sparseness of the connectivity matrix

If M < N and the Ki are unknown, cannot be estimated, or there are too many of them (making the combinatorial search practically impossible), maximizing the number of zero entries in J (i.e. minimizing the Ki and thus maximizing sparseness) may be a way to solve (12). This approach is particularly useful if the only a priori knowledge about the network's connectivity is some sparsity.

For general matrix equations

$A\boldsymbol{y}=\boldsymbol{b}$   Equation (19)

where $A\in \mathbb {R}^{m\times n}$, $\boldsymbol{y}\in \mathbb {R}^{n\times 1}$ and $\boldsymbol{b}\in \mathbb {R}^{m\times 1}$, singular value decomposition (SVD) of A according to

$A=U\Sigma V^{\mathsf {T}}$   Equation (20)

yields an analytic solution

$\boldsymbol{y}=V\tilde{\Sigma }U^{\mathsf {T}}\boldsymbol{b}+V\boldsymbol{c}$   Equation (21)

where $\tilde{\Sigma }=\Sigma ^{\mathsf {T}}(\Sigma \Sigma ^{\mathsf {T}})^{-1}$, and the vector $\boldsymbol{c}\in \mathbb {R}^{n\times 1}$ with ci = 0 for i ∈ {1, ..., r} and r = Rank(A) parametrizes the space of all solutions.

In our reconstruction problem, we are seeking to maximize the number of zero entries in J based on solving the restricting equations (12) for Ji as we solved (19) for y. Consider the transpose

$X^{\mathsf {T}}\boldsymbol{J}_{i}^{\mathsf {T}}=\boldsymbol{Y}_{i}^{\mathsf {T}}$   Equation (22)

of (12). The analogous SVD-based solution then reads

$\boldsymbol{J}_{i}^{\mathsf {T}}=V\tilde{\Sigma }U^{\mathsf {T}}\boldsymbol{Y}_{i}^{\mathsf {T}}+V\boldsymbol{c}$   Equation (23)

where $U\in \mathbb {R}^{M\times M}$, $V\in \mathbb {R}^{N\times N}$, $\tilde{\Sigma }\in \mathbb {R}^{N\times M}$ and $\boldsymbol{c}\in \mathbb {R}^{N\times 1}$ is a vector of remaining coefficients parametrizing the solution space. Thus, the set of all possible solutions for Ji is given by (23). The goal now is to pick the sparsest solution from this set. Therefore, equation (23) may be posed as the overdetermined (N > N − r) problem

$V\boldsymbol{c}\approx -V\tilde{\Sigma }U^{\mathsf {T}}\boldsymbol{Y}_{i}^{\mathsf {T}}$   Equation (24)

Minimizing the L1 error

$E_{i}^{(1)}(\boldsymbol{c})=\Vert V\tilde{\Sigma }U^{\mathsf {T}}\boldsymbol{Y}_{i}^{\mathsf {T}}+V\boldsymbol{c}\Vert _{1}$   Equation (25)

yields a sparse solution [31]. However, unlike the L2 minimization, the L1 minimization has no analytical solution, so choosing an appropriate iterative algorithm to solve it is essential. The Barrodale–Roberts algorithm [34] provides a particularly fast solver that has been widely used in the field of network reconstruction [10, 12, 29, 35].
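Lacking a Barrodale–Roberts implementation at hand, one may cast the same L1 minimization as a standard linear program; the following sketch does this with scipy and purely illustrative names (the specialized solver [34] addresses the identical problem, just faster).

```python
import numpy as np
from scipy.optimize import linprog

def l1_minimize(A, b):
    """Solve min_x ||A x - b||_1 as a linear program with auxiliary variables t:
    minimize sum(t) subject to -t <= A x - b <= t."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(m)])          # objective: sum of t
    A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])   # the two one-sided constraints
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * n + [(0, None)] * m          # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:n]

# Hypothetical usage for (24)/(25): with U, Sigma_tilde, V from the SVD (20) of X^T,
# A would be the columns of V multiplying the free entries of c, and
# b = -V @ Sigma_tilde @ U.T @ Y_i per the parametrization (23).
```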

Remarks. The core equations (50) also provide the option to reconstruct network connectivity via maximizing the sparseness of the network, and there is a particular relation to what is known as compressive sensing, cf, e.g., [36].

In general, linearization of dynamical equations, e.g. linearizing in the state variables close to fixed points, often approximates nonlinear dynamics well. This seems to hold for gene regulatory networks [29] as well as for models of Drosophila segmentation networks [37] and may thus be of general use across systems. For gene and protein interaction networks, often single genes are selected for perturbations in an experiment, with the danger of providing non-generic restrictions in (12). Finally, for some systems increasing the number of experiments may reduce the resulting computational costs, such that this trade-off may be considered.

3.3. Driving the system's state to a fixed point

One may also infer network structure by externally driving the system to a fixed point and shifting component values of the fixed point for individual units [38, 39]. As before, the differences between pairs of steady-state responses are analyzed. Let us describe the network as

$\dot{x}_{i}=f_{i}(x_{i})+\sum _{j=1}^{N}A_{ij}\,g_{ij}(x_{i},x_{j})+I_{i}$   Equation (26)

where the Aij ∈ {0, 1} are the entries of the adjacency matrix specifying only if an interaction from j to i is present (Aij = 1) or not (Aij = 0), the gij(xi, xj) are the coupling functions from j ∈ {1, 2, ..., N} to i, and Ii is the driving signal applied to unit i. It was demonstrated by Yu and Parlitz [38] that under driving signals

$I_{i}(t)=\theta \,(\hat{x}_{i}-x_{i}(t))$   Equation (27)

with sufficiently large gain factor $\theta \in \mathbb {R}$ and Lipschitz continuous fi and gij, the network may be driven to a globally stable fixed point $\boldsymbol{x}^{*}:=(x_{1}^{*},x_{2}^{*},\ldots ,x_{N}^{*})^{\mathsf {T}}\in \mathbb {R^{N}}$ that is arbitrarily close to a predetermined point $\hat{\boldsymbol{x}}:=(\hat{x}_{1},\hat{x}_{2},\ldots ,\hat{x}_{N})^{\mathsf {T}}\in \mathbb {R^{N}}$, independent of the initial conditions [38]. At such a fixed point we have

$f_{i}(x_{i}^{*})+\sum _{j=1}^{N}A_{ij}\,g_{ij}(x_{i}^{*},x_{j}^{*})+\theta \,(\hat{x}_{i}-x_{i}^{*})=0$   Equation (28)

for all i. To understand how the network responds to changes in $\hat{\boldsymbol{x}}$, let us define

$\Delta _{i}:=x_{i}^{*}-\hat{x}_{i}$   Equation (29)

hence, equation (28) may be rewritten in terms of Δi as

$f_{i}(\hat{x}_{i}+\Delta _{i})+\sum _{j=1}^{N}A_{ij}\,g_{ij}(\hat{x}_{i}+\Delta _{i},\hat{x}_{j}+\Delta _{j})=\theta \,\Delta _{i}$   Equation (30)

The main idea at this point is to check whether unit k couples to unit i by evaluating how $x_{i}^{*}$ reacts to the shifting of $x_{k}^{*}$ through $\hat{x}_{k}$. Therefore, let us set

Equation (31)

and evaluate (30) at this point, yielding

Equation (32)

Shifting the same component twice to $\hat{x}_{k}^{(1)}$ and $\hat{x}_{k}^{(2)}$, resp., fixes a reference frame and thereby yields equations characterizing the difference between responses of a given unit i to a shifted fixed point component for unit k. It results in

Equation (33)

a condition that may be rewritten as

Equation (34)

We remark that the differences [Δik, 1 − Δik, 2] in (33) are not known but the general form (34) may be used to reveal whether they are zero, λik = 0, or not: given that we are dealing with entries of the adjacency matrix, we may infer two possible outcomes from (34), whether system k is coupled to i or not. Specifically,

Equation (35)

It was also demonstrated by Yu and Parlitz [38] that Δi decreases with θ if fi and gij are Lipschitz continuous. This permits discriminating whether there is a coupling between a pair k → i. In particular, when θ is sufficiently large, the |Sikθ| values may be classified into sets $\mathbb {I}_{0}$ and $\mathbb {I}_{1}$, the non-coupled and coupled sets, respectively. To construct such sets, Yu and Parlitz [38] propose to:

  • For fixed k, organize the |Sikθ| values in ascending order, i.e., the values should be arranged into a new series z where zk, j < zk, j + 1 < ... defines the indexing j of the zk, j.
  • Establish the critical values of each set. In this case, the critical values jc and jc + 1 define the end and the beginning of $\mathbb {I}_{0}$ and $\mathbb {I}_{1}$, respectively. Yu and Parlitz suggest finding jc by requiring the distance of any element of $\mathbb {I}_{1}$ from zk, 1 to be larger than twice the size of $\mathbb {I}_{0}$, i.e. $z_{k,j}-z_{k,1}\ge 2(z_{k,j_{c}}-z_{k,1})$ for all j > jc.

Finally, by performing this process on every unit, the topology of the network may be reconstructed.
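A hedged sketch of this classification step, following our reading of the gap criterion above (degenerate cases, e.g. a unit k that drives no one, would need an additional absolute threshold in practice):

```python
import numpy as np

def classify_targets(S_k):
    """Split the |S_ik(theta)| values for fixed driving unit k into the
    non-coupled set I0 and the coupled set I1; returns True where a link
    from k is inferred."""
    z = np.abs(np.asarray(S_k))
    order = np.argsort(z)                      # ascending: z_{k,1} <= z_{k,2} <= ...
    zs = z[order]
    for jc in range(len(zs) - 2, 0, -1):       # try the largest non-coupled set first
        # gap criterion: z_{k,j} - z_{k,1} >= 2 (z_{k,jc} - z_{k,1}) for all j > jc
        if np.all(zs[jc + 1:] - zs[0] >= 2.0 * (zs[jc] - zs[0])):
            coupled = np.zeros(len(zs), dtype=bool)
            coupled[order[jc + 1:]] = True
            return coupled
    return np.zeros(len(zs), dtype=bool)       # no clear gap: infer no couplings
```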

Remarks. The approach relies on the feasibility of (i) perturbing the systems in a specific manner, and (ii) measuring the steady states, suggesting that it is model independent to a large extent in that it does not in principle require knowledge of the local dynamics or coupling functions (yet it requires these to be Lipschitz continuous). These features may make the approach interesting under conditions where only little pre-knowledge about the system is available. Yet the approach requires substantial control over the system; in particular, the option of externally driving every unit (independently) constitutes a major requirement. The study [38] does not state how indirect actions are treated: for instance, how is an indirect effect from unit k via k' onto i distinguished from a direct interaction from k to i? Possibly, driving k may indirectly affect i only weakly, and this potentially second-order contribution could be treated in a perturbative way.

3.4. Distributed perturbations to collective periodic dynamics

The approaches presented above (sections 3.1–3.3) required the existence of fixed points, either in the original system or in the presence of sufficiently strong external driving. Yet, more complex dynamics prevails in a large range of biological, physical or artificial systems. The next simplest invariant dynamics are periodic orbits, which often arise as limit cycles of coupled oscillatory units, thus asking for a generalization beyond simple fixed point approaches. Even more complex dynamics, e.g. collective chaos, is treated by a direct approach below in section 5.1.

Is it possible to infer network topology from driving-response experiments also for oscillator networks? Below we positively answer this question, at the same time showing that distributed driving signals not precisely targeted to one or a few units are at least equally appropriate for inferring network topology. Several theoretical model studies of coupled oscillators [40–45] have shown that the response of single units in a network to constant or periodic driving signals, as well as the transient dynamics of synchronization, depend on the network topology. Some recent works [43, 44] helped us to understand the specific quantitative influence of structural features on the response and how the network response provides some information about the structure (and the driving signal). For instance, the magnitude of the responses seems to decay exponentially with distance from the driving node [43], and the coarse-scale connectivity among connected components may qualitatively determine to which degree the network dynamics is coordinated globally [44]. Further developing such insights, a follow-up work [10] presents a method of reconstructing network topology from systematic measurements of network responses to temporally constant, distributed driving signals in coupled phase-oscillator networks.

The basic idea is that any network displaying stable invariant dynamics, not just fixed points, yields a specific response to a given perturbation as a consequence of the network's topology and the perturbation itself [40, 42, 44], cf figure 3. If the perturbations are small, the invariant set is typically qualitatively unchanged and only slightly moved in state space. Keeping track of which driving signals resulted in which responses, we can collect evidence about the interactions among units in a network. Sufficiently many repetitions of appropriate driving-response experiments then yield the network's topology.


Figure 3. Topology revealed by driving a stable periodic orbit. As for the fixed point approaches discussed above (section 3.1 and figure 2), the difference vector v (red) depends on both the driving signal and the network topology such that several measurements of v under different driving conditions yield information about the topology.


Figure 4. Revealing network topologies from response dynamics for directed networks of phase-locked Kuramoto oscillators. Variables evolve according to $\dot{x}_{i}=\omega _{i}+k^{-1}\sum _{j=1}^{N}J_{ij}\sin (x_{j}-x_{i})$, with random frequencies ωi ∈ [0.1, 1] and k = 8 directed interactions of strengths $J_{ij}=k^{-1}$, randomly selected for each unit. Panels show reconstructed coupling matrices J for (a) N = 16 and M = 32, and (b) N = 64 and M = 32. The matrices are gray-coded from white ($\hat{J}_{ij}=0$) to black ($\hat{J}_{ij}=\max _{i^{\prime }j^{\prime }}\lbrace \hat{J}_{i^{\prime }j^{\prime }}\rbrace$). Insets: element-wise absolute difference $|J_{ij}^{{\rm derived}}-J_{ij}^{{\rm original}}|$, plotted on the same scale.


Weakly coupled limit cycle oscillators are well characterized by ignoring (in the long time limit) amplitude responses to coupling and by modeling them as phase-oscillators coupled via their phase differences only. A method to infer network topology for coupled phase-oscillators with arbitrary stable, phase-locked dynamics has been presented in [10]. One key observation is that the phase differences (yet not the phases themselves) in such systems converge with time, and that comparing differences of phase differences among different driving conditions yields restrictions on the network topology. The network dynamics is given by

$\dot{\phi }_{i}(t)=\omega _{i}+\sum _{j=1}^{N}J_{ij}\,g_{ij}(\phi _{j}(t)-\phi _{i}(t))+I_{i,m}$   Equation (36)

where ϕi(t) and ωi are the phase and natural frequency of oscillator i, respectively, Jij the connection strength from oscillator j to i and Ii, m is a temporally constant driving signal applied to i during the experiment m. We assume that in the absence of driving, Ii, m ≡ 0, the network is in a phase-locked state where $\dot{\phi }_{j}-\dot{\phi }_{i}=0$ for all i, j. We remark that one, several or all units may be perturbed during each given experiment, such that driving can be arbitrarily distributed and effectively changes the frequencies of the driven oscillators. As for the approaches relying on fixed points (section 3.1), the existence of a stable periodic orbit (and thus in particular a phase-locked state) implies that sufficiently small constant perturbations yield an (only slightly moved and slightly different) stable periodic orbit.

If for a given driving condition m, the dynamics becomes phase-locked, the phase differences

$\Delta _{ij,m}(t):=\phi _{j,m}(t)-\phi _{i,m}(t)$   Equation (37)

become constant in time, $\Delta _{ij,m}(t)\rightarrow \Delta _{ij,m}^{*}:=\lim _{t\rightarrow \infty }\left(\phi _{j,m}(t)-\phi _{i,m}(t)\right)$ because all oscillators move at the same collective frequency

$\Omega _{m}:=\lim _{t\rightarrow \infty }\dot{\phi }_{i,m}(t)$, independent of i.   Equation (38)

Hence, if the network is perturbed by a sufficiently small driving signal, the original phase-locked state (for Ii, m ≡ 0) is slightly moved such that $|\Delta _{ij,m}^{*}-\Delta _{ij,0}^{*}|\ll 1$ and there is a small difference between the perturbed and non-perturbed collective frequencies Ωm and Ω0. Defining the effective frequency difference $D_{i,m}:=\Omega _{m}-\Omega _{0}-I_{i,m}$ of oscillator i, and approximating the arbitrarily nonlinear coupling functions gij by a first order Taylor expansion around $\Delta _{ij,0}^{*}$, we obtain

$D_{i,m}=\sum _{j=1}^{N}\hat{J}_{ij}\,\theta _{j,m}$   Equation (39)

where θj, m := ϕj, m − ϕj, 0 is the phase shift and $\hat{J}$ is the Laplacian matrix of the network given by

$\hat{J}_{ij}=J_{ij}\,g_{ij}^{\prime }(\Delta _{ij,0}^{*})$ for $j\ne i$ and $\hat{J}_{ii}=-\sum _{j\ne i}J_{ij}\,g_{ij}^{\prime }(\Delta _{ij,0}^{*})$   Equation (40)

Now, identifying the matrices $X_{i,m}=\theta _{i,m}$ and $Y_{i,m}=D_{i,m}$, we have reduced the problem of identifying network topology using distributed perturbations in systems of limit cycle oscillators to solving the same linear algebraic equation (11).

As remarked in previous sections, several experiments are necessary in order to perform the reconstruction of $\hat{J}$. Therefore, from repeated measurements for different conditions it is possible to rewrite equation (39) in the form (11) in terms of $Y=D\in \mathbb {R}^{N\times M}$ and $X=\Theta \in \mathbb {R}^{N\times M}$ representing the differences between collective dynamics and phase shifts for each of the N systems during the M experiments.

Now, for sufficiently many experiments, i.e. M ≥ N, the reconstruction may be accomplished in principle but may be numerically ill-conditioned. In addition, for large N the many required experiments may not be practical. If this condition is not fulfilled, methods like setting a maximum connectivity per unit (section 3.2.1) or maximizing the sparseness of the connectivity matrix (section 3.2.2) may be applied in order to make the problem of retrieving $\hat{J}$ an overdetermined problem.
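An end-to-end numerical sketch of this procedure for sin-coupled phase oscillators might look as follows. All parameters are toy choices, whether the driven states actually phase-lock must be verified in practice, and the estimate targets the Laplacian-like matrix (40), not J itself.

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(1)
N, M, k = 8, 24, 3
omega = rng.uniform(0.1, 1.0, N)
J = np.zeros((N, N))
for i in range(N):                              # k random incoming connections per node
    J[i, rng.choice([j for j in range(N) if j != i], k, replace=False)] = 1.0 / k

def rhs(t, phi, I):
    return omega + (J * np.sin(phi[None, :] - phi[:, None])).sum(axis=1) + I

def settle(I):
    """Relax under constant driving I; return referenced phases and collective frequency."""
    sol = solve_ivp(rhs, (0.0, 200.0), np.zeros(N), args=(I,), rtol=1e-9)
    phi = sol.y[:, -1]
    Omega = rhs(0.0, phi, I).mean()             # all dphi/dt agree once phase-locked
    return phi - phi[0], Omega                  # remove the free global phase

phi0, Omega0 = settle(np.zeros(N))
Theta, D = np.zeros((N, M)), np.zeros((N, M))
for m in range(M):
    I = 0.01 * rng.standard_normal(N)           # small distributed constant driving
    phi_m, Omega_m = settle(I)
    Theta[:, m] = phi_m - phi0                  # phase shifts theta_{j,m}
    D[:, m] = Omega_m - Omega0 - I              # effective frequency differences D_{i,m}

J_hat = np.linalg.lstsq(Theta.T, D.T, rcond=None)[0].T   # solve (39): J_hat Theta = D
J_hat[:, 0] = -J_hat[:, 1:].sum(axis=1)         # referencing to unit 0 hides this column;
                                                # restore it from the zero row sums of (40)
```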

As shown in [10], we may compare how accurate our prediction is by defining $J_{\max }:=\max _{i^{\prime }j^{\prime }}\big\lbrace \big|J_{i^{\prime }j^{\prime }}^{{\rm derived}}\big|,\big|J_{i^{\prime }j^{\prime }}^{{\rm original}}\big|\big\rbrace$, and using a relative difference defined as

$\Delta J_{ij}:=\frac{\big |J_{ij}^{{\rm derived}}-J_{ij}^{{\rm original}}\big |}{J_{\max }}$   Equation (41)

where ΔJij ∈ [0, 1] for all i, j. In addition, the quality of reconstruction Qα may be posed as the fraction

$Q_{\alpha }=\frac{1}{N^{2}}\sum _{i,j=1}^{N}H\big ((1-\alpha )-\Delta J_{ij}\big )$   Equation (42)

of connection strengths which are deemed correct. Here α ⩽ 1 is a constant employed to set the required accuracy of the predictions and H is the Heaviside function, H(x) = 1 for x ⩾ 0 and H(x) = 0 for x < 0. For instance, α = 0.95 means that the derived matrix has a normalized relative error (41) of at most 5%. Moreover, we may estimate the minimum number of experiments

$M_{\min }(q,\alpha ):=\min \lbrace M:Q_{\alpha }\ge q\rbrace$   Equation (43)

required for a reconstruction with a quality level q and with a prediction accuracy α. Figure 5 illustrates these measures for random networks of phase-locked Kuramoto oscillators for several random topologies and parameters.
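For completeness, the quality measures (41) and (42) are straightforward to evaluate; a minimal sketch:

```python
import numpy as np

def quality(J_derived, J_original, alpha=0.95):
    """Fraction of entries whose normalized difference (41) is within 1 - alpha."""
    J_max = max(np.abs(J_derived).max(), np.abs(J_original).max())
    dJ = np.abs(J_derived - J_original) / J_max     # equation (41), values in [0, 1]
    return np.mean(dJ <= 1.0 - alpha)               # equation (42)
```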


Figure 5. Quality of reconstruction and number of required experiments for reconstructing directed networks. Phase-locked Kuramoto oscillators with dynamics defined as in figure 4 with ki = 10 random incoming connections per node. (a) Quality of reconstruction at α = 0.95 for N = 24 (×), N = 36 (▵), N = 66 (○) and N = 96 (□). (b) Minimum number of experiments required for a reconstruction of quality level q = 0.98 and accuracy α = 0.95.


The driving response method may in principle be applied to a broad variety of problems involving stable dynamics. A model analogous to equation (39) could be inferred as long as the system may be linearized around a stable state, allowing one to retrieve the topology from the network responses as above. Yet, there may be practical problems. For instance, even for perturbations induced by constant driving signals, the invariant solution resulting from perturbations to more complex periodic orbits or other stable invariant sets may be describable only by time-dependent quantities (and not, e.g., by temporally constant phase differences), limiting the approach suggested above to specific classes of systems.

3.5. Features and restrictions

One common advantage of the approaches presented above is that their required computational effort scales well (weaker than linearly) with the system size N, such that at least moderately large systems appear accessible (cf figure 5). At the same time, the approaches are relatively simple to realize because they do not require knowledge of higher mathematics or computational approaches beyond a basic standard.

A possible route of generalization is to combine some of the above approaches. For instance, one may first drive a system to a stable fixed point as in section 3.1 and then apply small perturbations around that new point as in section 3.3.

Yet, all these approaches require the researchers to be able to access (measure and drive) the dynamics of all units in the system. Moreover, the local dynamics as well as the (approximate) form of the interactions typically need to be at least partially known. The collective dynamics suitable for the driving-response approaches described above also needs to be simple, in fact to exhibit a stable fixed point or periodic orbit, or to admit the system being driven to such a state. Finally, the presented inference of the existence of physical interactions and their functional form [46] seems well understood for networks of phase-oscillators, where perturbations of the oscillation amplitude decay on a faster time scale than the relaxation of the phases. It thus remains an open problem how to use a driving-response approach to properly infer the structural network connectivity of coupled oscillators in systems where the amplitude degrees of freedom play a role or are even dominant. More generally, systems exhibiting more complex dynamics, such as asynchronous chaotic activity, bifurcations, multistability or other prevalent features of high-dimensional, nonlinear systems, currently still prevent network reconstruction by the methods presented above.

These requirements severely restrict the range of applicability in practice to simple, well-accessible systems only. In particular for biological systems such as neural circuits or gene interaction networks, the dynamics are typically more complex, the systems are large, and it is still hard to implement controlled large-scale driving experiments on the single-unit level. Direct methods (section 5) that do not rely on driving the system seem to offer viable directions towards reconstructing networks with such more complex dynamics as well. How to exploit recorded time series from only a fraction of the units constitutes a currently open research question.

4. Copy-synchronization: adapting a model copy

Another way of reconstructing network topology of a given network is by adapting the topology of a second, model system, a network copy, such that it synchronizes with the original system. The idea is to update the model topology continuously until the copy system exhibits a dynamics identical to the original system; the rationale is that the final topology of the copy is likely to be the original topology [47].

Specifically, consider an (original) system of the form

$\dot{x}_{i}=f_{i}(x_{i})+\sum _{j=1}^{N}J_{ij}\,g_{ij}(x_{j})$   Equation (44)

where fi and gij are known and assumed to be Lipschitz continuous. This original system can in principle have arbitrary dynamics. Now, let us propose a model copy

$\dot{y}_{i}=f_{i}(y_{i})+\sum _{j=1}^{N}K_{ij}(t)\,g_{ij}(y_{j})+I_{i}(t)$   Equation (45)

where yi represents the state of the copy system, Ii(t) are control feedback signals and Kij(t) is the current coupling strength in the test system. In order to synchronize the copy with the original system, both the feedback signals Ii(t) and the coupling strengths Kij(t) evolve in time and depend on the states of the actual and the copy system. Defining the synchronization error

$e_{i}(t):=y_{i}(t)-x_{i}(t)$   Equation (46)

we adapt the coupling strengths in the model copy according to

$\dot{K}_{ij}(t)=-\gamma _{ij}\,e_{i}(t)\,g_{ij}(y_{j}(t))$   Equation (47)

and

$I_{i}(t)=-\alpha \,e_{i}(t)$   Equation (48)

where α > 0 and γij > 0. We remark that here the local dynamics fi(.) as well as the coupling functions gij(.) need to be known in order to set up the test copy. It was proven in [47] that under feedback signals (48) with sufficiently large α, the synchronization error ei decreases in time, $\dot{e}_{i}\le 0$ for all i, such that the two systems converge to a synchronized state. The rationale is that after synchronization, the copy topology equals that of the original network, Kij → Jij, cf figure 6. With minor modifications to the control signals Ii, this method also admits the reconstruction of networks and sub-networks in the presence of disturbances and modeling errors [47].
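A hedged Euler-integration sketch of the scheme (45)-(48) for the Kuramoto example of figure 6 follows; gains, step size and integration time are toy choices rather than values from [47], convergence of K towards J can be slow, and unique identifiability requires sufficiently rich (non-locked) dynamics, so the error e(t) should be monitored.

```python
import numpy as np

rng = np.random.default_rng(2)
N, k = 10, 3
alpha, gamma, dt, steps = 20.0, 2.0, 1e-2, 200_000
omega = rng.uniform(-1.0, 1.0, N)
J = np.zeros((N, N))
for i in range(N):
    targets = rng.choice([j for j in range(N) if j != i], k, replace=False)
    J[i, targets] = rng.uniform(0.5, 1.0, k)

x = rng.uniform(0.0, 2.0 * np.pi, N)    # original network; only its states are observed
y = np.zeros(N)                          # model copy
K = np.zeros((N, N))                     # adapting coupling estimates

def kuramoto(state, coupling):
    s = np.sin(state[None, :] - state[:, None])     # s[i, j] = sin(state_j - state_i)
    return omega + (coupling * s).sum(axis=1), s

for _ in range(steps):
    e = y - x                                        # synchronization error (46)
    dx, _ = kuramoto(x, J)                           # original dynamics
    dy, s_y = kuramoto(y, K)
    x += dt * dx
    y += dt * (dy - alpha * e)                       # copy with feedback, cf (45), (48)
    K += dt * (-gamma * e[:, None] * s_y)            # adaptation, cf (47), at the copy states
```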


Figure 6. Revealing network topology via copy-synchronization. Dynamics of reconstructed network for directionally coupled Kuramoto oscillators $\dot{x}_{i}=\omega _{i}+k^{-1}\sum _{j=1}^{N}J_{ij}\sin (x_{j}-x_{i})$ with random frequencies ωi ∈ [ − 1, 1], random coupling strengths Jij ∈ [0.5, 1], N = 10 and k = 3 random incoming connections per node. (a) Dynamics of the synchronization error between the original and the copy network measured as $e(t)=\sqrt{N^{-1}\sum _{i}e_{i}^{2}(t)}$. (b), (c), (d) and (e) show the inferred topologies at t = 0, t = 2, t = 500 and $t=10^{4}$. The matrices are gray-coded from white (Jij = 0) to black (Jij = 1). The insets depict the element-wise absolute differences $|J_{ij}^{{\rm derived}}-J_{ij}^{{\rm original}}|$, plotted on the same scale.


We remark that, to the best of our knowledge, there is no guarantee that the resulting topology of the copy system actually reflects that of the original network. In particular, symmetries might lead to ambiguities.

Further, the method based on copy-synchronization is model dependent, such that knowing the intrinsic and coupling functions of the units is vital, as in several parts of section 3. Its applicability has been explicated for sample networks of up to N = 16 nodes, but it remains unclear how to handle large networks, as on the order of N2 links Kij need to be co-evolved in time and a bound on the convergence times is currently missing. At the same time, the copy approach does not require perturbations to the original system, so experimental access to it need not include driving access to its units. Interestingly, Yu et al [47] highlight that the method may be useful to track changes in a network's connectivity in real time.

5. Direct approaches

In the previous two sections, we have introduced methods to infer network connectivity by either interfering with the system (driving-response approaches, section 3) or by setting up and synchronizing a second, model system (section 4). Both classes of methods work if certain requirements are met (in particular, the option to actively drive the system or the option to synchronize the copy with the original system, respectively). It remains a challenge to infer network topology from dynamics without such requirements.

5.1. Reconstruction by purely observing dynamics

Methods based on copy-synchronization (section 4) assume that the local dynamics fi as well as the interaction functions gij in (1) are known and that the gij do not depend on the state of unit i, gij(xj). Inferring the network connectivity, i.e. the Jij, then relies on the construction of a second, model network, with dynamics governed by equation (1) and network parameters $J^{\prime }_{ij}$ that are tuned to those of the real network by an error minimization procedure. As noted recently [12], one may solve the same reconstruction problem with significantly reduced effort and reduced requirements by evaluating the states and their time derivatives directly from the time series recorded from the original system. In particular, such a simple direct method [12] removes the need to set up and synchronize a second, copied system.

The idea is as follows. If the local dynamics and the coupling functions are known, their evaluations at different times are also known from recorded time series, and the only remaining unknown parameters in equation (1) are the coupling strengths, which are to be determined. Specifically, recording a time series xi(tm) at sufficiently closely spaced times $t_{m}\in \mathbb {R}$ and estimating its temporal derivatives yields the dynamics of the ith unit given by

$\dot{x}_{i}(t_{m})=f_{i}(x_{i}(t_{m}))+\sum _{j=1}^{N}J_{ij}\,g_{ij}(x_{i}(t_{m}),x_{j}(t_{m}))$   Equation (49)

If there are M such times, m ∈ {1, ..., M}, we have M equations of the form

$\dot{x}_{i,m}=f_{i,m}+\sum _{j=1}^{N}J_{ij}\,g_{ij,m}$   Equation (50)

with the abbreviations $\dot{x}_{i,m}:=\dot{x}_{i}(t_{m})$, fi, m := fi(xi(tm)) and $g_{ij,m}:=g_{ij}(x_{i}(t_{m}),x_{j}(t_{m}))$. Repeated evaluations of the equations of motion (49) of the system at different times tm thus comprise a simple and implicit restriction on the network topology Jij as follows: writing $Y_{i,m}=\dot{x}_{i,m}-f_{i,m}$ and the matrix $X_{i}=(g_{ij,m})_{j,m}\in \mathbb {R}^{N\times M}$, these equations constitute the matrix equation

Equation (51): $\boldsymbol{Y}_{i}=\boldsymbol{J}_{i}X_{i}$

where $\boldsymbol{Y}_{i}\in \mathbb {R}^{1\times M}$ and $\boldsymbol{J}_{i}\in \mathbb {R}^{1\times N}$ is the ith row of the interaction matrix J, comprising the vector (Jij)j ∈ {1, ..., N} of all input coupling strengths to unit i.

The main restricting equations (51) again have the same form as the generic restrictions (11) and thus may be solved analogously. In [12], Euclidean L2-norm minimization was used to infer the topology. Numerical tests show that reconstruction works well for transient as well as attractor dynamics, for simple as well as complex, chaotic units and collective states, and even in the presence of noise that substantially alters the dynamics and thus the recorded time series. Figure 7 illustrates successful reconstruction in the presence of various levels of noise.


Figure 7. Reconstructing a random network from complex chaotic dynamics. Example of N = 32 Lorenz oscillators with dynamics given by $\dot{x}_{i}=\sigma _{i}(y_{i}-x_{i})+\sum _{j=1}^{N}J_{ij}(x_{j}-x_{i})+\xi _{i}^{(x)},\,\dot{y}_{i}=x_{i}(\rho _{i}-z_{i})-y_{i}+\xi _{i}^{(y)},\,\dot{z}_{i}=x_{i}y_{i}-\beta _{i}z_{i}+\xi _{i}^{(z)}$, with unknown parameters Jij, σi, ρi, βi for all i, j, in the presence of external Gaussian white noise $\boldsymbol{\xi }_{i}$ of amplitude λ. (a) Dynamics of a unit in the absence (blue) and presence (black) of noise with λ = 5. Original (b) and reconstructed (c) connectivity, inferred parameters (f), and their errors compared to the original topology and parameters (e) and (g). (d) Receiver operating characteristics (ROC) of reconstruction under noise-free (blue) and noisy measurements (black and red) for λ ∈ {0.1, 1, 10}. Adapted from [12].


It is clear that certain types of dynamics do not allow topology inference. For instance, if all local dynamics are identical, fi ≡ f, all coupling functions are identical, gij ≡ g, and collectively the units are identically synchronous, i.e. all in the same states at the same times, there is no information about the connection topology one can possibly obtain from (only) recording time series, because for all strongly connected topologies, the collective dynamics would be identical.

Further theoretical considerations show that, following the same lines of analysis as above, all parameters occurring linearly in the local dynamics or coupling functions can also be reconstructed by the same error minimization based on equation (11). For instance, for the (fictional) coupling function $g_{ij}(x,y)=a\sin (xy)-\exp (b+yx)$, the parameters a and b need not be known but can be inferred as well (up to one constant prefactor for each pair (i, j) of nodes), because both a and exp (b) occur linearly in the equations of motion (49). In many physical systems, parameters actually do occur linearly. These include, for instance, the dynamics of coupled classical electric LCR circuits and the strengths of diffusive (4) or direct linear coupling (6). Moreover, widely used model systems for networks of neurons, such as leaky integrate-and-fire (LIF) and quadratic integrate-and-fire neurons [50], or networks of coupled chaotic systems such as Rössler or Lorenz systems [51, 52], exhibit all or at least most parameters occurring linearly or affinely. For all such systems, the local dynamics fi and coupling functions gij need not be fully known in advance.

Such a direct inference method [12] for network topology from complex collective dynamics thus serves as a simple starting strategy for connectivity reconstruction in a broad range of networked systems.
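
To make this concrete, the following minimal sketch (in Python, with illustrative parameter values only) applies the direct method to a small network of Kuramoto-type phase oscillators similar to those of figure 6, assuming the coupling function sin( · ) and the frequencies ωi are known:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, dt = 10, 400, 0.01

# Ground-truth network: 3 random incoming connections per node with
# strengths in [0.5, 1] (cf figure 6); unknown to the reconstruction step.
J_true = np.zeros((N, N))
for i in range(N):
    nbrs = rng.choice([j for j in range(N) if j != i], size=3, replace=False)
    J_true[i, nbrs] = rng.uniform(0.5, 1.0, size=3)
omega = rng.uniform(-1.0, 1.0, N)

def velocity(x):
    # dx_i/dt = omega_i + sum_j J_ij sin(x_j - x_i)
    return omega + np.einsum('ij,ij->i', J_true, np.sin(x[None, :] - x[:, None]))

# Record a trajectory (simple Euler steps, for illustration only).
X = np.empty((M + 1, N))
X[0] = rng.uniform(0.0, 2.0 * np.pi, N)
for m in range(M):
    X[m + 1] = X[m] + dt * velocity(X[m])

# Direct method: estimate derivatives by finite differences, then solve
# the restricting equations (51), Y_i = J_i X_i, by L2-norm minimization.
Xdot = (X[1:] - X[:-1]) / dt            # derivative estimates at times t_m
J_hat = np.zeros((N, N))
for i in range(N):
    G = np.sin(X[:-1] - X[:-1, [i]])    # G[m, j] = g_ij evaluated at t_m
    Y = Xdot[:, i] - omega[i]           # Y_{i,m} = xdot_{i,m} - f_{i,m}
    J_hat[i], *_ = np.linalg.lstsq(G, Y, rcond=None)

print(np.max(np.abs(J_hat - J_true)))   # near zero here; derivative estimation
                                        # errors dominate for real recordings
```

In practice, the quality of the derivative estimates limits the reconstruction accuracy, so sufficiently dense sampling (or higher-order derivative estimates) is needed.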

5.2. Pulse-coupling: reconstruction from spike patterns

Networks of spiking neurons or other pulse-coupled units are formally hybrid systems [53], because the continuous-time flow is interrupted at certain time points, where events (e.g. spike sending or reception) modify the dynamics [54, 55] via discrete time maps.

To start addressing the reconstruction problem for such hybrid systems, researchers have considered one of the simplest model networks of pulse-coupled units based on LIF models. Such network models are most broadly used in mathematical analysis and computational modeling. Each unit in such a network has one state variable, its membrane potential, which roughly emulates the leaky capacitive properties observed for membranes of biological neurons and is complemented by a threshold at which a pulse (action potential or spike [56]) is artificially created and the membrane potential is reset. Although these models lack certain biological details, such as a natural spike-generating mechanism, they are simple enough to be studied analytically and they have been useful for furthering our understanding of how information is processed among neurons [57].

Explicitly, the membrane potential Vi(t) of a LIF unit i changes as in an RC circuit (a resistor and a capacitor connected in parallel) according to

Equation (52): $\dot{V}_{i}(t)=\gamma _{i}\left(R_{i}I_{i}-V_{i}(t)\right)+\sum _{j=1}^{N}\sum _{m}J_{ij}\,\delta (t-t_{j,m}-\tau _{ij})$

where Ri, γi and Ii are the membrane resistance, inverse membrane time constant and the external current applied to neuron i, respectively. Once the potential of a unit j crosses a threshold, Vj(t) ⩾ VT, j, the potential is reset to Vj(t+) = VR, j and the unit emits a pulse that is transmitted to the neuron's post-synaptic neighbors [58]. The time of this pulse sending event is remembered as the unit's mth spike time tj, m := t. The collection of such pulses then defines the actions onto post-synaptic units i such that the quantity

Equation (53): $\sum _{j=1}^{N}\sum _{m}J_{ij}\,\delta (t-t_{j,m}-\tau _{ij})$

in (52) represents the pulses that unit i receives from the rest of the network. Here, Jij and τij are the synaptic coupling strength and the synaptic transmission delay from unit j to i, respectively. Furthermore, δ( · ) is the Dirac delta distribution, modeling a potential response kernel that is much faster than all other time scales involved.

The main question we address now is whether and how the network connectivity, as specified by the matrix of coupling strengths Jij, can be reconstructed if the pattern of spike times $\left(t_{j,m}\right)_{j\in \lbrace 1,\ldots ,N\rbrace ,m\in \mathbb {Z}}$ is given. We remark that we do not assume access to the natural state variables, the membrane potentials Vi(t), which may not be observable, but only to the spike times that are generated by the dynamics of these potentials. This difference constitutes the main novelty of the approaches presented in this subsection compared to those for continuous time state variables presented before.

First observe that (52) has an explicit solution [35] given by

Equation (54): $V_{i}(t)=R_{i}I_{i}+\left(V_{i}(t_{0})-R_{i}I_{i}\right)e^{-\gamma _{i}(t-t_{0})}+\sum _{j=1}^{N}\sum _{\lbrace m:\,t_{0}\le t_{j,m}+\tau _{ij}<t\rbrace }J_{ij}\,e^{-\gamma _{i}(t-t_{j,m}-\tau _{ij})}$

if the time interval [t0, t) lies in between two subsequent spikes of neuron i. The first sum is taken over all neurons while the second is taken over those spikes generated by the other neurons j ≠ i that affect i in the time interval [t0, t).

Based on this solution, we now present two distinct approaches to infer network connectivity, one direct and exact inference method assuming oscillatory units and one coarse approximate method based on stochastic optimization.

5.2.1. Direct topology inference assuming oscillatory units

Van Bussel et al [35] assume that all the parameters of (54), besides the synaptic couplings Jij, are known. The rationale is that delays and membrane time constants as well as a unit's equilibrium potential RiIi may be estimated beforehand. How could the coupling strengths and thus the structural network topology be inferred? If the currents Ii are sufficiently large such that RiIi > VT, i, the units are oscillatory, so they create spikes even without recurrent inputs from the remaining network. After crossing the threshold VT, i, unit i sends a spike to its post-synaptic neighbors and is reset to a resting value VR, i, so that the unit's state at the boundaries of each inter-spike interval [ti, k − 1, ti, k) is determined and one may take t0 = ti, k − 1 and t = ti, k. However, these threshold crossings may be induced in two manners at t = ti, k − 1: (a) by incoming excitatory spikes from other neurons such that Vi(t−) < VT, i but $V_{i}(t^{-})+\sum _{\left\lbrace j:\exists m:t_{j,m}+\tau _{ij}=t\right\rbrace }J_{ij}>V_{T,i}$, or (b) by the intrinsic oscillation of the unit such that Vi(t−) = VT, i without simultaneously incoming spike(s). In both cases, the membrane potential is at its reset value immediately after sending a spike. So identifying t0 = ti, k − 1 in (54), we have

Equation (55): $V_{i}(t)=R_{i}I_{i}+\left(V_{R,i}-R_{i}I_{i}\right)e^{-\gamma _{i}(t-t_{i,k-1})}+\sum _{j=1}^{N}\sum _{\lbrace m:\,t_{i,k-1}\le t_{j,m}+\tau _{ij}<t\rbrace }J_{ij}\,e^{-\gamma _{i}(t-t_{j,m}-\tau _{ij})}$

If the next spike is oscillation-induced (b), the membrane potential approaches its threshold value VT, i continuously such that in addition

Equation (56): $\lim _{t\nearrow t_{i,k}}V_{i}(t)=V_{T,i}$

where we now identified t = ti, k in (54). Thus, each oscillation-induced spike at some t = ti, k implies a linear restriction of the form (54) on the coupling strengths Jij, obtained by setting [t0, t) = [ti, k − 1, ti, k).

For M different inter-spike intervals obeying (55) and (56), this provides a linear system of equations restricting the coupling matrix. However, as van Bussel et al [35] remark, consecutive intervals may display similar patterns, such that it is often advisable to select M > N inter-spike intervals sufficiently separated in time, to minimize correlations between intervals and thus numerical inaccuracies. For each i, defining the subset of inter-spike intervals

Equation (57)

and

Equation (58)

equation (54) may be rewritten as

Equation (59): $\boldsymbol{Y}_{i}=X_{i}\boldsymbol{J}_{i}$

where Ji := (Jij)j ∈ {1, ..., N} and $X_{i}\in \mathbb {R}^{M\times N}$ and $\boldsymbol{Y}_{i}\in \mathbb {R}^{M\times 1}$ are given by

Equation (60)

and

Equation (61)

Repeating the same process for all units i yields the topology of the whole network. As shown in section 3.2, the over-determined problem, M > N, and the under-determined one, M < N, may be solved by minimizing the L2-norm or maximizing the sparseness of the network, respectively. A major limitation is that the presented approach requires a large amount of prior knowledge about the system, such as the time delays between units and their time constants, among others. Given this knowledge, the approach is capable of inferring large networks of neurons, as the computation time as well as the number M of required inter-spike intervals to be evaluated scale linearly with system size [35, 59].
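
As an illustration of how such a restricting system might be assembled in practice, consider the following sketch; it assumes the explicit LIF solution of the form given in (54) above, and all variable names (spike lists, parameter arrays, the selected intervals) are hypothetical:

```python
import numpy as np

def restriction_system(i, spikes, tau, gamma, RI, V_T, V_R, intervals):
    """Assemble X_i and Y_i of (59) for unit i from observed spike times.

    spikes[j] : sorted array of spike times of unit j
    intervals : oscillation-induced inter-spike intervals (t0, t1) of unit i,
                selected to be well separated in time
    """
    N = len(spikes)
    X = np.zeros((len(intervals), N))
    Y = np.empty(len(intervals))
    for k, (t0, t1) in enumerate(intervals):
        for j in range(N):
            # pulses of unit j arriving at unit i within [t0, t1)
            arrivals = spikes[j] + tau[i, j]
            arr = arrivals[(arrivals >= t0) & (arrivals < t1)]
            X[k, j] = np.sum(np.exp(-gamma[i] * (t1 - arr)))
        # threshold condition (56), with the pulse-free evolution from the
        # reset value, cf. (55), moved to the right-hand side
        Y[k] = V_T[i] - RI[i] - (V_R[i] - RI[i]) * np.exp(-gamma[i] * (t1 - t0))
    return X, Y

# The input couplings of unit i then follow from (59), e.g. for M > N by
# L2-norm minimization:  J_i, *_ = np.linalg.lstsq(X, Y, rcond=None)
```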

5.2.2. Stochastic optimization of all system parameters

In a complementary study, Makarov et al [60] showed how stochastic optimization of all system parameters

Equation (62)

given the spike trains for all neurons in the network may yield a network topology roughly consistent with the actual one.

The idea is to evaluate the predicted inter-spike interval durations $\bar{T}_{i,m}(\boldsymbol{p})$ from a test set of parameters and to optimize those parameters to reproduce a given (measured) spike train as closely as possible. Thus, minimizing the difference between the predicted and measured spike trains is vital for this method. The authors in particular applied their idea also to recordings of biological neurons.

As equation (54) yields transcendental relations for the $\bar{T}_{i,m}(\boldsymbol{p})$, their estimates need to be determined numerically. Thus, finding

Equation (63): $\boldsymbol{p}_{i}^{*}={\rm arg}\,\min _{\boldsymbol{p}_{i}}\sum _{m}\big(\bar{T}_{i,m}(\boldsymbol{p}_{i})-T_{i,m}\big)^{2}$, where the Ti, m are the measured inter-spike intervals,

yields the best (in Euclidean distance norm) solution for the set of parameters.

Briefly, to find the minimum $\boldsymbol{p}_{i}^{*}$ in (63) one must explore the parameter space. This means that the solution is found by iteratively choosing random values for pi and comparing the values of (63) between consecutive iterations. By relating the changes in (63) to the changes in pi, one may establish directions in which the minimum may be found by gradient descent. However, it is also advisable to change directions in the parameter space randomly, mainly because there may be better solutions for $\boldsymbol{p}_{i}^{*}$ in regions of the parameter space where they are not expected to exist [61]. Makarov et al [60] thus applied stochastic optimization for searching the minimum.
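
A generic random-search sketch of this type of optimization (not the specific algorithm of [60]) could look as follows; the cost function is assumed to return the summed squared differences between predicted and measured inter-spike intervals as in (63):

```python
import numpy as np

def stochastic_search(cost, p0, n_iter=20000, step=0.05, seed=0):
    """Random-search minimization of a spike-train mismatch cost, cf. (63).

    cost(p) is assumed to return sum_m (Tbar_{i,m}(p) - T_{i,m})**2, with the
    predicted inter-spike intervals Tbar_{i,m}(p) evaluated numerically."""
    rng = np.random.default_rng(seed)
    p, c = p0.copy(), cost(p0)
    best_p, best_c = p.copy(), c
    for _ in range(n_iter):
        trial = p + step * rng.standard_normal(p.shape)
        c_trial = cost(trial)
        # accept improvements, but occasionally also a random (worse) move,
        # since better solutions may hide where they are not expected [61]
        if c_trial < c or rng.random() < 0.01:
            p, c = trial, c_trial
        if c_trial < best_c:
            best_p, best_c = trial.copy(), c_trial
    return best_p, best_c
```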

As a strong requirement, this method needs the number of recorded spikes to satisfy M ≫ 2N; as noted in [60], robust regression models are more suitable to handle this kind of problem, and special care with the inter-spike intervals must be taken as, e.g., spike bursts may lead to false intervals.

5.3. Features and limitations

The stochastic optimization approach provides a generic way of finding best-fitting parameters and thus here, potentially, a well-fitting network; yet, it is computationally demanding and simultaneously requires many recordings compared to the size of the network. In contrast, the direct approach assumes a large amount of prior knowledge, in particular regarding the units' parameters. In addition, because the data sets are pre-processed (i.e. appropriate time intervals are chosen), the minimization problem no longer involves transcendental relations. This leads to a considerable increase in computational performance and is the key factor that makes the method suitable for large networks. We remark that, in our opinion, both methods are currently not suitable to reliably infer network topology from real recordings of spike data, for several reasons: for instance, other network-external sources of spike generation, stochastic fluctuations due to intrinsic noise and the highly specific conditions required for reconstruction still limit the methods' applicability. Finally, both approaches assume linearity of interactions, yet it is known that interactions can be nonlinear, e.g. due to dendritic spikes in response to sufficiently strong, simultaneous inputs to single dendritic branches of a neuron [62–64].

6. Technical issues

Let us briefly comment on a few technical issues related to the structural inference approaches discussed above. We mention how some of them may be directly transferred to settings where discrete time dynamics describes the system (section 6.1), discuss issues when measuring data from real, e.g. biological, networks (section 6.2), show how to pose L1-norm minimization as a linear program and remark on L1 versus L2 minimization (sections 6.3 and 6.4) and finally discuss what happens for hyper-networks, where more than two-point interactions influence the collective state (section 6.5).

6.1. Discrete time maps

If the dynamical system is described as a network of coupled discrete time maps instead of by coupled differential equations (2), approaches similar to the ones presented above are often viable in slightly modified form. For instance, the core equation becomes (for one-dimensional local units)

Equation (64): $x_{i}(t+1)=f_{i}(x_{i}(t))+\sum _{j=1}^{N}J_{ij}\,g_{ij}(x_{i}(t),x_{j}(t))$

where now the time $t\in \mathbb {Z}$ is an integer. Here, the direct method of subsection 5.1 is immediately applicable, even with the additional advantage that no temporal derivatives need to be estimated such that an observed time series (xi(t))i ∈ {1, ..., N}, t ∈ {1, ..., M} enters the inference equations without approximations. Of course, that time series may still contain substantial measurement errors that propagate into any solution of the inference problem.
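
As an illustration, the following minimal sketch reconstructs a network of coupled logistic maps with direct linear coupling, gij(xi, xj) = xj; the map, the parameter values and the coupling density are chosen for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, r = 10, 300, 3.6

# Weak, sparse, non-negative couplings keep the states inside [0, 1].
J_true = rng.uniform(0.0, 0.008, (N, N)) * (rng.random((N, N)) < 0.3)
np.fill_diagonal(J_true, 0.0)

# Coupled logistic maps: x_i(t+1) = r x_i(t)(1 - x_i(t)) + sum_j J_ij x_j(t)
X = np.empty((M + 1, N))
X[0] = rng.uniform(0.2, 0.8, N)
for t in range(M):
    X[t + 1] = r * X[t] * (1.0 - X[t]) + X[t] @ J_true.T

# No derivative estimation needed: the series enters (51) directly.
J_hat = np.zeros((N, N))
for i in range(N):
    Y = X[1:, i] - r * X[:-1, i] * (1.0 - X[:-1, i])   # Y_{i,t} = x_i(t+1) - f_i
    J_hat[i], *_ = np.linalg.lstsq(X[:-1], Y, rcond=None)

print(np.max(np.abs(J_hat - J_true)))   # essentially exact for noise-free data
```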

6.2. Sampling dynamics of real networks

When sampling real-life phenomena, measurement constraints may arise due to the nature of the given phenomena, making their study a major challenge. For instance, in EEG analysis, a group of sensors is employed to measure brain activity. These recordings help in finding functional connections among regions of the brain. Basically, each sensor measures the activity of a population of neurons below its active area for a certain amount of time. Then, by analyzing the correlations among the time series of the sensors, the interaction network is constructed. Nevertheless, as remarked by Bialonski in [65], special considerations regarding the spatial and temporal sampling must be taken into account. A simple constraint is that sensors placed too close to each other may record overlapping activity from neighboring populations, making it more difficult to discern direct from indirect interactions. Moreover, if the size and frequency of the sampling fail to capture the intrinsic time scales, false functional connections in the network may be constructed. Clearly this example is particularly relevant for inferring effective connectivities, as discussed in section 7. As an example regarding the time domain, gene and protein interaction networks often still have a very limited number of sampling points available [66, 67], such that a network inference problem may become drastically under-determined. Additional information, such as the known existence of certain interactions in such a network (but not the existence of others), sometimes is available, but how precisely to use it is currently unknown and requires further method development. So briefly, when studying real networks, special technical considerations should be taken into account to actually use what is measured (the data) for what one wants to know about the system, cf section 1.2.

6.3. L1 norm minimization as a linear program

Given the optimization problem

Equation (65): $\min _{\boldsymbol{y}}\left\Vert A\boldsymbol{y}-\boldsymbol{b}\right\Vert _{1}$

where $A\in \mathbb {R}^{M\times N}$, $\boldsymbol{y}\in \mathbb {R}^{N}$ and $\boldsymbol{b}\in \mathbb {R}^{M}$, (65) may be posed as [31]

Equation (66): $\min _{\boldsymbol{y},\boldsymbol{s}}\ \boldsymbol{1}^{\mathsf {T}}\boldsymbol{s}$ subject to $A\boldsymbol{y}-\boldsymbol{b}\preceq \boldsymbol{s}$ and $A\boldsymbol{y}-\boldsymbol{b}\succeq -\boldsymbol{s}$

where $\boldsymbol{1}\in \mathbb {R}^{M}$ is a vector of ones and ⪯ and ⪰ denote entry-wise comparison and ${\boldsymbol s}$ is an auxiliary variable. To solve (66), one has to solve the linear program

Equation (67): $\min _{\boldsymbol{z}}\ \boldsymbol{c}^{\mathsf {T}}\boldsymbol{z}$ subject to $\tilde{A}\boldsymbol{z}\preceq \tilde{\boldsymbol{b}}$

where

Equation (68): $\boldsymbol{z}=\begin{pmatrix}\boldsymbol{y}\\ \boldsymbol{s}\end{pmatrix},\quad \boldsymbol{c}=\begin{pmatrix}\boldsymbol{0}\\ \boldsymbol{1}\end{pmatrix},\quad \tilde{A}=\begin{pmatrix}A & -I\\ -A & -I\end{pmatrix},\quad \tilde{\boldsymbol{b}}=\begin{pmatrix}\boldsymbol{b}\\ -\boldsymbol{b}\end{pmatrix}$

and $I\in \mathbb {R}^{M\times M}$ and $\boldsymbol{0}\in \mathbb {R}^{N}$ are the identity matrix and a vector of zeroes, respectively. The advantage of posing problem (65) as (67) is that the latter can be easily solved in a standard way using any solver for linear programs (e.g. the linprog function in MATLAB [32]).
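
For illustration, the same reformulation may be implemented with a freely available solver, e.g. scipy.optimize.linprog in Python; a minimal sketch, with the blocks stacked as in (68):

```python
import numpy as np
from scipy.optimize import linprog

def l1_residual_min(A, b):
    """Solve min_y ||A y - b||_1 via the linear program (66)-(68)."""
    M, N = A.shape
    c = np.concatenate([np.zeros(N), np.ones(M)])        # objective: 1^T s
    # constraints  A y - b <= s  and  -(A y - b) <= s, stacked as in (68)
    A_ub = np.block([[A, -np.eye(M)], [-A, -np.eye(M)]])
    b_ub = np.concatenate([b, -b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (N + M))       # y and s unbounded
    return res.x[:N]
```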

6.4. L2  versus L1 norm minimization

The need to choose a minimization scheme may turn the reconstruction problem into a great challenge, because different schemes may yield different solutions, thus forcing us to discern which scheme is best suited for a specific reconstruction problem. As illustrated in [1, 10, 29, 35, 59, 60, 68], these differences between minimizers may be exploited in certain situations. For instance, the L2 minimizer finds the closest solution in the L2-norm and moreover has an analytic solution. Yet, given the nature of the minimizers (see [68]), an L1 minimizer is suited for finding particular sparse solutions and is more robust to outliers than L2, so it might be seen as more useful for applications. However, minimizing an L1 norm is computationally more costly than minimizing an L2 norm and it may have more than one solution [31]. We note that ||p||2 ⩽ ||p||1 for any vector $\boldsymbol{p}\in \mathbb {R}^{N}$.
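
The following toy example contrasts the two minimizers on an under-determined system; it uses the equality-constrained variant min ||y||1 subject to Ay = b (the sparsity-seeking formulation relevant for network reconstruction, cf appendix B) rather than the residual formulation (65), and all sizes are illustrative:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
M, N = 20, 50                        # fewer restrictions than unknowns
A = rng.standard_normal((M, N))
y_true = np.zeros(N)
y_true[rng.choice(N, size=3, replace=False)] = rng.standard_normal(3)
b = A @ y_true                       # data generated by a sparse 'network'

y_l2 = np.linalg.pinv(A) @ b         # minimal L2-norm solution: typically dense

# minimal L1-norm solution: min 1^T s  s.t.  -s <= y <= s,  A y = b
c = np.concatenate([np.zeros(N), np.ones(N)])
A_ub = np.block([[np.eye(N), -np.eye(N)], [-np.eye(N), -np.eye(N)]])
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * N),
              A_eq=np.hstack([A, np.zeros((M, N))]), b_eq=b,
              bounds=[(None, None)] * (2 * N))
y_l1 = res.x[:N]

# count effectively non-zero entries: L2 tends to spread weight over all
# entries, while L1 typically recovers the sparse y_true here
print(np.sum(np.abs(y_l2) > 1e-6), np.sum(np.abs(y_l1) > 1e-6))
```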

6.5. Hypernetworks: beyond two-point interactions

We have explicitly excluded from our general mathematical description (2) networks with more than two-unit interactions, as well as dynamical networks with temporally changing connections. Those may occur, for instance, in networks of computers or other communication networks in engineering, where inputs from several units that give input to the same other unit are nonlinearly processed (for instance, multiplied) to change the latter unit's state. Similarly, non-additive dendritic interactions in neurons [20, 63], where two simultaneously received synaptic inputs in close spatial proximity initiate so-called dendritic spikes and thereby a nonlinearity [62], naturally imply three-point interactions.

We remark that direct three-unit and higher order interactions (i.e. three-term and higher order products such as $x_{i}^{d}x_{j}^{d^{\prime }}x_{k}^{d^{\prime \prime }}$ where i, j, k are mutually different) are omitted in (1) because they refer to dynamical systems on hypernetworks, thus going beyond the scope of this review. Such third and higher order terms are not covered by (1), firstly because the notation stays much simpler without them, but secondly and more importantly because there seem to be few major theoretical results, if any, on reconstructing such systems that may be or become of practical use. Yet, in social, communication and information networks, such questions may soon become of interest as 'big data' are pouring in.

A brief introduction to the dynamics of complex hypernetworks is given in [17].

7. Effective connectivity: correlation-based methods

7.1. Linear correlation and covariance

One common way of quantifying effective connectivity is to measure scalar time series x(t) = (x1(t), ..., xN(t)), t ∈ {t1, ..., tM}, from the N units of a system (spike rates of neurons, expression levels of genes, ...), compute their (temporal) averages μi = 〈xi(t)〉t and variances $\sigma _{i}^{2}=\langle (x_{i}(t)-\mu _{i})^{2}\rangle _{t}$ and from those compute the covariance matrix

Equation (69): $C_{ij}=\big\langle (x_{i}(t)-\mu _{i})(x_{j}(t)-\mu _{j})\big\rangle _{t}$

Here the averages are time averages

Equation (70): $\langle h(t)\rangle _{t}:=\frac{1}{M}\sum _{m=1}^{M}h(t_{m})$

If multiple measurements are available, averages may be taken over ensembles of experiments rather than (or in addition to) temporal averaging. The obtained covariance matrix is then often thresholded, either by just choosing a heuristic value or according to some measure of statistical significance (against an appropriate null hypothesis), and the resulting non-zero values are interpreted either as the 'strength' of effective connectivity (weighted connectivity matrix) or just as the existence (adjacency matrix) of a functional link between units. Sometimes, the value of the threshold is systematically varied and the changes in the resulting connectivity are evaluated. Correlation, covariance and other linear algebra measures are commonly used, often complemented by nonlinear operations such as thresholding, to analyze, for instance, functional magnetic resonance imaging (fMRI) or other spatio-temporal data [69]. We note that correlation measures may be (mathematically) viewed in a number of different ways [70].
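
A minimal sketch of this covariance-and-threshold pipeline, with the threshold chosen heuristically rather than by a statistical significance test:

```python
import numpy as np

def thresholded_covariance(x, threshold):
    """x: N x M array of N scalar time series. Returns a weighted effective
    connectivity matrix and the corresponding adjacency matrix, obtained by
    thresholding the covariance matrix (69) at a heuristic value."""
    C = np.cov(x)                                  # C_ij as in (69)
    W = np.where(np.abs(C) >= threshold, C, 0.0)   # 'strengths' of effective links
    np.fill_diagonal(W, 0.0)                       # discard self-connections
    A = (W != 0).astype(int)                       # existence of functional links
    return W, A
```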

7.2. Entropy maximization

Entropy measures the uncertainty associated with a given probability distribution and constitutes a key quantity in information theory [71]. Given the probabilities of a set of events, the entropy measures how uncertain, on average, the occurrence of an event is; or in other words, how much information, on average, one obtains by measuring the occurrence of an event knowing the probability distribution of events. Conversely, given a collection of (observed) data points, we can choose probabilities to maximize the entropy. Such a distribution, known as a maximum entropy probability distribution, is the least biased distribution possible; any other would require further assumptions about the nature of the problem [72]. For a network dynamical system, i.e. a system of coupled dynamical units, we can ask what the effective interactions are such that the probability distribution that best describes the data (averages and correlations) has maximum entropy.

Specifically, the principle of maximum information from Bayesian statistics postulates that the 'most likely' probability ρ(x) of measuring x, given the same type of time series data x(t) = (x1(t), ..., xN(t)), is the one maximizing the information obtained from measuring one more state y. More precisely, the goal is to find the probability ρ(x) that maximizes the Shannon entropy

Equation (71): $S[\rho ]=-\int _{\mathbb {R}^{N}}\rho (\boldsymbol{x})\,\ln \rho (\boldsymbol{x})\,d^{N}x$

under the constraints that the first and second moments are consistent with those estimated from the data,

Equation (72): $\int _{\mathbb {R}^{N}}x_{i}\,\rho (\boldsymbol{x})\,d^{N}x=\mu _{i}$

and

Equation (73): $\int _{\mathbb {R}^{N}}x_{i}x_{j}\,\rho (\boldsymbol{x})\,d^{N}x=\langle x_{i}(t)\,x_{j}(t)\rangle _{t}$

By restricting ourselves to these conditions (and no others), we here focus on pairwise interactions and neglect three-point and higher order couplings that arise in hypernetworks (cf section 6.5 and [17]). This yields (appendix C) the probability distribution of the form

$\rho (\boldsymbol{x})=Z^{-1}\exp \big(\sum _{i=1}^{N}h_{i}x_{i}+\frac{1}{2}\sum _{i=1}^{N}\sum _{j=1}^{N}\hat{J}_{ij}x_{i}x_{j}\big)$

where $Z=\int _{\mathbb {R^{N}}}\exp \big(\sum _{i=1}^{N}h{}_{i}x_{i}+\frac{1}{2}\sum _{i=1}^{N}\sum _{j=1}^{N}\hat{J}_{ij}x_{i}x_{j}\big)\,d^{N}x$ is a normalization constant and hi and $\hat{J}_{ij}$ are parameters to be chosen such that (C.3) and (C.4) hold. The quantities $\hat{J}_{ij}$ are interpreted as effective couplings between units i and j.

If the matrix of covariances between the N time series is

Equation (74): $C=(C_{ij})_{i,j\in \lbrace 1,\ldots ,N\rbrace }$ with $C_{ij}=\langle x_{i}(t)\,x_{j}(t)\rangle _{t}-\mu _{i}\mu _{j}$

the effective coupling matrix is given by its inverse (appendix C)

Equation (75)

This means that the best available probability distribution (i.e. that yielding the maximum information) is given by second order effective coupling strengths determined by (but by no means identical to) the linear correlation matrix.
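
Computationally, this amounts to inverting the empirical covariance matrix; a minimal sketch (the overall sign of the couplings depends on the sign convention chosen for the exponent of the distribution):

```python
import numpy as np

def max_entropy_couplings(x):
    """x: N x M array of N scalar time series. Returns the effective couplings
    of the pairwise maximum-entropy (Gaussian) model, obtained from the inverse
    of the covariance matrix, cf. (74), (75) and appendix C; the overall sign
    depends on the convention chosen for the exponent of the distribution."""
    C = np.cov(x)               # covariance matrix of the data
    return np.linalg.inv(C)     # effective coupling strengths (up to sign)
```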

This concept has been applied to a range of different systems, in particular in biology, including gene networks [73], protein networks [74] and neural circuits [75]. We comment on an approach originally suggested by Bialek and coworkers [75, 76] for revealing to what extent two-point interactions characterize the coupled dynamics of neural circuits in the retina. In fact, a systematic study of neural activity in the retina of larval tiger salamander and guinea pig revealed that 90% of the multi-information (which measures all correlative dependences in a system [76]) is covered by pairwise correlations only [75]. The authors took this finding as a sign that pairwise interactions well characterize the full network dynamics and that higher order interactions may be neglected. Specifically, they temporally discretized neural responses into '1' or '0' states depending on whether a neuron was or was not active during each considered time interval of typically 20 ms. Thus, the state of the entire (observed) network at each time interval is given by an N-dimensional word composed of the binary components of the N neurons. As sometimes done for effective connectivity, researchers often tend to go beyond what Bialek and coworkers concluded and interpret the effective coupling strengths (75) as actual physical interactions of experimentally observed units. However, it is typically not clear how correlative dependences yield information about direct physical interactions, so such interpretations are not justified without substantial further knowledge about the system.

7.3. Further approaches to finding effective connectivity

We would like to emphasize that there is a multitude of additional approaches for finding effective or functional connectivities. For instance, a range of methods is based on information-theoretic measures such as mutual information [77], transfer entropy [78] and Granger causality [79, 80], and extensions thereof. In addition, there are various methods relying on Bayesian statistics, on explicit or implicit modeling, or on extended regression such as used in generalized linear models [81]. In particular in the biological sciences, such statistical methods have been used early on, even at times when not many or not very reliable data were available; see for instance [82] for an early study regarding effective connectivity in neural circuits.

There are a number of open challenges regarding both the precision of such methods and the interpretation of the respective results. For instance, fMRI experiments of brain areas may rely on differences in blood oxygen level (so-called BOLD signals) as observables but often aim to reveal actual dependences in neural spiking activity. The reasoning here is that the cell metabolism, and thus the blood oxygen consumption, in a local region (typically one cubic millimeter) of the brain is larger the more action potentials per time are generated by the ($10^{4}$–$10^{6}$) neurons in that region; such approaches are not undisputed [83]. Moreover, the terms effective connectivity, functional connectivity and structural or anatomical connectivity are sometimes not well defined and used with different meanings across studies. There are even overlapping definitions of non-structural forms of connectivity, e.g. for functional connectivity, effective connectivity etc. Here we did not delve into historic waters and used the term 'effective connectivity' for all forms of connectivity that are not structural. Sometimes, effective connectivity is even taken as an indication of structural connectivity of physical interactions. For instance, distinguishing direct interactions from indirect influences may be an important issue (cf figure 1) that is not yet fully clarified [84].

Finally, we mention that for neural circuits, the authors of [85] have devised a statistical method to find the couplings Jij that optimize the likelihood that a class of integrate-and-fire models generates the spike trains observed in experiments. This statistical method assumes the same model class as the approach for inferring structural connectivity presented above (section 5.2) and its goal indeed is finding the (most likely) structural connectivity.

As a conclusive warning, we remark that different methods exist for obtaining effective connectivity from the same data set; the results each provide information about specific features of the system: exactly those defined by the method. Some might almost coincide with others, some might be congruent with, and some may well contradict, a given structural connectivity (cf [86]).

8. Conclusion and open questions

This review focuses on how the structural connectivity of networks may be inferred from dynamical features of the networks' nodes. It is on purpose on an introductory level and (given that the area of network reconstruction is rapidly growing simultaneously in different fields, from gene and neural networks to engineering systems) by construction only highlights some selected approaches, most of which are based on the perspective of considering the collective nonlinear network dynamics. We present basic approaches to finding effective or functional connectivities of a network from time series as a brief complement, to give a taste of essential differences in perspective. One main distinction between approaches for identifying qualitatively different types of connectivity is that structural inference, aiming to reconstruct real physical interactions, takes into account the time evolution of a system. In contrast, finding effective connectivity is often based on a stationarity assumption and uses distributions of states, neglecting all or part of their temporal evolution. Relating observable, possibly effective, and physical connectivity is at the heart of current interdisciplinary research [15, 87]. As also mentioned in the Introduction and briefly discussed in section 7, functional versus structural inference is often not well distinguished and, in particular in early publications, the qualitative difference in approach was sometimes not even mentioned.

The methods and approaches presented in this review aim to tell whether or not, and how strongly, units in a network directly interact with each other. This is in distinction to the entire field of systems identification [7], where the aim is to identify the rules underlying a dynamical system from time series and to predict its (future) dynamics based on this identification. Systems identification for multi-dimensional systems, and thus in particular for large networks, seems intractably hard, because even if the 'real' system is almost (but not entirely) perfectly reconstructed, predicting its future dynamics can be impossible due to chaos (sensitive dependence on initial conditions), exponentially many different collective states and uncontrolled external influences, with all these factors being typical for multi-dimensional complex systems. At the same time, as in part illustrated in this review (figure 5), learning the existence or strengths of interactions only (and not the precise form of the dynamics) may well be successful for much larger systems. We thus speculate that novel methods complementing those of systems identification may yield further insights into the interaction networks of various complex systems.

A number of key issues are not discussed in this review but still are sometimes pressing for progress, in particular with respect to applications in real-world settings. We list a few.

  • (1)  
    How much information can we actually access (cf [88])? Can we observe all the units of a network? Can we observe all dynamical variables (dimensions) of each unit? In which sense is it meaningful to seek the connectivity of nodes that are not observed?
  • (2)  
    What do we know a priori about the system? Are the functional forms of network interactions known? What can we say about stability or instability of the dynamics? Perhaps the dynamics even exhibits a complex mixture of stable and unstable dynamics [9, 89, 90].
  • (3)  
    What is a reasonable (or possible) number of experiments or measurements that can be done, and what is a clever trade-off against the required computational efforts, which may depend on the quality and quantity of those measurements [23]?
  • (4)  
    Which of the structural features are actually the most relevant and which can possibly be inferred? In biochemical and gene regulatory networks, electric circuits and power grids, for instance, many interaction links (or their absence) may have been identified by other methods not related to collective network dynamics. Under these circumstances, can we improve existing approaches to take such information into account?
  • (5)  
    How can we address genuinely stochastic dynamics, e.g. in excitable systems, possibly even induced by small-number fluctuations, a pathway of thought that links to non-equilibrium physics, cf [91]?
  • (6)  
    What if network connectivities change, for instance by synaptic plasticity in neural circuits [92–94] or other forms of adaptation, e.g. in social networks [95]?
  • (7)  
    ...think of your own, there are many more questions with great opportunity for scientific progress in a range of fields ...

Answers to all these questions will specifically restrict or enhance the space of optional networks consistent with the recorded data. From an abstract point of view, they modify the form and dimensionality of the set of all possible networks and point to a direction that still addresses network dynamics as an inverse problem but goes beyond network reconstruction. Given all restrictions, perhaps we can design or control a network to robustly exhibit a specific functionality. Network design and control form two additional branches of the theory of network dynamical systems, with currently highly active research from the biological sciences to engineering [11, 96–99]. Such approaches might soon be extremely influential and thought-provoking when, e.g., the neural and biochemical networks in our bodies as well as the infrastructure networks surrounding us can not only be reconstructed, but even controlled and specifically engineered. We recommend a viewpoint by Freeman Dyson for a very practical taste [100].

Acknowledgments

We thank Srinivas Gorur Shandilya for help during project initiation and Raoul-Martin Memmesheimer, Mor Nitzan, Dirk Witthaut, David Gross, Matthias Wendland, Sarah Hallerberg and Hinrich Arnoldt for valuable discussions and constructive comments on parts of the manuscript. Partially supported by the International Max Planck Research School (IMPRS) Physics of Biological and Complex Systems (JC), the Ministry for Education and Science (BMBF), Germany, through the Bernstein Center for Computational Neuroscience, grant no. 01GQ1005B (MT), as well as by a grant by the Max Planck Society to MT.

Appendix A: Multiple linear regression and L2-norm minimization

Multiple linear regression is a widely used statistical tool for predicting the values of a dependent variable from a set of independent variables. Its main purpose is to infer a linear relationship between them. Thus, the relationship is assumed to have the form

Equation (A.1): $\boldsymbol{y}=\boldsymbol{\beta }x+\boldsymbol{\varepsilon }$

where $\boldsymbol{y}\in \mathbb {R}^{1\times M}$ are the values of the dependent variable, $\boldsymbol{\beta }\in \mathbb {R}^{1\times K}$ is the vector of unknown linear coefficients, $x\in \mathbb {R}^{K\times M}$ is the set of values of the K independent variables and $\boldsymbol{\varepsilon }=(\varepsilon _{1},\ldots ,\varepsilon _{M})\in \mathbb {R}^{1\times M}$ are random errors εi, i ∈ {1, ..., M}. In addition, the errors εi are assumed to be independent random variables distributed according to a Gaussian distribution with mean μ = 0 and constant variance σ2 [101].

The question is how to estimate β by some $\hat{\boldsymbol{\beta }}$ such that the difference between the predicted and real values, $\hat{\boldsymbol{\beta }}x$ and y, is minimized. Using L2 minimization, the problem formally becomes

Equation (A.2): $\hat{\boldsymbol{\beta }}={\rm arg}\,\min _{\boldsymbol{\beta }}\left\Vert \boldsymbol{y}-\boldsymbol{\beta }x\right\Vert _{2}^{2}$

also known as the linear least squares method [101].

The local extremum of the L2 norm appearing on the right-hand side of (A.2) implies

Equation (A.3): $\frac{\partial }{\partial \boldsymbol{\beta }}\left\Vert \boldsymbol{y}-\boldsymbol{\beta }x\right\Vert _{2}^{2}\Big |_{\boldsymbol{\beta }=\hat{\boldsymbol{\beta }}}=-2\big(\boldsymbol{y}-\hat{\boldsymbol{\beta }}x\big)x^{\mathsf {T}}=\boldsymbol{0}$

such that

$\hat{\boldsymbol{\beta }}\,\big(xx^{\mathsf {T}}\big)=\boldsymbol{y}\,x^{\mathsf {T}},$

which in turn implies the best estimate

Equation (A.4): $\hat{\boldsymbol{\beta }}=\boldsymbol{y}\,x^{\mathsf {T}}\big(xx^{\mathsf {T}}\big)^{-1}$

of β according to the linear least squares method [101].
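
A short synthetic check of (A.4), using the row-vector conventions of this appendix; all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
K, M = 4, 200
x = rng.standard_normal((K, M))                   # K independent variables
beta = rng.standard_normal((1, K))                # unknown coefficients
y = beta @ x + 0.1 * rng.standard_normal((1, M))  # noisy dependent variable

beta_hat = y @ x.T @ np.linalg.inv(x @ x.T)       # least-squares estimate (A.4)
print(np.max(np.abs(beta_hat - beta)))            # small for large M
```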

Appendix B: Singular value decomposition and L1-norm minimization

Singular value decomposition (SVD) is regarded as an important tool for statisticians due to its broad variety of applications (reduced rank regression, polar decomposition, image compression, among others [102]). Formally, from the fundamental theorem of linear algebra, any rectangular matrix $A\in \mathbb {R}^{m\times n}$ may be decomposed into the product of three matrices as

Equation (B.1): $A=U\Sigma V^{\mathsf {T}}$

where $U\in \mathbb {R}^{m\times m}$ and $V\in \mathbb {R}^{n\times n}$ are unitary matrices with their columns referred to as left and right singular vectors, respectively, and $\Sigma \in \mathbb {R}^{m\times n}$ is a rectangular diagonal matrix containing the singular values [103]. This decomposition is known as the SVD of matrix A.

The SVD properties are useful in finding the set of all possible solutions to an under-determined system of equations, because for every under-determined system

Equation (B.2): $A\boldsymbol{y}=\boldsymbol{b}$

where $\boldsymbol{b}\in \mathbb {R}^{m\times 1}$, we can use SVD (B.1) to rewrite A such that solving (B.2) for y yields the particular solution

Equation (B.3): $\boldsymbol{y}_{0}=V\tilde{\Sigma }U^{\mathsf {T}}\boldsymbol{b}$

where $\tilde{\Sigma }\in \mathbb {R}^{n\times m}$ is the pseudo-inverse of Σ and is defined as

Equation (B.4): $\tilde{\Sigma }_{ii}=1/\Sigma _{ii}$ for $i\in \lbrace 1,\ldots ,r\rbrace$, where r = Rank(A), and $\tilde{\Sigma }_{ij}=0$ otherwise

Equation (B.3) defines a particular solution from the set of all possible solutions. The general solution to equation (B.2) is given by our particular solution plus a linear combination of vectors in the null-space of A, i.e.,

Equation (B.5): $\boldsymbol{y}=\boldsymbol{y}_{0}+V\boldsymbol{c}$

where $\boldsymbol{c}\in \mathbb {R}^{n\times 1}$ and ci = 0 for i ∈ {1, ..., r} and r = Rank(A). The latter ensures that the linear combinations are made only with vectors that span the null-space.

In this case, the problem consists in maximizing the number of zero entries in y by choosing the cj  ∀  j > r appropriately. This may be achieved by solving the over-determined system (B.5) with n equations and n − r unknowns. Specifically, it has been illustrated in [10, 29, 35] that by solving

Equation (B.6): $\min _{\boldsymbol{c}}\left\Vert \boldsymbol{y}_{0}+V\boldsymbol{c}\right\Vert _{1}$

employing the Barrodale and Roberts algorithm [34], a solution having a low number (possibly the least number) of non-zero values, consistent with the restricting equations, is often recovered. However, there seems to be no general proof of this observation [31]. In the network reconstruction context, it has been shown that optimizing systems like (B.6) (where y and b are replaced by the unit's connectivity $\boldsymbol{J}_{i}^{\mathsf {T}}$ and the network constraints $\boldsymbol{Y}_{i}^{\mathsf {T}}$) yields the actual network topology when the network is sparse (even if there are fewer linear constraints than units in the network).
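
A minimal sketch of computing the particular solution (B.3) from the SVD; numerically it is equivalent to applying the Moore–Penrose pseudo-inverse of A:

```python
import numpy as np

def svd_particular_solution(A, b):
    """Minimal-L2-norm particular solution (B.3) of the under-determined
    system A y = b, built from the SVD (B.1)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    m, n = A.shape
    r = int(np.sum(s > 1e-12 * s.max()))       # numerical rank of A
    Sigma_pinv = np.zeros((n, m))              # pseudo-inverse of Sigma, (B.4)
    Sigma_pinv[:r, :r] = np.diag(1.0 / s[:r])
    return Vt.T @ Sigma_pinv @ U.T @ b         # same as np.linalg.pinv(A) @ b
```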

Appendix C: Estimating maximum entropy parameters

Maximizing the entropy

Equation (C.1): $S[\rho ]=-\int _{\mathbb {R}^{N}}\rho (\boldsymbol{x})\,\ln \rho (\boldsymbol{x})\,d^{N}x$

we derive first the functional form of ρ(x) and second its parameters under the constraints that the non-negative function ρ ⩾ 0 is normalized

Equation (C.2): $\int _{\mathbb {R}^{N}}\rho (\boldsymbol{x})\,d^{N}x=1$

and the first moment

Equation (C.3): $\int _{\mathbb {R}^{N}}x_{i}\,\rho (\boldsymbol{x})\,d^{N}x=\mu _{i}$

and second moment

Equation (C.4): $\int _{\mathbb {R}^{N}}x_{i}x_{j}\,\rho (\boldsymbol{x})\,d^{N}x=\langle x_{i}(t)\,x_{j}(t)\rangle _{t}$

are consistent with those estimated from data of N scalar time series xi(t). The covariance matrix is defined via the data as

Equation (C.5): $C_{ij}=\langle x_{i}(t)\,x_{j}(t)\rangle _{t}-\mu _{i}\mu _{j}$

We maximize the entropy under these 1 + N + N(N − 1)/2 constraints using a Lagrange function

Equation (C.6): $L[\rho ]=-\int \rho \ln \rho \,d^{N}x-a\int \rho \,d^{N}x+\sum _{i}h_{i}\int x_{i}\,\rho \,d^{N}x+\frac{1}{2}\sum _{i,j}\gamma _{ij}\int x_{i}x_{j}\,\rho \,d^{N}x$

where we drop constants as they do not influence the location of a maximum. Here a, hi and $\frac{1}{2}\gamma _{ij}$ are the Lagrange multipliers to be determined. Computing the first derivatives and equating them to zero

Equation (C.7): $\frac{\delta L}{\delta \rho (\boldsymbol{x})}=-\ln \rho (\boldsymbol{x})-1-a+\sum _{i}h_{i}x_{i}+\frac{1}{2}\sum _{i,j}\gamma _{ij}x_{i}x_{j}=0$

yields the (unique local) maximum entropy distribution of the form

Equation (C.8): $\rho (\boldsymbol{x})=\exp (-1-a)\,\exp \big(\frac{1}{2}\boldsymbol{z}^{\mathsf {T}}\hat{J}\boldsymbol{z}+b\big)$

where we use the abbreviations $\hat{J}:=(\gamma _{ij})_{ij}$, $\boldsymbol{z}:=\boldsymbol{x}+\hat{J}^{-1}\boldsymbol{h}$ is an affine function of x and

Equation (C.9): $b:=-\frac{1}{2}\boldsymbol{h}^{\mathsf {T}}\hat{J}^{-1}\boldsymbol{h}$

is independent of x.

In summary, this exact computation yields the probabilities of the form

Equation (C.10): $\rho (\boldsymbol{x})=Z^{-1}\exp \big(\sum _{i=1}^{N}h_{i}x_{i}+\frac{1}{2}\sum _{i=1}^{N}\sum _{j=1}^{N}\hat{J}_{ij}x_{i}x_{j}\big)$

where Z−1 = exp ( − 1 − a).

To estimate the parameters a, hi and $\hat{J}_{ij}$ we make the approximation that the data xi form a continuous set such that we can approximate sums by integrals. We then first observe that

Equation (C.11)

due to normalization (C.2). This implies

Equation (C.12)

and thus yields the parameter a as a function of h and $\hat{J}$. Similarly, fixing the averages (C.3) yields (approximately)

Equation (C.13)

for all i and thus h as a function of $\hat{J}$.

Finally, the equations fixing the second moments

Equation (C.14)

can be evaluated using

Equation (C.15)

subjected to

Equation (C.16)

With the transformation $\boldsymbol{x}=\boldsymbol{z}-\hat{J}^{-1}\boldsymbol{h}$, equation (C.14) becomes

Equation (C.17)

where the last equality follows from (C.9), (C.12) and (C.13). Thus, using the definition of the correlation matrix (C.5) and assuming that temporal and statistical averages are the same, we have that the matrix

Equation (C.18)

of effective coupling strengths between the variables is given by the inverse of the correlation matrix of the data.

Footnotes

  • From now on we write $\dot{z}$ for the rate of change $\frac{{\rm d}}{{\rm d}t}z$ of a variable z.

  • Part of the material presented in this subsection is taken from [12] and partially modified; this is not to be confused with the Guttenplag method.

  • We remark that equidistant times tm of sampling are not necessarily the best to evaluate $\dot{x}_{i}$, in particular if the time derivatives are approximated linearly.

  • A network is strongly connected if for every pair of nodes (i, j), node j can be reached from node i via a (directed) path within the network [48, 49].
