
Adaptive Cluster Expansion for the Inverse Ising Problem: Convergence, Algorithm and Tests


Abstract

We present a procedure to solve the inverse Ising problem, that is, to find the interactions between a set of binary variables from the measurement of their equilibrium correlations. The method consists in constructing and selecting specific clusters of spins, based on their contributions to the cross-entropy of the Ising model. Small contributions are discarded to avoid overfitting and to keep the computation tractable. The properties of the cluster expansion and its performance on synthetic data are studied. To ease implementation we give the pseudo-code of the algorithm.


Notes

  1. We have to minimize here, rather than maximize, since the true Lagrange multipliers take imaginary values, the couplings and fields being their imaginary parts.

  2. The term ‘field’ should, strictly speaking, be used when the variables σ i are spins taking ±1 values. For 0,1 variables, the denomination ‘chemical potential’ would be more appropriate. We keep the simpler denomination ‘field’ hereafter.

  3. The minimal strength of the couplings which can be ‘detected’ depends on the quality of the sampling, and scales as \(\sqrt{\log N/B}\), see Sect. 4.3.2.

  4. A further theoretical argument supporting the existence of the cancellation property is, in the case of perfect sampling, the fact that the entropy S must be extensive in N. As S is the sum of \(\sim 2^{N}\) cluster entropies, those contributions must compensate each other.

  5. In the present work, where spins are equal to 0,1, the couplings (J ij ) and fields (h i ) are obtained from the ones for ±1 spins (\(\hat{J}_{ij},\hat{h}_{i}\)) through the transformation \(J_{ij}=4 \hat{J}_{ij}\) and \(h_{i}=-\frac{1}{2} \sum_{j\neq i} J_{ij} +\hat{h}_{i}\). The numerical experiments of [17] were done with ±1 spins.

  6. If \(p=\frac{1}{2}\), ΔS IS is of the order of \(B^{-K}\).

  7. This statement is widely believed to be true in the theory-of-liquids literature. The fast decay of the inverse susceptibility, χ −1(r), or, equivalently, of the direct pair correlation, g(r), is used to approximately close the hierarchy of correlation functions. The Percus-Yevick closure scheme, which gives an accurate equation of state for liquids of hard spheres, assumes that the inverse susceptibility vanishes beyond the interaction range of the potential (the diameter of a particle).

  8. We have verified that the dependence on N is weaker in the case of periodic boundary conditions.

  9. The numerical experiments of [17] were done with ±1 spins and with coupling parameter θ and field ν=0; in the present work where spins are equal to 0,1, the corresponding couplings and fields are: J=4θ and \(h_{i}=-\frac{1}{2} \sum_{j\neq i} J_{ij}\).

  10. In mean-field spin-glasses, as the couplings scale as the inverse square root of the number N of spins, only loop diagrams have non-zero weights in the thermodynamical limit.

  11. With N=40, p=0.024, we find that the standard deviation is of the order of 10^{−13} for B=10^6, and 2·10^{−11} for B=10^5, see Fig. 7.

References

  1. Brush, S.G.: Rev. Mod. Phys. 39, 883 (1967)

  2. Schneidman, E., Berry, M.J. II, Segev, R., Bialek, W.: Nature 440, 1007 (2006)

  3. Tkacik, G., Schneidman, E., Berry, M.J. II, Bialek, W.: arXiv:q-bio.NC/0611072 (2006)

  4. Marre, O., El Boustani, S., Frégnac, Y., Destexhe, A.: Phys. Rev. Lett. 102, 138101 (2009)

  5. Peyrache, A., et al.: Nat. Neurosci. 12, 919 (2009)

  6. Weigt, M., et al.: Proc. Natl. Acad. Sci. 106, 67 (2009)

  7. Balakrishnan, S., et al.: Proteins 79, 1061 (2011)

  8. Bialek, W., Cavagna, A., Giardina, I., Mora, T., Silvestri, E., Viale, M., Walczak, A.M.: arXiv:1107.0604 (2011)

  9. Cocco, S., Monasson, R.: Phys. Rev. Lett. 106, 090601 (2011)

  10. Jaynes, E.T.: Proc. IEEE 70, 939 (1982)

  11. Cocco, S., Leibler, S., Monasson, R.: Proc. Natl. Acad. Sci. 106, 14058 (2009)

  12. Opper, M., Saad, D. (eds.): Advanced Mean Field Methods: Theory and Practice. MIT Press, Cambridge (2001)

  13. Ackley, D.H., Hinton, G.E., Sejnowski, T.J.: Cogn. Sci. 9, 147 (1985)

  14. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, Berlin (2009)

  15. Wallace, C.S.: Statistical and Inductive Inference by Minimum Message Length. Information Science and Statistics. Springer, Berlin (2005)

  16. Ravikumar, P., Wainwright, M.J., Lafferty, J.: Ann. Stat. 38, 1287 (2010)

  17. Bento, J., Montanari, A.: Which graphical models are difficult to learn? In: NIPS (2009)

  18. Jerrum, M., Sinclair, A.: The Markov chain Monte Carlo method: an approach to approximate counting and integration. In: Hochbaum, D.S. (ed.) Approximation Algorithms for NP-Hard Problems. PWS, Boston (1996)

  19. Cagliotti, E., Kuna, T., Lebowitz, J.L., Speer, E.R.: Markov Process. Relat. Fields 12, 257 (2006)

  20. Kuna, T., Lebowitz, J.L., Speer, E.R.: J. Stat. Phys. 129, 417 (2007)

  21. Swendsen, R.H.: Phys. Rev. Lett. 52, 1165 (1984)

  22. Meinshausen, N., Bühlmann, P.: Ann. Stat. 34, 1436 (2006)

  23. Aurell, E., Ekeberg, M.: arXiv:1107.3536 (2011)

  24. Plefka, T.: J. Phys. A: Math. Gen. 15, 1971 (1982)

  25. Georges, A., Yedidia, J.: J. Phys. A: Math. Gen. 24, 2173 (1991)

  26. Georges, A.: In: Mancini, F., Avella, A. (eds.) Lectures on the Physics of Highly Correlated Electron Systems VIII, AIP Conf. Proc., vol. 715, p. 3 (2004)

  27. Thouless, D.J., Anderson, P.W., Palmer, R.G.: Philos. Mag. 35, 593 (1977)

  28. Sherrington, D., Kirkpatrick, S.: Phys. Rev. Lett. 35, 1792 (1975)

  29. Tanaka, T.: Phys. Rev. E 58, 2302 (1998)

  30. Roudi, Y., Aurell, E., Hertz, J.: Front. Comput. Neurosci. 3, 22 (2009)

  31. Sessak, V., Monasson, R.: J. Phys. A 42, 055001 (2009)

  32. Pelizzola, A.: J. Phys. A 38, R309 (2005)

  33. Mézard, M., Mora, T.: J. Physiol. (Paris) 103, 107 (2009)

  34. Marinari, E., Van Kerrebroeck, V.: J. Stat. Mech. P02008 (2010)

  35. de Dominicis, C.: J. Math. Phys. 3, 983 (1962)

  36. Hansen, J.P., McDonald, I.R.: Theory of Simple Liquids. Academic Press, New York (1976)

  37. Frisch, H.L., Lebowitz, J.L.: The Equilibrium Theory of Classical Fluids. Benjamin, New York (1964)

  38. Flajolet, P., Gourdon, X., Dumas, P.: Theor. Comput. Sci. 144, 3 (1995)

  39. Schnitzer, M.J., Meister, M.: Neuron 37, 499–511 (2003)

  40. Gumbel, E.J.: Statistics of Extremes. Dover, New York (2004)

  41. MacKay, D.J.C.: Neural Comput. 4, 415 (1991)

  42. Bray, A.J., Moore, M.A.: J. Phys. A 10, 1927 (1977)

  43. Percus, J.K., Yevick, G.J.: Phys. Rev. 110, 1 (1958)

  44. Borzi, C., Ord, G., Percus, J.K.: J. Stat. Phys. 46, 51 (1986)

  45. Barton, J.: private communication

  46. Gori, G., Trombettoni, A.: J. Stat. Mech. P10021 (2011)

  47. Fisher, M.: J. Math. Phys. 5, 944 (1964)

  48. Fisher, M.: Phys. Rev. 162, 480 (1967)

  49. Zobin, D.: Phys. Rev. 5, 2387 (1978)

Download references

Acknowledgements

We are grateful to J. Barton, J. Lebowitz and E. Speer for very useful and stimulating discussions, in particular regarding the correspondence between the inverse susceptibility and the direct correlation functions, and the practical implementation of the inference algorithm. We thank E. Aurell for pointing out to us the difference between P and Q, see Sect. 5.4.1.

Author information


Correspondence to R. Monasson.

Appendices

Appendix A: Expression of the Entropy of Clusters with Size K=3

In this appendix, we give the analytical expression for the entropy of a cluster with K=3 spins. Using this expression instead of numerically minimizing the cross-entropy (7) offers a valuable computational speed-up, as there are O(N^3) clusters of size K=3. We start from the definition of the entropy in terms of the joint probability P(σ):

$$S_3=-\sum_{\sigma_1=0,1}\sum _{\sigma_2=0,1}\sum_{\sigma_3=0,1} P(\sigma_1,\sigma_2,\sigma_3) \log P(\sigma_1,\sigma_2,\sigma_3) . $$
(A.1)

We then replace the probabilities P(σ 1,σ 2,σ 3) of the eight configurations of the three spins above with their expressions in terms of the probabilities {p i ,p kl } in the data, and of the probability p 123 that the three spins are all equal to 1: P(1,1,1)=p 123, P(1,1,0)=p 12−p 123, P(1,0,1)=p 13−p 123, P(0,1,1)=p 23−p 123, P(0,0,1)=p 3−p 23−p 13+p 123, P(0,1,0)=p 2−p 12−p 23+p 123, P(1,0,0)=p 1−p 13−p 12+p 123, P(0,0,0)=1−p 1−p 2−p 3+p 12+p 13+p 23−p 123. The only unknown quantity (not available from p) is the probability p 123. To determine p 123 we impose

$$ \frac{d S_3}{d p_{123} }=0 ,$$
(A.2)

which means that the three-body coupling J 123 vanishes, i.e. that P(1,1,1)P(1,0,0)P(0,1,0)P(0,0,1)=P(1,1,0)P(1,0,1)P(0,1,1)P(0,0,0). As the quartic terms in p 123 cancel between the two sides of this condition, (A.2) gives a third-degree equation for p 123,

$$p_{123}^3+\alpha p_{123}^2+\beta p_{123}+\gamma=0 $$
(A.3)

with

$$\alpha = AB+AC+BC - D\,(p_{12}+p_{13}+p_{23}) - (p_{12}p_{13}+p_{12}p_{23}+p_{13}p_{23}) , \qquad \beta = ABC + D\,(p_{12}p_{13}+p_{12}p_{23}+p_{13}p_{23}) + p_{12}\,p_{13}\,p_{23} , \qquad \gamma = -D\,p_{12}\,p_{13}\,p_{23} ,$$
(A.4)

where A=p 1−p 12−p 13, B=p 2−p 12−p 23, C=p 3−p 13−p 23 and D=1−p 1−p 2−p 3+p 12+p 13+p 23.

Upon substitution of p 123 in (A.1) we obtain the desired cross-entropy S 3, as a function of the three average values p i and the three two-point averages p kl . The expression of the cluster-entropy is given by,

$$\Delta S_{(1,2,3)}(\mathbf{p}) = S_3 - \Delta S_{(1,2)} - \Delta S_{(1,3)} - \Delta S_{(2,3)} - \Delta S_{(1)} - \Delta S_{(2)} - \Delta S_{(3)} ,$$
(A.5)

according to (28). The expressions of the cluster-entropies for one and two spins are given by, respectively, (23) and (25). Similarly, one obtains the expressions for the contributions of the 3-spin cluster to the values of the interaction parameters by differentiating ΔS with respect to the p i ’s and the p kl ’s.
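The following minimal Python sketch (ours, not the authors' implementation; function names are ours) carries out the computation of this appendix numerically: p 123 is obtained by root-finding on the stationarity condition (A.2) instead of solving the cubic (A.3), and the cluster-entropy then follows from (A.5).

```python
# Minimal numerical sketch (ours) of the K=3 cluster-entropy of Appendix A.
import numpy as np
from scipy.optimize import brentq

def configs3(p1, p2, p3, p12, p13, p23, t):
    # Probabilities of the eight spin configurations, with p_123 = t
    return np.array([t, p12 - t, p13 - t, p23 - t,
                     p1 - p12 - p13 + t, p2 - p12 - p23 + t,
                     p3 - p13 - p23 + t,
                     1 - p1 - p2 - p3 + p12 + p13 + p23 - t])

SIGNS = np.array([1., -1., -1., -1., 1., 1., 1., -1.])   # dP/dp_123

def entropy(P):
    return -np.sum(P[P > 0] * np.log(P[P > 0]))

def S1(pi):
    return entropy(np.array([pi, 1 - pi]))

def S2(pi, pj, pij):
    return entropy(np.array([pij, pi - pij, pj - pij, 1 - pi - pj + pij]))

def S3(p1, p2, p3, p12, p13, p23):
    # Admissible range of p_123 keeping all eight probabilities positive
    lo = max(0., p12 + p13 - p1, p12 + p23 - p2, p13 + p23 - p3)
    hi = min(p12, p13, p23, 1 - p1 - p2 - p3 + p12 + p13 + p23)
    dS = lambda t: -SIGNS @ np.log(configs3(p1, p2, p3, p12, p13, p23, t))
    eps = 1e-12 * (hi - lo)
    t = brentq(dS, lo + eps, hi - eps)        # condition (A.2): J_123 = 0
    return entropy(configs3(p1, p2, p3, p12, p13, p23, t))

def cluster_entropy_3(p1, p2, p3, p12, p13, p23):
    # Inclusion-exclusion of (A.5): subtract all sub-cluster entropies
    dS2 = lambda pi, pj, pij: S2(pi, pj, pij) - S1(pi) - S1(pj)
    return (S3(p1, p2, p3, p12, p13, p23)
            - dS2(p1, p2, p12) - dS2(p1, p3, p13) - dS2(p2, p3, p23)
            - S1(p1) - S1(p2) - S1(p3))
```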

Appendix B: Diagrammatic Expansion of the Cluster-Entropies in Powers of the Connected Correlations

A better understanding of the cluster expansion and of the role of the reference entropy S 0 can be gained through the diagrammatic expansion of the entropy S(p) in powers of the connected correlations (high-temperature expansion),

$$c_{ij} = p_{ij}-p_ip_j .$$
(B.1)

Note that the entry M ij of the matrix M defined in (21) vanishes linearly with c ij . Thus, an expansion in powers of c ij is equivalent to an expansion in powers of M ij . A procedure to derive in a systematic way the diagrammatic expansion of S(p) is proposed in [31]. The diagrammatic expansion provides a simple representation of the cluster-entropies, in which the entropy S(p) can be represented as a sum of connected diagrams (Fig. 26). Each diagram is made of sites, connected or not by one or more edges. Each point symbolizes a variable, and carries a factor p i . The presence of n(≥0) edges between the sites k and l results in a multiplicative factor (c kl )n. The contribution of a diagram to the entropy is the product of the previous factors, times a function of the p i specific to the topology of the diagram, see [31]. Diagrams of interest include (Fig. 26):

  • the N single-point diagrams, whose contributions are ΔS (i)(p i );

  • the ‘loop’ diagrams, which consist of a circuit with K edges going through K sites i 1i 2→⋯→i K i 1, whose contributions to the entropy are

    $$ S_{\mathit{loop}} (\mathbf{p}|i_1,i_2,\ldots,i_K) = (-1)^{K-1} M_{i_1,i_2}M_{i_2,i_3}\ldots M_{i_{K-1},i_K}M_{i_K,i_1} ;$$
    (B.2)
  • the Eulerian circuit diagrams, for which there exists a closed path visiting each edge exactly once;

  • the non-Eulerian diagrams, with the lowest number of links (smallest power in M).

The entropy for two variables i,j, S(p i ,p j ,p ij ) (24), is the sum of the two single-point diagrams i and j, plus the sum of all connected diagrams made of the two sites i and j with an arbitrarily large number of edges (n≥2) in between (first two columns in Fig. 26). According to (25), the cluster-entropy ΔS (i,j)(p i ,p j ,p ij ) is equal to the latter sum (second column in Fig. 26). More generally, the entropy of a cluster ΔS Γ(p) is the infinite sum of all diagrams whose sites are the indices in Γ.

Fig. 26

Diagrammatic expansion of the cross-entropy S(p). A cluster-entropy (see Fig. 2) is the infinite sum of all the diagrams in a box (dashed contour), linking the K sites in the cluster. Each link in a diagram carries M ij , and each site p i ; in addition, each diagram carries a multiplicative factor, which is a function of the p i ’s. In the figure only one cluster among all \({N\choose K}\) clusters is represented. Only the first diagrams with non-zero coefficients are drawn. Loop diagrams are analytically summed up and removed from the expansion through the reference entropy S 0=S MF ; Eulerian circuit diagrams (brown/gray) are partly removed, see main text. Diagrams giving the largest contributions to the universal central peak of the cluster-entropy distribution (Appendix D) are shown in bold

We now interpret the Mean Field expression for the entropy, S MF , in the diagrammatic framework. We start from identity (21), and rewrite,

$$S_{MF}(\mathbf{p}) - \sum_i \Delta S_{(i)}(\mathbf{p}) = \frac{1}{2}\log\det M = \frac{1}{2}\operatorname{Tr}\log M = \sum_{K\geq 1} \frac{(-1)^{K-1}}{2K}\operatorname{Tr}\bigl[(M-\mathrm{Id})^K\bigr] .$$
(B.3)

Using the fact that the diagonal elements of M are equal to unity, the term corresponding to K=1 above vanishes. For K≥2, we have

$$\operatorname{Tr}\bigl[(M-\mathrm{Id})^K\bigr] = \operatorname{Tr}\bigl[\hat{M}^K\bigr] = \sum_{i_1,i_2,\ldots,i_K} \hat{M}_{i_1,i_2}\,\hat{M}_{i_2,i_3}\cdots\hat{M}_{i_{K-1},i_K}\,\hat{M}_{i_K,i_1} ,$$
(B.4)

where the matrix \(\hat{M}\) has the same off-diagonal elements as M, and has zero diagonal elements. Each term in the above sum corresponds to an Eulerian circuit over K′≤K sites, where K′ is the number of distinct indices in (i 1,i 2,…,i K ). Note that the same circuit can be obtained from different K-uplets of indices. Consider for instance the longest circuits, obtained for K′=K, i.e. all distinct indices. 2K different K-uplets (i 1,i 2,…,i K ) correspond to the same circuit, as neither the starting site nor the orientation of the loop matter. For instance, i 1i 2i 3i 1, i 2i 3i 1i 2, i 1i 3i 2i 1, … are all equivalent. This multiplicity factor 2K precisely cancels the 2K at the denominator in (B.3). The contribution corresponding to a circuit therefore coincides with expression (B.2) for the loop entropy. We conclude that

  • S MF (p) sums up all loop diagrams exactly;

  • S MF (p), in addition, sums up Eulerian circuit diagrams, but with weights a priori different from their values in the cross-entropy S(p) (see note 10). An exception is the three-variable Eulerian diagram shown in Fig. 26, whose weights in S MF and S coincide;

  • no non-Eulerian diagram is taken into account in S MF (p).

In conclusion, the diagrammatic expansion provides a natural justification for the choice of the reference entropy S 0(p)=S MF (p). In addition, it provides us with the dominant contributions to the cluster-entropies once the Mean-Field entropy is subtracted, see Fig. 26. A detailed study of those dominant contributions is presented in Appendix D.
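As a sanity check of identity (B.3), the short script below (our sketch, not from the paper) compares ½ log det M with the truncated circuit sum for a small random matrix M with unit diagonal and weak off-diagonal elements, for which the expansion converges.

```python
# Numerical check (ours) of identity (B.3) via the circuit sum (B.4).
import numpy as np

rng = np.random.default_rng(0)
N = 6
A = 0.05 * rng.standard_normal((N, N))
Mhat = (A + A.T) / 2
np.fill_diagonal(Mhat, 0.0)            # hat-M: zero diagonal, as in (B.4)
M = np.eye(N) + Mhat                   # M has unit diagonal

exact = 0.5 * np.linalg.slogdet(M)[1]  # (1/2) log det M

series, Mk = 0.0, np.eye(N)
for K in range(1, 60):                 # K=1 term vanishes: Tr hat-M = 0
    Mk = Mk @ Mhat                     # hat-M^K
    series += (-1) ** (K - 1) / (2 * K) * np.trace(Mk)

print(exact, series)                   # the two values agree to machine accuracy
```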

Appendix C: Properties of the Cluster-Entropies of the One-dimensional Ising Model

Consider the one-dimensional Ising model with nearest-neighbor couplings and periodic boundary conditions. The Hamiltonian of the model is

$$ H=-h \sum_i \sigma_i- J\sum_i \sigma_i \sigma_{i+1},$$
(C.1)

where the spins σ i take 0,1 values. The parameters of the model are the N identical fields h i =h, the N couplings J i,i+1=J between neighbors and the remaining N×(N−3)/2 zero couplings J i,j =0 between non-neighbors.

We recall a few elementary facts about the model. The transfer matrix is

$$T = \begin{pmatrix} e^{J+h} & e^{h/2} \\ e^{h/2} & 1 \end{pmatrix}, \qquad T_{\sigma,\sigma'} = \exp\Bigl(J\,\sigma\sigma' + \frac{h}{2}(\sigma+\sigma')\Bigr), \quad \sigma,\sigma' \in \{1,0\} .$$
(C.2)

The eigenvalues are \(\lambda_{\pm}=\frac{1}{2} (e^{J+h}+1\pm \sqrt{(e^{J+h}-1)^{2}+4 e^{h}} )\), and the two components of the eigenvectors are, respectively, \(v_{\pm} (1)=-(1-\lambda_{\pm })/\sqrt{e^{h}+(1-\lambda_{\pm})^{2}}\) and \(v_{\pm} (2)=e^{h/2}/\sqrt {e^{h}+(1-\lambda_{\pm})^{2}}\). The probability that a spin is up is given by, in the N→∞ limit,

$$p=\langle\sigma_i\rangle_\mathbf{J} = \bigl(v_+(1)\bigr)^2 , $$
(C.3)

and the connected correlation at distance d is

$$ c_{i,i+d}= \langle\sigma_i\sigma_{i+d}\rangle_\mathbf{J}-\langle \sigma_i\rangle_\mathbf{J}\langle\sigma_{i+d}\rangle_\mathbf{J}=p(1-p) \biggl(\frac{\lambda_-}{\lambda_+} \biggr)^{d}= p(1-p) \exp(-d/\xi ),$$
(C.4)

where the correlation length is given by \(\xi=-1/\log(\lambda_{-}/\lambda_{+})\).

C.1 Calculation of the Cluster-Entropies and Cancellation Property

In this section, we show the exact cancellation property between the entropies of clusters with different sizes discussed in Sect. 4.1. We will see that this property is a direct consequence of the existence of a unique interaction path along the unidimensional chain.

C.1.1 Case S 0=0

We first consider the case where the reference entropy is zero. Let Γ=(i 1,i 2,…,i K ) be a cluster of size K, with i 1<i 2<⋯<i K . Due to the unidimensional nature of the interactions, the Gibbs distribution over the K-spin configurations σ obeys the chain rule,

$$ P_\mathbf{J}[\boldsymbol{\sigma}] = P_\mathbf{J}(\sigma_{i_K}| \sigma_{i_{K-1}})\dots P_\mathbf{J}(\sigma_{i_4}| \sigma_{i_3}) P_\mathbf{J}(\sigma_{i_3}| \sigma_{i_2}) P_\mathbf{J}(\sigma_{i_2},\sigma_{i_1}) ,$$
(C.5)

where P(⋅,⋅) and P(⋅|⋅) denote, respectively, joint and conditional probabilities. Inserting the above formula into expression (9) for the cross-entropy, we obtain

$$S(\mathbf{p}) = \min_{\mathbf{J}} \Biggl[\, \sum_{l=1}^{K-1} \Bigl( \log Z\bigl(J_{i_l,i_{l+1}}, h^{\rightarrow}_{i_l}, h^{\leftarrow}_{i_{l+1}}\bigr) - h^{\rightarrow}_{i_l}\, p_{i_l} - h^{\leftarrow}_{i_{l+1}}\, p_{i_{l+1}} - J_{i_l,i_{l+1}}\, p_{i_l,i_{l+1}} \Bigr) - \sum_{l=2}^{K-1} \Bigl( \log Z\bigl(h^{0}_{i_l}\bigr) - h^{0}_{i_l}\, p_{i_l} \Bigr) \Biggr] .$$
(C.6)

Each variable \(\sigma_{i_{l}}\), with l=2,…,K−1, appears three times in (C.6), which explains the presence of three fields h with the same index i l . After optimization over \(\mathbf{J}=(\{J_{i_{l},i_{l+1}}\},\{h^{\rightarrow}_{i_{l}}\},\{h^{\leftarrow}_{i_{l}}\},\{h^{0}_{i_{l}}\})\) all these fields are equal, and we obtain

$$S(\mathbf{p}) = \sum_{l=1}^{K} \Delta S_{(i_l)}(\mathbf{p}) + \sum_{l=1}^{K-1} \Delta S_{(i_l,i_{l+1})}(\mathbf{p}) .$$
(C.7)

Hence the cross-entropy S(p) is the sum of the 1-cluster entropies and of the entropies of the 2-clusters made of adjacent sites. None of the other cluster-entropies appears, which proves that they cancel each other. To illustrate the cancellation mechanism, consider the case K=3. According to (C.7),

$$ S(\mathbf{p}) =\Delta S_{(i_1,i_2)}(\mathbf{p}) +\Delta S_{(i_2,i_3)}(\mathbf{p})+ \Delta S_{(i_1)} (\mathbf{p}) +\Delta S_{(i_2)} (\mathbf{p}) +\Delta S_{(i_3)} (\mathbf{p}) .$$
(C.8)

Comparing with (28) we obtain

$$\Delta S_{(i_1,i_2,i_3)} (\mathbf{p}) = - \Delta S_{(i_1,i_3)} (\mathbf{p}) ,$$
(C.9)

which shows that the entropy of a 3-cluster and that of the 2-cluster with the same extremities i 1,i 3 are opposite to one another. By recursive application of (C.7) this result immediately generalizes to higher values of K: the entropy of a K-cluster is simply the entropy of the 2-cluster with the same extremities, multiplied by (−1)^{K−2}. Hence, identity (43) is established.

According to formula (C.4) for the connected correlation, the entropy of a two-site cluster is a function of the distance d between the two sites:

$$ \Delta S_{(i,i+d)} = F \bigl(\exp(-d/\xi) \bigr) ,$$
(C.10)

where

$$F(u) = 2p\log p + 2(1-p)\log(1-p) - \bigl(p^2+c\bigr)\log\bigl(p^2+c\bigr) - 2\bigl(p(1-p)-c\bigr)\log\bigl(p(1-p)-c\bigr) - \bigl((1-p)^2+c\bigr)\log\bigl((1-p)^2+c\bigr), \qquad c = p(1-p)\,u .$$
(C.11)

To obtain the expression (C.11) for F, we have used formula (26) for the 2-spin cluster-entropy, with p 1=p 2=p and p 12=p^2+c 12, where the correlation c 12 is given by (C.4). Note that F(u)=O(u^2) for small u, in agreement with scaling (41).
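The cancellation property can be checked numerically. The sketch below (ours; the parameter values are arbitrary) builds the exact averages from the transfer-matrix formulas (C.3)-(C.4), computes the 2- and 3-cluster entropies with S 0=0 (fixing p 123 through condition (A.2)), and verifies that ΔS (i1,i2,i3)=−ΔS (i1,i3), as in (C.9).

```python
# Numerical check (ours) of the cancellation (C.9) on the 1D chain.
import numpy as np
from scipy.optimize import brentq

J, h = 1.3, -0.7
disc = np.sqrt((np.exp(J + h) - 1) ** 2 + 4 * np.exp(h))
lp, lm = 0.5 * (np.exp(J + h) + 1 + disc), 0.5 * (np.exp(J + h) + 1 - disc)
xi = -1.0 / np.log(lm / lp)                       # correlation length
p = (1 - lp) ** 2 / (np.exp(h) + (1 - lp) ** 2)   # (C.3)
c = lambda d: p * (1 - p) * np.exp(-d / xi)       # (C.4)

def entropy(P):
    return -np.sum(P[P > 0] * np.log(P[P > 0]))

S1 = entropy(np.array([p, 1 - p]))

def dS2(d):                                        # 2-cluster entropy, S_0 = 0
    pij = p * p + c(d)
    return entropy(np.array([pij, p - pij, p - pij, 1 - 2 * p + pij])) - 2 * S1

def S3(p12, p13, p23):                             # 3-spin cross-entropy
    P = lambda t: np.array([t, p12 - t, p13 - t, p23 - t,
                            p - p12 - p13 + t, p - p12 - p23 + t,
                            p - p13 - p23 + t,
                            1 - 3 * p + p12 + p13 + p23 - t])
    s = np.array([1., -1., -1., -1., 1., 1., 1., -1.])
    lo = max(0., p12 + p13 - p, p12 + p23 - p, p13 + p23 - p)
    hi = min(p12, p13, p23, 1 - 3 * p + p12 + p13 + p23)
    t = brentq(lambda t: -s @ np.log(P(t)), lo + 1e-13, hi - 1e-13)  # (A.2)
    return entropy(P(t))

d1, d2 = 2, 3                                      # i2-i1 and i3-i2
dS3 = (S3(p * p + c(d1), p * p + c(d1 + d2), p * p + c(d2))
       - dS2(d1) - dS2(d1 + d2) - dS2(d2) - 3 * S1)
print(dS3, -dS2(d1 + d2))          # the two numbers coincide, as in (C.9)
```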

C.1.2 Case S 0=S MF

We now introduce the reference entropy S 0=S MF . The matrix M defined in (21) has elements

$$M_{ij} = \frac{c_{ij}}{\sqrt{p_i(1-p_i)p_j(1-p_j)}}= \exp \bigl(-|i-j|/\xi\bigr) .$$
(C.12)

The inverse of M, G=M −1, is a tridiagonal matrix, whose non-zero elements are

$$ G_{ii}= \frac{1+\exp(-2/\xi)}{1-\exp(-2/\xi)}, \qquad G_{i,i\pm1}= - \frac{\exp(-1/\xi)}{1-\exp(-2/\xi)} .$$
(C.13)

Consider now the Gaussian model over N real-valued variables φ i , whose energy function is given by

$$ E[\boldsymbol{\varphi}] = \frac{1}{2} \sum_{i,j}G_{ij} \varphi_i \varphi_j .$$
(C.14)

For this Gaussian model, the logarithm of the partition function is (up to an irrelevant additive constant) \(\log Z[\mathbf{G}]= -\frac{1}{2} \log \det G\). By construction, model (C.14) is the solution of the inverse Gaussian problem, with data 〈φ i 〉=0, 〈φ i φ j 〉=M ij . Hence, S 0 can be interpreted as the cross-entropy of Gaussian model (C.14) under those data. A key feature of the Gaussian model above is that its interaction matrix G ij is tridiagonal: only nearest-neighbor variables are coupled to each other, according to (C.13). We conclude that the Gaussian model is a one-dimensional model. Consequently, it obeys a chain rule similar to (C.5). This is the only requirement for the main conclusion of Sect. C.1.1 to hold: in the cluster expansion of S 0, the entropy of a K-cluster is simply equal to the entropy of the 2-cluster with the same extremities, multiplied by (−1)^{K−2}. As both the expansion of S and that of S 0 enjoy this property, so does the expansion of S−S 0.
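A quick numerical illustration (ours) of (C.12)-(C.13): for a finite open chain, the inverse of M ij =exp(−|i−j|/ξ) is exactly tridiagonal, with interior entries matching (C.13); only the two corner diagonal entries differ, a boundary effect absent from the N→∞ formulas.

```python
# Numerical illustration (ours) that G = M^{-1} is tridiagonal, cf. (C.13).
import numpy as np

N, xi = 10, 1.7
idx = np.arange(N)
M = np.exp(-np.abs(idx[:, None] - idx[None, :]) / xi)   # (C.12)
G = np.linalg.inv(M)

x = np.exp(-1 / xi)
print(G[4, 4], (1 + x**2) / (1 - x**2))   # interior diagonal entry of (C.13)
print(G[4, 5], -x / (1 - x**2))           # off-diagonal entry of (C.13)
print(np.abs(G[4, 6:]).max())             # ~0: G is indeed tridiagonal
```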

We conclude this section with the expression of the 2-cluster entropy ΔS (i,i+d). In the presence of the reference entropy S 0=S MF , we subtract the following contribution from expression (C.10), see (21),

$$\Delta S^{MF}_{(i,i+d)} = \frac{1}{2}\log\bigl(1-(M_{i,i+d})^2\bigr) = \frac{1}{2}\log\bigl(1-e^{-2d/\xi}\bigr) .$$
(C.15)

Hence, the quantity \(\frac{1}{2} \log(1-u^{2})\) must be subtracted from the function F(u) defined in (C.11). It is a simple check that \(F(u)- \frac{1}{2} \log (1-u^{2})= O(u^{3})\), in agreement with scaling (45).
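This check can be done symbolically; the sketch below (ours, and relying on the expression of F(u) as reconstructed in (C.11)) verifies with sympy that F(u)−½log(1−u²) starts at order u³, with the coefficient (2p−1)²/(6p(1−p)) appearing in (C.16) and (D.1).

```python
# Symbolic check (ours) that F(u) - (1/2) log(1-u^2) = O(u^3), cf. (C.16).
import sympy as sp

p, u = sp.symbols('p u', positive=True)
c = p * (1 - p) * u                               # connected correlation (C.4)
pij = p**2 + c
P = [pij, p - pij, p - pij, 1 - 2*p + pij]        # two-spin configurations
F = -sum(q * sp.log(q) for q in P) \
    - 2 * (-p*sp.log(p) - (1 - p)*sp.log(1 - p))  # F(u), cf. (C.11)
dS = F - sp.Rational(1, 2) * sp.log(1 - u**2)     # subtract (C.15)

series = sp.expand_log(sp.series(dS, u, 0, 4).removeO(), force=True)
target = (2*p - 1)**2 / (6 * p * (1 - p)) * u**3  # coefficient of (C.16)
print(sp.simplify(series - target))               # prints 0
```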

C.2 Examples and Calculation of Diagrammatic Coefficients

We have studied the histograms of cluster-entropies with K=2 and K=3 spins for specific choices of J,h. The averages p i and p ij were calculated exactly through formulas (C.3) and (C.4) (perfect sampling). We have found that

  • for K=2, the cluster-entropies take discrete values, labelled by the distance d between the two extremities of the cluster, e.g. Γ=(i,i+d). Expanding F(u) to the lowest order in u (for S 0=S MF ) we find the asymptotic formula for the 2-cluster entropy:

    $$ \Delta S_{i,i+d} \simeq\frac{(2p-1)^2}{6p(1-p)}e^{-3d/\xi} ,$$
    (C.16)

    in agreement with (D.1). We have verified that this formula is in very good agreement with the numerics at large distance d.

  • for K=3, we find that the entropies of 3-clusters of the type (i 1,i 2,i 3) depend only on the distance d=i 3−i 1 between the extremities. In addition, the values are exactly the opposite of the ones found for 2-clusters with the same distance d, as expected. Two differences are:

    • The peak at d=1 in the histogram is not present, because the minimal distance between the extremities of three spins is d=2. The largest 3-spin entropy thus corresponds to triplets of the type (i,i+1,i+2).

    • The height of the peak (number of clusters) corresponding to distance d is (d−1)N. The degeneracy (d−1) is the number of ways of choosing the location of site i 2 in between i 1 and i 3.

We now show how the value of the cluster entropy can be recovered from the leading terms in the diagrammatic expansion calculated in Sect. D.4. Let us call d′=j−i<d the distance between the first two sites in the cluster. For each diagram in Fig. 27 we give in Table 1 the sum of the distances of its links, i.e. the power of exp(−1/ξ).

Fig. 27

Leading diagrams to the order 5 (top) and 6 (bottom) in the connected correlation for the entropy of 3-clusters

Table 1 Sum of the link distances, i.e. power of exp(−1/ξ), for the diagrams of Fig. 27 (d=k−i, d′=j−i)

  Diagram    (a)    (b)       (c)       (d)    (e)        (f)
  Distance   3d     3d+d′     4d−d′     3d     3d+3d′     6d−3d′

Interestingly, the lowest total distances are found in diagrams (a) and (d), although the latter diagram is of a higher power (6) in the connected correlation than the former (5). Hence, contrary to the case of independent spins (Appendix F), both diagrams (a) and (d) give dominant contributions to the entropy. Summing the contributions of (a) and (d) we find

$$\Delta S_{(i,j,k)} \simeq \bigl[\alpha^{(5)}\,\bigl(p(1-p)\bigr)^5 + \alpha^{(6)}\,\bigl(p(1-p)\bigr)^6\bigr]\, e^{-3d/\xi} .$$
(C.17)

To derive the coefficients α (5) and α (6), we impose that ΔS (i,j,k) is the opposite of (C.16). We deduce that α (5) and α (6) are given by, respectively, (D.4) and (D.6).

The exact cancellation property discussed above has important consequences for the inferred fields and couplings. Consider for instance the coupling J i,i+2, which vanishes in the 1D-Ising model with nearest-neighbor interactions (C.1). As the connected correlation c i,i+2 is not equal to zero, a contribution to the coupling will be collected from the cluster (i,i+2) itself, equal to

$$\Delta J_{i,i+2;(i,i+2)}=- \frac{\partial\Delta S_{(i,i+2)}}{\partial p_{i,i+2}} .$$
(C.18)

Other contributions will come from larger clusters. For instance the cluster (i,i+1,i+2) will give an additional

$$\Delta J_{i,i+2;(i,i+1,i+2)}=- \frac{\partial\Delta S_{(i,i+1,i+2)}}{\partial p_{i,i+2}} .$$
(C.19)

The sum of the two contributions above vanishes due to the cancellation property. It can be checked that the contributions coming from all the other clusters vanish too, which makes the coupling J i,i+2 equal to zero, as it should be.

Appendix D: Leading Diagrammatic Contributions to Small Cluster-Entropies

We analyze the dominant diagrams contributing to the cluster-entropies for the various values of the cluster sizes, K, in the limit of small connected correlations c kl .

D.3 Case K=2

The entropy ΔS (i,j) of a 2-spin cluster is the sum of all diagrammatic contributions containing the two spins and an arbitrary number of links between them, each link carrying one power of the expansion parameter \(M_{ij}=c_{ij}/\sqrt{p_i(1-p_i)p_j(1-p_j)}\) (Fig. 26 and Appendix B). For small values of M ij the largest contribution to ΔS (i,j) is the one with three links (cubic power of M ij ), once the reference entropy S 0=S MF has removed the two-link loop diagram. The entropy contribution of this diagram was computed in [31], with the result

$$ \Delta S^{(3)}_{i,j}=\alpha^{(3)}_{i,j} (c_{ij})^3,$$
(D.1)

where

$$\alpha_{i,j}^{(3)}= \frac{(2 p_i-1)(2 p_j-1)}{6 (p_i)^2(1-p_i)^2(p_j)^2(1-p_j)^2} . $$
(D.2)

The superscript 3 refers to the power of the connected correlation.

D.4 Case K=3

For K=3 the leading term of ΔS (i,j,k) in powers of M ij was not derived analytically in [31]. Based on the studies of the unidimensional Ising model (Appendix C) and of the independent-spin model (Appendix F), we find that the leading diagrams are diagrams (a), (b), (c) in Fig. 27 (bold diagrams in Fig. 26), whose sum is given by

$$ \Delta S^{(5)}_{i,j,k}= \alpha^{(5)}_{ijk}(c_{ij})^2(c_{jk})^2c_{ki} +\alpha^{(5)}_{jik}(c_{ij})^2c_{jk}(c_{ki})^2+ \alpha^{(5)}_{jki}c_{ij} (c_{jk})^2(c_{ki})^2,$$
(D.3)

with

$$ \alpha_{ijk}^{(5)}=-\frac{(2 p_i-1)(2p_k-1)}{2 (p_i)^2(1-p_i)^2(p_j)^2(1-p_j)^2(p_k)^2(1-p_k)^2 }.$$
(D.4)

Note that \(\alpha^{(5)}_{ijk}\) differs from \(\alpha^{(5)}_{jik}\). We have also found the coefficients of the subsequent diagrams, of order M^6. These diagrams are labelled (d), (e), (f) in Fig. 27. Their total contribution to the cluster-entropy is

$$ \Delta S^{(6)}_{i,j,k}= \alpha^{(6)}_{ijk}(c_{ij})^3(c_{jk})^3+\alpha^{(6)}_{jik} (c_{ij})^3(c_{ki})^3 + \alpha^{(6)}_{jki}(c_{jk})^3(c_{ki})^3$$
(D.5)

with

$$ \alpha^{(6)}_{ijk} =\frac{(2 p_i-1)(2p_k-1)}{3 (p_i)^2(1-p_i)^2(p_j)^3(1-p_j)^3(p_k)^2(1-p_k)^2 }.$$
(D.6)

D.5 Generic Case K≥4

The above results for K=3 are easily generalized to any value of the cluster size K≥4. The diagrammatic expansion of a K-spin cluster includes all circuits where pairs of spins are linked together. Each diagram with (one or two) links between i l and i l+1 (l=1,…,K−1) and (one or two) links between i 1 and i K , with exactly one single link in total, gives a contribution of the form

$$\Delta S^{(2K-1)}_{(i_1,\ldots,i_K)} = \alpha^{(2K-1)}_{i_1\ldots i_K}\,(c_{i_1,i_2})^2(c_{i_2,i_3})^2\cdots(c_{i_{K-1},i_K})^2\,c_{i_K,i_1} ,$$
(D.7)

where the coefficient \(\alpha^{(2K-1)}\) depends on the frequencies p of the cluster sites and generalizes (D.4), and where the single link may sit on any of the K edges of the circuit.

At the next order in power of M ij , each diagram with three links between i l and i l+1 (l=1,…,K−1) gives a contribution

$$\Delta S^{(3(K-1))}_{(i_1,\ldots,i_K)} = \alpha^{(3(K-1))}_{i_1\ldots i_K}\,(c_{i_1,i_2})^3(c_{i_2,i_3})^3\cdots(c_{i_{K-1},i_K})^3 ,$$
(D.8)

with a coefficient generalizing (D.6).

Appendix E: Critical Correlation Length ξ c for the Absolute Convergence

In this appendix, we briefly explain why the cluster-entropy series is absolutely convergent if and only if the correlation length ξ is smaller than

$$\xi_c =\frac{\Omega}{\log v} .$$
(E.1)

Here, Ω=2 when the reference entropy is S 0=0, and Ω=3 when S 0=S MF . Parameter v denotes the number of neighbors of a site on the lattice, supposed to be uniform. For instance, v=2D on a hypercubic lattice in dimension D≥1.

Consider a set of K distinct points on the lattice. Let \({\mathcal{N}}(L)\) be the number of closed paths of length L visiting all K points. We obviously have \({\mathcal{N}} (L)\leq v^{L}\). Hence, the series

$$ \sum_L {\mathcal{N}} (L) \exp (-\Omega L/\xi )$$
(E.2)

is convergent if ξ<ξ c . Conversely, let L 0 be the length of the shortest closed path \({\mathcal{C}}_{0}\) visiting the K points. A closed path of length L 1+L 0 can be built from \({\mathcal{C}}_{0}\) by attaching a closed loop of length L 1 to any one of the sites in \({\mathcal{C}}_{0}\). Hence, for L≥L 0+2, \({\mathcal{N}} (L)\geq L_{0}v^{L-L_{0}}\). We deduce that the series (E.2) is divergent if ξ>ξ c .
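As a worked example, take the square lattice: D=2, so v=4. With the mean-field reference entropy (Ω=3), formula (E.1) gives absolute convergence for ξ<ξ c =3/log 4≈2.16 lattice spacings; with S 0=0 (Ω=2), the threshold is lower, ξ c =2/log 4≈1.44.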

Appendix F: Distribution of Cluster-Entropies for the Independent Spin Model

We generate B configurations of N independent spins σ i . Spin i is equal to 1 with probability p and to zero with probability 1−p (for simplicity we assume here that all the frequencies p i are equal to the same value p). The empirical connected correlations c ij computed from the B sampled configurations of spins are generally non-zero. The marginal distribution of c ij is a normal law, with zero mean and standard deviation (48). The largest values of the correlations are, for a system with N spins, of the order of

$$c_{ij}^{\mathit{MAX}}=c_B\sqrt{ 4 \log N} , $$
(F.1)

according to extreme value theory.

We have checked the validity of formulas (D.3) and (D.5) for, respectively, the cluster-entropies ΔS ij and ΔS ijk against numerics carried out on randomly sampled configurations. The agreement, for B=10^6 sampled configurations, is excellent due to the small value of c B ≃2·10^{−5}.
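This test is easy to reproduce. The sketch below (ours; the values of N, B and p are illustrative, not those of the paper) samples independent spins, maps the empirical correlations to 2-cluster entropies through the small-correlation formula (D.1), and compares the measured standard deviation with the prediction (F.3) derived below.

```python
# Monte Carlo sketch (ours) of the independent-spin test of this appendix.
import numpy as np

rng = np.random.default_rng(1)
N, B, p = 40, 10**5, 0.3
spins = (rng.random((B, N)) < p).astype(float)   # B independent configurations

pi = spins.mean(axis=0)
C = np.cov(spins.T, bias=True)                   # empirical connected c_ij
i, j = np.triu_indices(N, k=1)
alpha = ((2*pi[i] - 1) * (2*pi[j] - 1)
         / (6 * (pi[i]*(1 - pi[i]))**2 * (pi[j]*(1 - pi[j]))**2))
dS = alpha * C[i, j]**3                          # formula (D.1)

pred = np.sqrt(15) * (2*p - 1)**2 / (6 * p * (1 - p) * B**1.5)  # (F.3)
print(dS.std(), pred)                            # same order, close values
```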

F.6 Distribution of Cluster-Entropies for K=2

To derive the analytical expression of the distribution in the N→∞ limit, we use the small-correlation formula (D.1) for ΔS (1,2), and the fact that the distribution of the connected correlation is Gaussian. As a result, approximating α 1,2 with its average value α obtained by substituting p 1 and p 2 with p in (D.2), we obtain

$$H_{\mathit{IS}}(\Delta S_{(1,2)})= \frac{\exp \bigl(-\frac{(\Delta S_{(1,2)})^{2/3}}{2 (c_B)^2\alpha^{2/3}} \bigr)}{3\alpha^{1/3} \sqrt {2\pi(c_B)^2}(\Delta S_{(1,2)})^{2/3} }. $$
(F.2)

This distribution is a stretched exponential at infinity, and diverges at the origin. Its standard deviation is

$$\sigma_{\Delta S_{(1,2)}} =\sqrt{15}\alpha (c_B)^3=\frac{\sqrt{15} (2p-1)^2}{6 p(1-p) B^{3/2}}.$$
(F.3)

This analytical prediction is in very good agreement with the distribution of cluster-entropies found from numerical data for large values of B, e.g. B=10^6 for p=0.024. For finite N, the largest correlations are given by (F.1), and cluster-entropies cannot exceed ΔS (1,2)^MAX, of the order of

$$\Delta S_{(1,2)}^{\mathit{MAX}} \simeq (4 \log N )^{3/2}\, \frac{\sigma_{\Delta S_{(i,j)}}}{\sqrt{15}} . $$
(F.4)

F.7 Distribution of the Cluster-Entropies for K=3

The leading order contribution to the entropy of a 3-cluster is given by (D.3). We want to calculate the distribution of ΔS (1,2,3) when the connected correlations c ij are random Gaussian variables of zero mean and variance (c B )^2. We neglect the correlations between c 12,c 13,c 23, which is legitimate for large B. Let us call x=ΔS (1,2,3)/(α(c B )^5), with α given by (D.4). We can easily evaluate the variance of each of the three terms of the sum in (D.3) as the product of the variances of the three factors in the product, based on the approximation that the connected correlations c ij are independent stochastic variables. We obtain (see note 11)

$$ \sigma_{\Delta S_{ijk}} =\frac{ 3\sqrt{3} (2p-1)^2 (c_B)^5}{2 p^6(1-p)^6} =\frac{3\sqrt{3} (2p-1)^2}{2p(1-p)B^{5/2}} .$$
(F.5)

Let P(x) be the probability density of x. Though we have not been able to find a closed expression for P(x), the asymptotic behavior of P for large or small arguments can be characterized analytically.

F.7.3 Large x Behavior

The Mellin transform of P [38] is

$$ \int_0^\infty dx P(x)x^\lambda= \biggl(\frac{2}{\pi}\biggr)^{3/2} \int _0^\infty dc_{12} dc_{13}dc_{23} e^{-F(c_{12},c_{13},c_{23})}$$
(F.6)

where

$$F(c_{12},c_{13},c_{23}) = \frac{1}{2}\bigl(c_{12}^2+c_{13}^2+c_{23}^2\bigr) - \lambda\,\log\bigl[c_{12}\,c_{13}\,c_{23}\,\bigl(c_{12}c_{13}+c_{13}c_{23}+c_{23}c_{12}\bigr)\bigr] .$$
(F.7)

The tail of P(x) at large x can be studied by considering large values of λ. We expect the dominant contribution to the multiple integral on the right-hand side of (F.6) to come from large correlations. The location of the main contribution to the integral is the value of (c 12,c 13,c 23) which minimizes F; the corresponding saddle-point value of the logarithm of (F.6) is F^*(λ)≡−F(c^*). As F is invariant under any permutation of its arguments, we look for a stationary point where c 12=c 13=c 23≡c^*. A straightforward calculation shows that

$$c^*(\lambda)=\sqrt{\frac{5}{3} \lambda}, \qquad F^*(\lambda)=\frac{5}{2} \lambda\log \lambda+ \lambda \biggl(\log3 +\frac{5}{2} \log\frac{5}{3} -\frac{5}{2} \biggr) .$$
(F.8)

We now use the saddle-point method again, this time to estimate the integral on the left hand side of (F.6). We obtain

$$ \max_x \bigl[ \log P(x) + \lambda \log x \bigr]= F^*(\lambda) ,$$
(F.9)

which is true when λ is very large. Hence, F (λ) is the Legendre transform of logP(x). Solving (F.9) gives

$$ \log P(x) \simeq- \frac{3}{2} \biggl(\frac{x}{3}\biggr)^{2/5}$$
(F.10)

at large x. The distribution of the cluster entropies ΔS (1,2,3) thus follows a stretched exponential with exponent \(\frac{2}{5}\). This decay is much slower than an exponential, and leads to large tails as can be seen from Fig. 7.

F.7.4 Small x Behavior

In order for the rescaled entropy x to be small, at least one of the three correlations must be small, according to (D.3). Without loss of generality, we may assume that c 12 is the smallest of the three correlations. As c 12 appears once with power one, and twice with power two in (D.3), we approximate \(x \simeq c_{12}c_{13}^{2}c_{23}^{2}\). The Mellin transform of P is, for negative λ,

$$ \int_0^\infty dx P(x)x^\lambda\simeq3 \biggl(\, \int_0^\infty dc\frac{2}{\sqrt{2\pi}} c^\lambda e^{-c^2/2} \biggr) \biggl(\, \int _c^\infty dc \frac{2}{\sqrt{2\pi}}c^{2\lambda} e^{-c^2/2} \biggr)^2 .$$
(F.11)

The largest pole is located at \(\lambda=-\frac{1}{2}\), and is of order 2. According to standard results on the inversion of Mellin transforms [38], we obtain a precise characterization of the divergence of the probability density at small x,

$$P(x) \simeq C \frac{(-\log x)}{\sqrt{x}} ,$$
(F.12)

where C is a constant.

F.8 Distribution of Cluster-Entropies for Generic K≥4

In general, for K≥3, the leading contribution to \(\Delta S_{(i_{1},i_{2},\ldots i_{K})}\) (D.7) contains the sum of K×(K−1)!/2 terms, each one being the product of K random variables, (K−1) of which are raised to the power two and one to the power one. The factor K comes from the fact that there are K ways of choosing the single link in a circuit with K spins. The factor (K−1)!/2 is the number of non-equivalent circuits going through K spins. We define the rescaled entropy x through

$$x =|\Delta S_{(i_1,i_2,\ldots i_K)}|\times \frac{2 (p(1-p))^{2K}}{\sqrt{\frac{K!}{2}} (2p-1)^2 (c_B)^{2K-1}}.$$
(F.13)

The approach followed in Sect. F.7 to calculate the asymptotic behavior of the probability density P of x for K=3 can be extended without difficulty to any value of K>3. We find that P(x) diverges when x→0, with

$$ P(x) = C \frac{(-\log x)^{K-2}}{\sqrt{x}}$$
(F.14)

where C is a constant. Hence the shape of the distribution of x is, up to logarithmic terms, independent of K. By contrast, the tail of the distribution at large x is very sensitive to K,

$$ \log P(x) \simeq-\frac{K}{2(K-\frac{1}{2})^2} \biggl(\frac{x}{K}\biggr)^{2/(2K-1)}.$$
(F.15)

As in the K=3 case, the distribution of the cluster entropies ΔS follows a stretched exponential. The exponent of the stretched exponential decreases with K. The variance of the distribution can be easily evaluated, with the result

$$ \sigma_{\Delta S_{(i_1,i_2,\ldots i_K)}}= \frac{\sqrt{K!/2} (\sqrt{3})^{K-1} (2p-1)^2 (c_B)^{2K-1}}{ 2 (p(1-p))^{2K}} .$$
(F.16)

Appendix G: Inverse Susceptibility Matrix for the Unidimensional Ising Model

Hereafter, we want to invert the matrix χ, whose elements are given in (60). The matrix is of dimension \(\frac{1}{2} N(N-1)\), and each element is labelled by two pairs of indices (i,j) and (k,l), with i<j and k<l. Each index pair (i,j) can be represented by a site of coordinates i and j on the half-grid of Fig. 28(a). We now show that the non-zero entries of the inverse susceptibility matrix, (χ −1) ij,kl , are in one-to-one correspondence with the sites (i,j) and (k,l) that are either identical, or nearest neighbors, or diagonally opposed on the elementary mesh of the half-grid (Fig. 28(b, c, d)). Depending on the value of the difference j−i, the number of those sites is equal to 9, 8, or 6.

Fig. 28

Half grid representing the index (i,j) of the entries of the inverse susceptibility matrix, with i<j (a). Black circles locate the nearest-neighbors and the diagonally opposed sites (k,l) of (i,j) (cross), with i=4 and j=9 (b), 6 (c), 5 (d)

We start with the case ji≥3 (Fig. 28(b)). By symmetry, the nine unknown matrix elements (χ −1) ij,kl take only three independent values, denoted by γ for (k,l)=(i,j), β for (k,l) and (i,j) nearest neighbors, and α for (k,l)=(i±1,j±1). We now write the matrix inversion identity,

$$ \sum_{k<l} \bigl(\chi^{-1}\bigr)_{ij,kl} \chi_{kl,mn} = \delta_{i,m}\delta_{j,n} ,$$
(G.1)

for various values of (m,n). Let d=ji. For m=i,n=j, constraint (G.1) gives

$$\gamma\bigl( 1- x^{2d}\bigr) + 2\beta \bigl( 2x-x^{2d-1}-x^{2d+1}\bigr)+\alpha \bigl( 4x^2-x^{2d-2}-2x^{2d}-x^{2d+2}\bigr)= 1,$$
(G.2)

which should hold for all d≥3. Collecting the d-dependent terms (proportional to x^{2d−2}) and the d-independent ones, we deduce two coupled equations for the three unknown variables:

$$\gamma x^2 + 2\beta x\bigl(1+x^2\bigr) + \alpha\bigl(1+x^2\bigr)^2 = 0 ,$$
(G.3)
$$\gamma + 4\beta x + 4\alpha x^2 = 1 .$$
(G.4)

For m=i+1,n=j, constraint (G.1) is equivalent to

$$\gamma\bigl(x - x^{2d-1}\bigr) + \beta\bigl(1 + 3x^2 - 2x^{2d-2} - 2x^{2d}\bigr) + \alpha\bigl(2x + 2x^3 - x^{2d-3} - 2x^{2d-1} - x^{2d+1}\bigr) = 0 .$$
(G.5)

The d-dependent term in the equation above cancels by virtue of (G.3). We are left with an additional equation over α,β,γ:

$$\gamma x + \beta \bigl(1+ 3x^2\bigr)+2\alpha x\bigl(1+x^2\bigr) =0 . $$
(G.6)

By symmetry of the matrices χ,χ −1, no new constraint is obtained when the values of m,n are further varied. Solving (G.3), (G.4), (G.6) we obtain

$$\alpha= \frac{x^2}{(1-x^2)^2}, \qquad \beta=- \frac{x(1+x^2)}{(1-x^2)^2} ,\qquad \gamma=\frac{(1+x^2)^2}{(1-x^2)^2} .$$
(G.7)

The analysis of the other cases, j=i+2 (Fig. 28(c)) and j=i+1 (Fig. 28(d)), can be done along the same lines. We do not present the calculations in detail, and simply report the results. The case j=i+2 is very similar to the previous one. There are 8 coefficients to be calculated, taking three independent values, α′,β′,γ′. It turns out that

$$\alpha'=\alpha,\qquad \beta'=\beta,\qquad \gamma'=\gamma.$$
(G.8)

As for the last case, j=i+1, we call α″ the values of the entries of χ −1 with (k,l)=(i−1,j−1),(i−1,j+1),(i+1,j+1), β″ the values of the entries with (k,l)=(i−1,j),(i,j+1), and γ″ the diagonal element corresponding to (k,l)=(i,j). After some elementary algebra, we find

$$\alpha''=\alpha,\qquad \beta''= \beta,\qquad \gamma''=\frac{1+x^2+x^4}{(1-x^2)^2} .$$
(G.9)

All those results are reported in (61).

Cite this article

Cocco, S., Monasson, R. Adaptive Cluster Expansion for the Inverse Ising Problem: Convergence, Algorithm and Tests. J Stat Phys 147, 252–314 (2012). https://doi.org/10.1007/s10955-012-0463-4
