
Adaptive Cluster Expansion for the Inverse Ising Problem: Convergence, Algorithm and Tests


Abstract

We present a procedure to solve the inverse Ising problem, that is, to find the interactions between a set of binary variables from the measurement of their equilibrium correlations. The method consists in constructing and selecting specific clusters of spins, based on their contributions to the cross-entropy of the Ising model. Small contributions are discarded to avoid overfitting and to keep the computation tractable. The properties of the cluster expansion and its performance on synthetic data are studied. To ease implementation we give the pseudo-code of the algorithm.


Notes

  1. We have to minimize here, rather than maximize, since the true Lagrange multipliers take imaginary values, the couplings and fields being their imaginary parts.

  2. The term ‘field’ should, strictly speaking, be used when the variables σ i are spins taking ±1 values. For 0,1 variables, the denomination ‘chemical potential’ would be more appropriate. We keep the simpler denomination ‘field’ hereafter.

  3. The minimal strength of the couplings which can be ‘detected’ depends on the quality of the sampling, and scales as \(\sqrt{\log N/B}\), see Sect. 4.3.2.

  4. A further theoretical argument supporting the existence of the cancellation property is, in the case of perfect sampling, the fact that the entropy S must be extensive in N. As S is the sum of \(\sim 2^{N}\) cluster entropies, those contributions must compensate each other.

  5. In the present work, where spins are equal to 0,1, the couplings (J ij ) and fields (h i ) are obtained from the ones for ±1 spins (\(\hat{J}_{ij},\hat{h}_{i}\)) through the transformation \(J_{ij}=4 \hat{J}_{ij}\) and \(h_{i}=-\frac{1}{2} \sum_{j\neq i} J_{ij} +\hat{h}_{i}\). The numerical experiments of [17] were done with ±1 spins.

  6. If \(p=\frac{1}{2}\), ΔS IS is of the order of \(B^{-K}\).

  7. This statement is widely believed to be true in the theory-of-liquids literature. The fast decay of the inverse susceptibility, χ −1(r), or, equivalently, of the direct pair correlation, g(r), is used to approximately close the hierarchy of correlation functions. The Percus-Yevick closure scheme, which gives an accurate equation of state for liquids of hard spheres, assumes that the inverse susceptibility vanishes beyond the interaction range of the potential (the diameter of a particle).

  8. We have verified that the dependence on N is weaker in the case of periodic boundary conditions.

  9. The numerical experiments of [17] were done with ±1 spins and with coupling parameter θ and field ν=0; in the present work where spins are equal to 0,1, the corresponding couplings and fields are: J=4θ and \(h_{i}=-\frac{1}{2} \sum_{j\neq i} J_{ij}\).

  10. In mean-field spin-glasses, as the couplings scale as the inverse square root of the number N of spins, only loop diagrams have non-zero weights in the thermodynamical limit.

  11. With N=40, p=0.024, we find that the standard deviation is of the order of 10^{−13} for B=10^6, and 2·10^{−11} for B=10^5, see Fig. 7.

References

  1. Brush, S.G.: Rev. Mod. Phys. 39, 883 (1967)

  2. Schneidman, E., Berry, M.J. II, Segev, R., Bialek, W.: Nature 440, 1007 (2006)

  3. Tkacik, G., Schneidman, E., Berry, M.J. II, Bialek, W.: arXiv:q-bio.NC/0611072 (2006)

  4. Marre, O., El Boustani, S., Frégnac, Y., Destexhe, A.: Phys. Rev. Lett. 102, 138101 (2009)

  5. Peyrache, A., et al.: Nat. Neurosci. 12, 919 (2009)

  6. Weigt, M., et al.: Proc. Natl. Acad. Sci. 106, 67 (2009)

  7. Balakrishnan, S., et al.: Proteins 79, 1061 (2011)

  8. Bialek, W., Cavagna, A., Giardina, I., Mora, T., Silvestri, E., Viale, M., Walczak, A.M.: arXiv:1107.0604 (2011)

  9. Cocco, S., Monasson, R.: Phys. Rev. Lett. 106, 090601 (2011)

  10. Jaynes, E.T.: Proc. IEEE 70, 939 (1982)

  11. Cocco, S., Leibler, S., Monasson, R.: Proc. Natl. Acad. Sci. 106, 14058 (2009)

  12. Opper, M., Saad, D. (eds.): Advanced Mean Field Methods: Theory and Practice. MIT Press, Cambridge (2001)

  13. Ackley, D.H., Hinton, G.E., Sejnowski, T.J.: Cogn. Sci. 9, 147 (1985)

  14. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, Berlin (2009)

  15. Wallace, C.S.: Statistical and Inductive Inference by Minimum Message Length. Information Science and Statistics. Springer, Berlin (2005)

  16. Ravikumar, P., Wainwright, M.J., Lafferty, J.: Ann. Stat. 38, 1287 (2010)

  17. Bento, J., Montanari, A.: Which graphical models are difficult to learn? In: NIPS (2009)

  18. Jerrum, M., Sinclair, A.: The Markov chain Monte Carlo method: an approach to approximate counting and integration. In: Hochbaum, D.S. (ed.) Approximation Algorithms for NP-Hard Problems. PWS, Boston (1996)

  19. Cagliotti, E., Kuna, T., Lebowitz, J.L., Speer, E.R.: Markov Process. Relat. Fields 12, 257 (2006)

  20. Kuna, T., Lebowitz, J.L., Speer, E.R.: J. Stat. Phys. 129, 417 (2007)

  21. Swendsen, R.H.: Phys. Rev. Lett. 52, 1165 (1984)

  22. Meinshausen, N., Bühlmann, P.: Ann. Stat. 34, 1436 (2006)

  23. Aurell, E., Ekeberg, M.: arXiv:1107.3536 (2011)

  24. Plefka, T.: J. Phys. A: Math. Gen. 15, 1971 (1982)

  25. Georges, A., Yedidia, J.: J. Phys. A: Math. Gen. 24, 2173 (1991)

  26. Georges, A.: In: Mancini, F., Avella, A. (eds.) Lectures on the Physics of Highly Correlated Electron Systems VIII, AIP Conf. Proc., vol. 715, p. 3 (2004)

  27. Thouless, D.J., Anderson, P.W., Palmer, R.G.: Philos. Mag. 35, 593 (1977)

  28. Sherrington, D., Kirkpatrick, S.: Phys. Rev. Lett. 35, 1792 (1975)

  29. Tanaka, T.: Phys. Rev. E 58, 2302 (1998)

  30. Roudi, Y., Aurell, E., Hertz, J.: Front. Comput. Neurosci. 3, 22 (2009)

  31. Sessak, V., Monasson, R.: J. Phys. A 42, 055001 (2009)

  32. Pelizzola, A.: J. Phys. A 38, R309 (2005)

  33. Mézard, M., Mora, T.: J. Physiol. (Paris) 103, 107 (2009)

  34. Marinari, E., Van Kerrebroeck, V.: J. Stat. Mech. P02008 (2010)

  35. de Dominicis, C.: J. Math. Phys. 3, 983 (1962)

  36. Hansen, J.P., McDonald, I.R.: Theory of Simple Liquids. Academic Press, New York (1976)

  37. Frisch, H.L., Lebowitz, J.L.: The Equilibrium Theory of Classical Fluids. Benjamin, New York (1964)

  38. Flajolet, P., Gourdon, X., Dumas, P.: Theor. Comput. Sci. 144, 3 (1995)

  39. Schnitzer, M.J., Meister, M.: Neuron 37, 499–511 (2003)

  40. Gumbel, E.J.: Statistics of Extremes. Dover, New York (2004)

  41. MacKay, D.J.C.: Neural Comput. 4, 415 (1991)

  42. Bray, A.J., Moore, M.A.: J. Phys. A 10, 1927 (1977)

  43. Percus, J.K., Yevick, G.J.: Phys. Rev. 110, 1 (1958)

  44. Borzi, C., Ord, G., Percus, J.K.: J. Stat. Phys. 46, 51 (1986)

  45. Barton, J.: private communication

  46. Gori, G., Trombettoni, A.: J. Stat. Mech. P10021 (2011)

  47. Fisher, M.: J. Math. Phys. 5, 944 (1964)

  48. Fisher, M.: Phys. Rev. 162, 480 (1967)

  49. Zobin, D.: Phys. Rev. 5, 2387 (1978)

Download references

Acknowledgements

We are grateful to J. Barton, J. Lebowitz and E. Speer for very useful and stimulating discussions, in particular regarding the correspondence between the inverse susceptibility and the direct correlation functions, and the practical implementation of the inference algorithm. We thank E. Aurell for pointing out to us the difference between P and Q, see Sect. 5.4.1.

Author information


Correspondence to R. Monasson.

Appendices

Appendix A: Expression of the Entropy of Clusters with Size K=3

In this appendix, we give the analytical expression for the entropy of a cluster with K=3 spins. Using this expression instead of numerically minimizing the cross-entropy (7) offers a valuable computational speed-up, as there are O(N^3) clusters of size K=3. We start from the definition of the entropy in terms of the joint probability P(σ):

$$S_3=-\sum_{\sigma_1=0,1}\sum _{\sigma_2=0,1}\sum_{\sigma_3=0,1} P(\sigma_1,\sigma_2,\sigma_3) \log P(\sigma_1,\sigma_2,\sigma_3) . $$
(A.1)

We then replace the probabilities P(σ 1,σ 2,σ 3) of the eight configurations of the three spins above with their expressions in terms of the probabilities {p i ,p kl } in the data, and of the probability p 123 that the three spins are all equal to 1: P(1,1,1)=p 123, P(1,1,0)=p 12−p 123, P(1,0,1)=p 13−p 123, P(0,1,1)=p 23−p 123, P(0,0,1)=p 3−p 23−p 13+p 123, P(0,1,0)=p 2−p 12−p 23+p 123, P(1,0,0)=p 1−p 13−p 12+p 123, P(0,0,0)=1−p 1−p 2−p 3+p 12+p 13+p 23−p 123. The only unknown quantity (not available from p) is the probability p 123. To determine p 123 we impose

$$ \frac{d S_3}{d p_{123} }=0 ,$$
(A.2)

which means that the three-body coupling J 123 vanishes, i.e. that P(1,1,1)P(1,0,0)P(0,1,0)P(0,0,1)=P(1,1,0)P(1,0,1)P(0,1,1)P(0,0,0). As the quartic terms in p 123 cancel between the two sides of this condition, (A.2) gives a third-degree equation for p 123,

$$p_{123}^3+\alpha p_{123}^2+\beta p_{123}+\gamma=0 $$
(A.3)

with

$$\alpha = AB+AC+BC - D\,(p_{12}+p_{13}+p_{23}) - (p_{12}p_{13}+p_{12}p_{23}+p_{13}p_{23}) , \qquad \beta = ABC + D\,(p_{12}p_{13}+p_{12}p_{23}+p_{13}p_{23}) + p_{12}\,p_{13}\,p_{23} , \qquad \gamma = -D\,p_{12}\,p_{13}\,p_{23} ,$$
(A.4)

where A=p 1−p 12−p 13, B=p 2−p 12−p 23, C=p 3−p 13−p 23 and D=1−p 1−p 2−p 3+p 12+p 13+p 23.

Upon substitution of p 123 in (A.1) we obtain the desired cross-entropy S 3, as a function of the three average values p i and the three two-point averages p kl . The expression of the cluster-entropy is given by,

$$\Delta S_{(1,2,3)}(\mathbf{p}) = S_3 - \Delta S_{(1,2)} - \Delta S_{(1,3)} - \Delta S_{(2,3)} - \Delta S_{(1)} - \Delta S_{(2)} - \Delta S_{(3)} ,$$
(A.5)

according to (28). The expressions of the cluster-entropies for one and two spins are given by, respectively, (23) and (25). Similarly, one obtains the expressions for the contributions of the 3-spin cluster to the values of the interaction parameters by differentiating ΔS with respect to the p i ’s and the p kl ’s.
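The following minimal Python sketch (ours, not the authors' implementation; function names are ours) carries out the computation of this appendix numerically: p 123 is obtained by root-finding on the stationarity condition (A.2) instead of solving the cubic (A.3), and the cluster-entropy then follows from (A.5).

```python
# Minimal numerical sketch (ours) of the K=3 cluster-entropy of Appendix A.
import numpy as np
from scipy.optimize import brentq

def configs3(p1, p2, p3, p12, p13, p23, t):
    # Probabilities of the eight spin configurations, with p_123 = t
    return np.array([t, p12 - t, p13 - t, p23 - t,
                     p1 - p12 - p13 + t, p2 - p12 - p23 + t,
                     p3 - p13 - p23 + t,
                     1 - p1 - p2 - p3 + p12 + p13 + p23 - t])

SIGNS = np.array([1., -1., -1., -1., 1., 1., 1., -1.])   # dP/dp_123

def entropy(P):
    return -np.sum(P[P > 0] * np.log(P[P > 0]))

def S1(pi):
    return entropy(np.array([pi, 1 - pi]))

def S2(pi, pj, pij):
    return entropy(np.array([pij, pi - pij, pj - pij, 1 - pi - pj + pij]))

def S3(p1, p2, p3, p12, p13, p23):
    # Admissible range of p_123 keeping all eight probabilities positive
    lo = max(0., p12 + p13 - p1, p12 + p23 - p2, p13 + p23 - p3)
    hi = min(p12, p13, p23, 1 - p1 - p2 - p3 + p12 + p13 + p23)
    dS = lambda t: -SIGNS @ np.log(configs3(p1, p2, p3, p12, p13, p23, t))
    eps = 1e-12 * (hi - lo)
    t = brentq(dS, lo + eps, hi - eps)        # condition (A.2): J_123 = 0
    return entropy(configs3(p1, p2, p3, p12, p13, p23, t))

def cluster_entropy_3(p1, p2, p3, p12, p13, p23):
    # Inclusion-exclusion of (A.5): subtract all sub-cluster entropies
    dS2 = lambda pi, pj, pij: S2(pi, pj, pij) - S1(pi) - S1(pj)
    return (S3(p1, p2, p3, p12, p13, p23)
            - dS2(p1, p2, p12) - dS2(p1, p3, p13) - dS2(p2, p3, p23)
            - S1(p1) - S1(p2) - S1(p3))
```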

Appendix B: Diagrammatic Expansion of the Cluster-Entropies in Powers of the Connected Correlations

A better understanding of the cluster expansion and of the role of the reference entropy S 0 can be gained through the diagrammatic expansion of the entropy S(p) in powers of the connected correlations (high-temperature expansion),

$$c_{ij} = p_{ij}-p_ip_j .$$
(B.1)

Note that the entry M ij of the matrix M defined in (21) vanishes linearly with c ij . Thus, an expansion in powers of c ij is equivalent to an expansion in powers of M ij . A procedure to derive in a systematic way the diagrammatic expansion of S(p) is proposed in [31]. The diagrammatic expansion provides a simple representation of the cluster-entropies, in which the entropy S(p) can be represented as a sum of connected diagrams (Fig. 26). Each diagram is made of sites, connected or not by one or more edges. Each point symbolizes a variable, and carries a factor p i . The presence of n(≥0) edges between the sites k and l results in a multiplicative factor (c kl )n. The contribution of a diagram to the entropy is the product of the previous factors, times a function of the p i specific to the topology of the diagram, see [31]. Diagrams of interest include (Fig. 26):

  • the N single-point diagrams, whose contributions are ΔS (i)(p i );

  • the ‘loop’ diagrams, which consist of a circuit with K edges going through K sites i 1i 2→⋯→i K i 1, whose contributions to the entropy are

    $$ S_{\mathit{loop}} (\mathbf{p}|i_1,i_2,\ldots,i_K) = (-1)^{K-1} M_{i_1,i_2}M_{i_2,i_3}\ldots M_{i_{K-1},i_K}M_{i_K,i_1} ;$$
    (B.2)
  • the Eulerian circuit diagrams, for which there exists a closed path visiting each edge exactly once;

  • the non-Eulerian diagrams, with the lowest number of links (smallest power in M).

The entropy for two variables i,j, S(p i ,p j ,p ij ) (24), is the sum of the two single-point diagrams i and j, plus the sum of all connected diagrams made of the two sites i and j with an arbitrarily large number of edges (n≥2) in between (first two columns in Fig. 26). According to (25), the cluster-entropy ΔS (i,j)(p i ,p j ,p ij ) is equal to the latter sum (second column in Fig. 26). More generally, the entropy of a cluster ΔS Γ(p) is the infinite sum of all diagrams whose sites are the indices in Γ.

Fig. 26

Diagrammatic expansion of the cross-entropy S(p). A cluster-entropy (see Fig. 2) is the infinite sum of all the diagrams in a box (dashed contour), linking the K sites in the cluster. Each link in a diagram carries M ij , and each site p i ; in addition, each diagram carries a multiplicative factor, which is a function of the p i ’s. In the figure only one cluster among all \({N\choose K}\) clusters is represented. Only the first diagrams with non-zero coefficients are drawn. Loop diagrams are analytically summed up and removed from the expansion through the reference entropy S 0=S MF ; Eulerian circuit diagrams (brown/gray) are partly removed, see main text. Diagrams giving the largest contributions to the universal central peak of the cluster-entropy distribution (Appendix D) are shown in bold

We now interpret the Mean Field expression for the entropy, S MF , in the diagrammatic framework. We start from identity (21), and rewrite,

$$S_{MF}(\mathbf{p}) - \sum_i \Delta S_{(i)}(\mathbf{p}) = \frac{1}{2}\log\det M = \frac{1}{2}\operatorname{Tr}\log M = \sum_{K\geq 1} \frac{(-1)^{K-1}}{2K}\operatorname{Tr}\bigl[(M-\mathrm{Id})^K\bigr] .$$
(B.3)

Using the fact that the diagonal elements of M are equal to unity, the term corresponding to K=1 above vanishes. For K≥2, we have

$$\operatorname{Tr}\bigl[(M-\mathrm{Id})^K\bigr] = \operatorname{Tr}\bigl[\hat{M}^K\bigr] = \sum_{i_1,i_2,\ldots,i_K} \hat{M}_{i_1,i_2}\,\hat{M}_{i_2,i_3}\cdots\hat{M}_{i_{K-1},i_K}\,\hat{M}_{i_K,i_1} ,$$
(B.4)

where the matrix \(\hat{M}\) has the same off-diagonal elements as M, and has zero diagonal elements. Each term in the above sum corresponds to an Eulerian circuit over K′≤K sites, where K′ is the number of distinct indices in (i 1,i 2,…,i K ). Note that the same circuit can be obtained from different K-uplets of indices. Consider for instance the longest circuits, obtained for K′=K, i.e. all distinct indices. 2K different K-uplets (i 1,i 2,…,i K ) correspond to the same circuit, as neither the starting site nor the orientation of the loop matter. For instance, i 1i 2i 3i 1, i 2i 3i 1i 2, i 1i 3i 2i 1, … are all equivalent. This multiplicity factor 2K precisely cancels the 2K at the denominator in (B.3). The contribution corresponding to a circuit therefore coincides with expression (B.2) for the loop entropy. We conclude that

  • S MF (p) sums up all loop diagrams exactly;

  • S MF (p), in addition, sums up Eulerian circuit diagrams, but with weights a priori different from their values in the cross-entropy S(p) (see note 10). An exception is the three-variable Eulerian diagram shown in Fig. 26, whose weights in S MF and S coincide;

  • no non-Eulerian diagram is taken into account in S MF (p).

In conclusion, the diagrammatic expansion provides a natural justification for the choice of the reference entropy S 0(p)=S MF (p). In addition, it provides us with the dominant contributions to the cluster-entropies once the Mean-Field entropy is subtracted, see Fig. 26. A detailed study of those dominant contributions is presented in Appendix D.
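As a sanity check of identity (B.3), the short script below (our sketch, not from the paper) compares ½ log det M with the truncated circuit sum for a small random matrix M with unit diagonal and weak off-diagonal elements, for which the expansion converges.

```python
# Numerical check (ours) of identity (B.3) via the circuit sum (B.4).
import numpy as np

rng = np.random.default_rng(0)
N = 6
A = 0.05 * rng.standard_normal((N, N))
Mhat = (A + A.T) / 2
np.fill_diagonal(Mhat, 0.0)            # hat-M: zero diagonal, as in (B.4)
M = np.eye(N) + Mhat                   # M has unit diagonal

exact = 0.5 * np.linalg.slogdet(M)[1]  # (1/2) log det M

series, Mk = 0.0, np.eye(N)
for K in range(1, 60):                 # K=1 term vanishes: Tr hat-M = 0
    Mk = Mk @ Mhat                     # hat-M^K
    series += (-1) ** (K - 1) / (2 * K) * np.trace(Mk)

print(exact, series)                   # the two values agree to machine accuracy
```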

Appendix C: Properties of the Cluster-Entropies of the One-dimensional Ising Model

Consider the one-dimensional Ising model with nearest-neighbor couplings and periodic boundary conditions. The Hamiltonian of the model is

$$ H=-h \sum_i \sigma_i- J\sum_i \sigma_i \sigma_{i+1},$$
(C.1)

where the spins σ i take 0,1 values. The parameters of the model are the N identical fields h i =h, the N couplings J i,i+1=J between neighbors and the remaining N×(N−3)/2 zero couplings J i,j =0 between non-neighbors.

We recall a few elementary facts about the model. The transfer matrix is

$$T = \begin{pmatrix} e^{J+h} & e^{h/2} \\ e^{h/2} & 1 \end{pmatrix}, \qquad T_{\sigma,\sigma'} = \exp\Bigl(J\,\sigma\sigma' + \frac{h}{2}(\sigma+\sigma')\Bigr), \quad \sigma,\sigma' \in \{1,0\} .$$
(C.2)

The eigenvalues are \(\lambda_{\pm}=\frac{1}{2} (e^{J+h}+1\pm \sqrt{(e^{J+h}-1)^{2}+4 e^{h}} )\), and the two components of the eigenvectors are, respectively, \(v_{\pm} (1)=-(1-\lambda_{\pm })/\sqrt{e^{h}+(1-\lambda_{\pm})^{2}}\) and \(v_{\pm} (2)=e^{h/2}/\sqrt {e^{h}+(1-\lambda_{\pm})^{2}}\). The probability that a spin is up is given by, in the N→∞ limit,

$$p=\langle\sigma_i\rangle_\mathbf{J} = \bigl(v_+(1)\bigr)^2 , $$
(C.3)

and the connected correlation at distance d is

$$ c_{i,i+d}= \langle\sigma_i\sigma_{i+d}\rangle_\mathbf{J}-\langle \sigma_i\rangle_\mathbf{J}\langle\sigma_{i+d}\rangle_\mathbf{J}=p(1-p) \biggl(\frac{\lambda_-}{\lambda_+} \biggr)^{d}= p(1-p) \exp(-d/\xi ),$$
(C.4)

where the correlation length is given by \(\xi=-1/\log(\lambda_{-}/\lambda_{+})\).

C.1 Calculation of the Cluster-Entropies and Cancellation Property

In this section, we show the exact cancellation property between the entropies of clusters with different sizes discussed in Sect. 4.1. We will see that this property is a direct consequence of the existence of a unique interaction path along the unidimensional chain.

C.1.1 Case S 0=0

We first consider the case where the reference entropy is zero. Let Γ=(i 1,i 2,…,i K ) be a cluster of size K, with i 1<i 2<⋯<i K . Due to the unidimensional nature of the interactions, the Gibbs distribution over the K-spin configurations σ obeys the chain rule,

$$ P_\mathbf{J}[\boldsymbol{\sigma}] = P_\mathbf{J}(\sigma_{i_K}| \sigma_{i_{K-1}})\dots P_\mathbf{J}(\sigma_{i_4}| \sigma_{i_3}) P_\mathbf{J}(\sigma_{i_3}| \sigma_{i_2}) P_\mathbf{J}(\sigma_{i_2},\sigma_{i_1}) ,$$
(C.5)

where P(⋅,⋅) and P(⋅|⋅) denote, respectively, joint and conditional probabilities. Inserting the above formula into expression (9) for the cross-entropy, we obtain

$$S(\mathbf{p}) = \min_{\mathbf{J}} \Biggl[\, \sum_{l=1}^{K-1} \Bigl( \log Z\bigl(J_{i_l,i_{l+1}}, h^{\rightarrow}_{i_l}, h^{\leftarrow}_{i_{l+1}}\bigr) - h^{\rightarrow}_{i_l}\, p_{i_l} - h^{\leftarrow}_{i_{l+1}}\, p_{i_{l+1}} - J_{i_l,i_{l+1}}\, p_{i_l,i_{l+1}} \Bigr) - \sum_{l=2}^{K-1} \Bigl( \log Z\bigl(h^{0}_{i_l}\bigr) - h^{0}_{i_l}\, p_{i_l} \Bigr) \Biggr] .$$
(C.6)

Each variable \(\sigma_{i_{l}}\), with l=2,…,K−1, appears three times in (C.6), which explains the presence of three fields h with the same index i l . After optimization over \(\mathbf{J}=(\{J_{i_{l},i_{l+1}}\},\{h^{\rightarrow}_{i_{l}}\},\{h^{\leftarrow}_{i_{l}}\},\{h^{0}_{i_{l}}\})\) all these fields are equal, and we obtain

$$S(\mathbf{p}) = \sum_{l=1}^{K} \Delta S_{(i_l)}(\mathbf{p}) + \sum_{l=1}^{K-1} \Delta S_{(i_l,i_{l+1})}(\mathbf{p}) .$$
(C.7)

Hence the cross-entropy S(p) is the sum of the 1-cluster entropies and of the entropies of the 2-clusters made of adjacent sites. None of the other cluster-entropies appears, which proves that they cancel each other. To illustrate the cancellation mechanism, consider the case K=3. According to (C.7),

$$ S(\mathbf{p}) =\Delta S_{(i_1,i_2)}(\mathbf{p}) +\Delta S_{(i_2,i_3)}(\mathbf{p})+ \Delta S_{(i_1)} (\mathbf{p}) +\Delta S_{(i_2)} (\mathbf{p}) +\Delta S_{(i_3)} (\mathbf{p}) .$$
(C.8)

Comparing with (28) we obtain

$$\Delta S_{(i_1,i_2,i_3)} (\mathbf{p}) = - \Delta S_{(i_1,i_3)} (\mathbf{p}) ,$$
(C.9)

which shows that the entropy of a 3-cluster and that of the 2-cluster with the same extremities i 1,i 3 are opposite to one another. By recursive application of (C.7) this result immediately generalizes to higher values of K: the entropy of a K-cluster is simply the entropy of the 2-cluster with the same extremities, multiplied by (−1)^{K−2}. Hence, identity (43) is established.

According to formula (C.4) for the connected correlation, the entropy of a two-site cluster is a function of the distance d between the two sites:

$$ \Delta S_{(i,i+d)} = F \bigl(\exp(-d/\xi) \bigr) ,$$
(C.10)

where

$$F(u) = 2p\log p + 2(1-p)\log(1-p) - \bigl(p^2+c\bigr)\log\bigl(p^2+c\bigr) - 2\bigl(p(1-p)-c\bigr)\log\bigl(p(1-p)-c\bigr) - \bigl((1-p)^2+c\bigr)\log\bigl((1-p)^2+c\bigr), \qquad c = p(1-p)\,u .$$
(C.11)

To obtain the expression (C.11) for F, we have used formula (26) for the 2-spin cluster-entropy, with p 1=p 2=p and p 12=p^2+c 12, where the correlation c 12 is given by (C.4). Note that F(u)=O(u^2) for small u, in agreement with scaling (41).
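The cancellation property can be checked numerically. The sketch below (ours; the parameter values are arbitrary) builds the exact averages from the transfer-matrix formulas (C.3)-(C.4), computes the 2- and 3-cluster entropies with S 0=0 (fixing p 123 through condition (A.2)), and verifies that ΔS (i1,i2,i3)=−ΔS (i1,i3), as in (C.9).

```python
# Numerical check (ours) of the cancellation (C.9) on the 1D chain.
import numpy as np
from scipy.optimize import brentq

J, h = 1.3, -0.7
disc = np.sqrt((np.exp(J + h) - 1) ** 2 + 4 * np.exp(h))
lp, lm = 0.5 * (np.exp(J + h) + 1 + disc), 0.5 * (np.exp(J + h) + 1 - disc)
xi = -1.0 / np.log(lm / lp)                       # correlation length
p = (1 - lp) ** 2 / (np.exp(h) + (1 - lp) ** 2)   # (C.3)
c = lambda d: p * (1 - p) * np.exp(-d / xi)       # (C.4)

def entropy(P):
    return -np.sum(P[P > 0] * np.log(P[P > 0]))

S1 = entropy(np.array([p, 1 - p]))

def dS2(d):                                        # 2-cluster entropy, S_0 = 0
    pij = p * p + c(d)
    return entropy(np.array([pij, p - pij, p - pij, 1 - 2 * p + pij])) - 2 * S1

def S3(p12, p13, p23):                             # 3-spin cross-entropy
    P = lambda t: np.array([t, p12 - t, p13 - t, p23 - t,
                            p - p12 - p13 + t, p - p12 - p23 + t,
                            p - p13 - p23 + t,
                            1 - 3 * p + p12 + p13 + p23 - t])
    s = np.array([1., -1., -1., -1., 1., 1., 1., -1.])
    lo = max(0., p12 + p13 - p, p12 + p23 - p, p13 + p23 - p)
    hi = min(p12, p13, p23, 1 - 3 * p + p12 + p13 + p23)
    t = brentq(lambda t: -s @ np.log(P(t)), lo + 1e-13, hi - 1e-13)  # (A.2)
    return entropy(P(t))

d1, d2 = 2, 3                                      # i2-i1 and i3-i2
dS3 = (S3(p * p + c(d1), p * p + c(d1 + d2), p * p + c(d2))
       - dS2(d1) - dS2(d1 + d2) - dS2(d2) - 3 * S1)
print(dS3, -dS2(d1 + d2))          # the two numbers coincide, as in (C.9)
```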

C.1.2 Case S 0=S MF

We now introduce the reference entropy S 0=S MF . The matrix M defined in (21) has elements

$$M_{ij} = \frac{c_{ij}}{\sqrt{p_i(1-p_i)p_j(1-p_j)}}= \exp \bigl(-|i-j|/\xi\bigr) .$$
(C.12)

The inverse of M, G=M −1, is a tridiagonal matrix, whose non-zero elements are

$$ G_{ii}= \frac{1+\exp(-2/\xi)}{1-\exp(-2/\xi)}, \qquad G_{i,i\pm1}= - \frac{\exp(-1/\xi)}{1-\exp(-2/\xi)} .$$
(C.13)

Consider now the Gaussian model over N real-valued variables φ i , whose energy function is given by

$$ E[\boldsymbol{\varphi}] = \frac{1}{2} \sum_{i,j}G_{ij} \varphi_i \varphi_j .$$
(C.14)

For this Gaussian model, the logarithm of the partition function is (up to an irrelevant additive constant) \(\log Z[\mathbf{G}]= -\frac{1}{2} \log \det G\). By construction, model (C.14) is the solution of the inverse Gaussian problem, with data 〈φ i 〉=0, 〈φ i φ j 〉=M ij . Hence, S 0 can be interpreted as the cross-entropy of Gaussian model (C.14) under those data. A key feature of the Gaussian model above is that its interaction matrix G ij is tridiagonal: only nearest-neighbor variables are coupled to each other, according to (C.13). We conclude that the Gaussian model is a one-dimensional model. Consequently, it obeys a chain rule similar to (C.5). This is the only requirement for the main conclusion of Sect. C.1.1 to hold: in the cluster expansion of S 0, the entropy of a K-cluster is simply equal to the entropy of the 2-cluster with the same extremities, multiplied by (−1)^{K−2}. As both the expansion of S and that of S 0 enjoy this property, so does the expansion of S−S 0.
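A quick numerical illustration (ours) of (C.12)-(C.13): for a finite open chain, the inverse of M ij =exp(−|i−j|/ξ) is exactly tridiagonal, with interior entries matching (C.13); only the two corner diagonal entries differ, a boundary effect absent from the N→∞ formulas.

```python
# Numerical illustration (ours) that G = M^{-1} is tridiagonal, cf. (C.13).
import numpy as np

N, xi = 10, 1.7
idx = np.arange(N)
M = np.exp(-np.abs(idx[:, None] - idx[None, :]) / xi)   # (C.12)
G = np.linalg.inv(M)

x = np.exp(-1 / xi)
print(G[4, 4], (1 + x**2) / (1 - x**2))   # interior diagonal entry of (C.13)
print(G[4, 5], -x / (1 - x**2))           # off-diagonal entry of (C.13)
print(np.abs(G[4, 6:]).max())             # ~0: G is indeed tridiagonal
```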

We conclude this section with the expression of the 2-cluster entropy ΔS (i,i+d). In the presence of the reference entropy S 0=S MF , we subtract the following contribution from expression (C.10), see (21),

$$\Delta S^{MF}_{(i,i+d)} = \frac{1}{2}\log\bigl(1-(M_{i,i+d})^2\bigr) = \frac{1}{2}\log\bigl(1-e^{-2d/\xi}\bigr) .$$
(C.15)

Hence, the quantity \(\frac{1}{2} \log(1-u^{2})\) must be subtracted from the function F(u) defined in (C.11). It is a simple check that \(F(u)- \frac{1}{2} \log (1-u^{2})= O(u^{3})\), in agreement with scaling (45).
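This check can be done symbolically; the sketch below (ours, and relying on the expression of F(u) as reconstructed in (C.11)) verifies with sympy that F(u)−½log(1−u²) starts at order u³, with the coefficient (2p−1)²/(6p(1−p)) appearing in (C.16) and (D.1).

```python
# Symbolic check (ours) that F(u) - (1/2) log(1-u^2) = O(u^3), cf. (C.16).
import sympy as sp

p, u = sp.symbols('p u', positive=True)
c = p * (1 - p) * u                               # connected correlation (C.4)
pij = p**2 + c
P = [pij, p - pij, p - pij, 1 - 2*p + pij]        # two-spin configurations
F = -sum(q * sp.log(q) for q in P) \
    - 2 * (-p*sp.log(p) - (1 - p)*sp.log(1 - p))  # F(u), cf. (C.11)
dS = F - sp.Rational(1, 2) * sp.log(1 - u**2)     # subtract (C.15)

series = sp.expand_log(sp.series(dS, u, 0, 4).removeO(), force=True)
target = (2*p - 1)**2 / (6 * p * (1 - p)) * u**3  # coefficient of (C.16)
print(sp.simplify(series - target))               # prints 0
```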

C.2 Examples and Calculation of Diagrammatic Coefficients

We have studied the histograms of cluster-entropies with K=2 and K=3 spins for specific choices of J,h. The averages p i and p ij were calculated exactly through formulas (C.3) and (C.4) (perfect sampling). We have found that

  • for K=2, the cluster-entropies take discrete values, labelled by the distance d between the two extremities of the cluster, e.g. Γ=(i,i+d). Expanding F(u) to the lowest order in u (for S 0=S MF ) we find the asymptotic formula for the 2-cluster entropy:

    $$ \Delta S_{i,i+d} \simeq\frac{(2p-1)^2}{6p(1-p)}e^{-3d/\xi} ,$$
    (C.16)

    in agreement with (D.1). We have verified that this formula is in very good agreement with the numerics at large distance d.

  • for K=3, we find that the entropies of 3-clusters of the type (i 1,i 2,i 3) depend only on the distance d=i 3−i 1 between the extremities. In addition, the values are exactly the opposite of the ones found for 2-clusters with the same distance d, as expected. Two differences are:

    • The peak at d=1 in the histogram is not present, because the minimal distance between the extremities of three spins is d=2. The largest 3-spin entropy thus corresponds to triplets of the type (i,i+1,i+2).

    • The height of the peak (number of clusters) corresponding to distance d is (d−1)N. The degeneracy (d−1) is the number of ways of choosing the location of site i 2 in between i 1 and i 3.

We now show how the value of the cluster entropy can be recovered from the leading terms in the diagrammatic expansion calculated in Sect. D.4. Let us call d′=j−i<d the distance between the first two sites in the cluster. For each diagram in Fig. 27 we give in Table 1 the sum of the distances of its links, i.e. the power of exp(−1/ξ).

Fig. 27

Leading diagrams to the order 5 (top) and 6 (bottom) in the connected correlation for the entropy of 3-clusters

Table 1 Sum of the link distances, i.e. power of exp(−1/ξ), for the diagrams of Fig. 27 (d=k−i, d′=j−i)

  Diagram    (a)    (b)       (c)       (d)    (e)        (f)
  Distance   3d     3d+d′     4d−d′     3d     3d+3d′     6d−3d′

Interestingly, the lowest total distances are found in diagrams (a) and (d), although the latter diagram is of a higher power (6) in the connected correlation than the former (5). Hence, contrary to the case of independent spins (Appendix F), both diagrams (a) and (d) give dominant contributions to the entropy. Summing the contributions of (a) and (d) we find

$$\Delta S_{(i,j,k)} \simeq \bigl[\alpha^{(5)}\,\bigl(p(1-p)\bigr)^5 + \alpha^{(6)}\,\bigl(p(1-p)\bigr)^6\bigr]\, e^{-3d/\xi} .$$
(C.17)

To derive the coefficients α (5) and α (6), we impose that ΔS (i,j,k) is the opposite of (C.16). We deduce that α (5) and α (6) are given by, respectively, (D.4) and (D.6).

The exact cancellation property discussed above has important consequences for the inferred fields and couplings. Consider for instance the coupling J i,i+2, which vanishes in the 1D-Ising model with nearest-neighbor interactions (C.1). As the connected correlation c i,i+2 is not equal to zero, a contribution to the coupling will be collected from the cluster (i,i+2) itself, equal to

$$\Delta J_{i,i+2;(i,i+2)}=- \frac{\partial\Delta S_{(i,i+2)}}{\partial p_{i,i+2}} .$$
(C.18)

Other contributions will come from larger clusters. For instance the cluster (i,i+1,i+2) will give an additional

$$\Delta J_{i,i+2;(i,i+1,i+2)}=- \frac{\partial\Delta S_{(i,i+1,i+2)}}{\partial p_{i,i+2}} .$$
(C.19)

The sum of the two contributions above vanishes due to the cancellation property. It can be checked that the contributions coming from all the other clusters vanish too, which makes the coupling J i,i+2 equal to zero, as it should be.

Appendix D: Leading Diagrammatic Contributions to Small Cluster-Entropies

We analyze the dominant diagrams contributing to the cluster-entropies for the various values of the cluster sizes, K, in the limit of small connected correlations c kl .

D.3 Case K=2

The entropy ΔS (i,j) of a 2-spin cluster is the sum of all diagrammatic contributions containing the two spins and an arbitrary number of links between them, each link carrying one power of the expansion parameter \(M_{ij}=c_{ij}/\sqrt{p_i(1-p_i)p_j(1-p_j)}\) (Fig. 26 and Appendix B). For small values of M ij the largest contribution to ΔS (i,j) is the one with three links (cubic power of M ij ), once the reference entropy S 0=S MF has removed the two-link loop diagram. The entropy contribution of this diagram was computed in [31], with the result

$$ \Delta S^{(3)}_{i,j}=\alpha^{(3)}_{i,j} (c_{ij})^3,$$
(D.1)

where

$$\alpha_{i,j}^{(3)}= \frac{(2 p_i-1)(2 p_j-1)}{6 (p_i)^2(1-p_i)^2(p_j)^2(1-p_j)^2} . $$
(D.2)

The superscript 3 refers to the power of the connected correlation.

D.4 Case K=3

For K=3 the leading term of ΔS (i,j,k) in powers of M ij was not derived analytically in [31]. Based on the studies of the unidimensional Ising model (Appendix C) and of the independent-spin model (Appendix F), we find that the leading diagrams are diagrams (a), (b), (c) in Fig. 27 (bold diagrams in Fig. 26), whose sum is given by

$$ \Delta S^{(5)}_{i,j,k}= \alpha^{(5)}_{ijk}(c_{ij})^2(c_{jk})^2c_{ki} +\alpha^{(5)}_{jik}(c_{ij})^2c_{jk}(c_{ki})^2+ \alpha^{(5)}_{jki}c_{ij} (c_{jk})^2(c_{ki})^2,$$
(D.3)

with

$$ \alpha_{ijk}^{(5)}=-\frac{(2 p_i-1)(2p_k-1)}{2 (p_i)^2(1-p_i)^2(p_j)^2(1-p_j)^2(p_k)^2(1-p_k)^2 }.$$
(D.4)

Note that \(\alpha^{(5)}_{ijk}\) differs from \(\alpha^{(5)}_{jik}\). We have also found the coefficients of the subsequent diagrams, of order M^6. These diagrams are labelled (d), (e), (f) in Fig. 27. Their total contribution to the cluster-entropy is

$$ \Delta S^{(6)}_{i,j,k}= \alpha^{(6)}_{ijk}(c_{ij})^3(c_{jk})^3+\alpha^{(6)}_{jik} (c_{ij})^3(c_{ki})^3 + \alpha^{(6)}_{jki}(c_{jk})^3(c_{ki})^3$$
(D.5)

with

$$ \alpha^{(6)}_{ijk} =\frac{(2 p_i-1)(2p_k-1)}{3 (p_i)^2(1-p_i)^2(p_j)^3(1-p_j)^3(p_k)^2(1-p_k)^2 }.$$
(D.6)

D.5 Generic Case K≥4

The above results for K=3 are easily generalized to any value of the cluster size K≥4. The diagrammatic expansion of a K-spin cluster includes all circuits where pairs of spins are linked together. Each diagram with (one or two) links between i l and i l+1 (l=1,…,K−1) and (one or two) links between i 1 and i K , with exactly one single link in total, gives a contribution of the form

$$\Delta S^{(2K-1)}_{(i_1,\ldots,i_K)} = \alpha^{(2K-1)}_{i_1\ldots i_K}\,(c_{i_1,i_2})^2(c_{i_2,i_3})^2\cdots(c_{i_{K-1},i_K})^2\,c_{i_K,i_1} ,$$
(D.7)

where the coefficient \(\alpha^{(2K-1)}\) depends on the frequencies p of the cluster sites and generalizes (D.4), and where the single link may sit on any of the K edges of the circuit.

At the next order in power of M ij , each diagram with three links between i l and i l+1 (l=1,…,K−1) gives a contribution

$$\Delta S^{(3(K-1))}_{(i_1,\ldots,i_K)} = \alpha^{(3(K-1))}_{i_1\ldots i_K}\,(c_{i_1,i_2})^3(c_{i_2,i_3})^3\cdots(c_{i_{K-1},i_K})^3 ,$$
(D.8)

with a coefficient generalizing (D.6).

Appendix E: Critical Correlation Length ξ c for the Absolute Convergence

In this appendix, we briefly explain why the cluster-entropy series is absolutely convergent if and only if the correlation length ξ is smaller than

$$\xi_c =\frac{\Omega}{\log v} .$$
(E.1)

Here, Ω=2 when the reference entropy is S 0=0, and Ω=3 when S 0=S MF . Parameter v denotes the number of neighbors of a site on the lattice, supposed to be uniform. For instance, v=2D on a hypercubic lattice in dimension D≥1.

Consider a set of K distinct points on the lattice. Let \({\mathcal{N}}(L)\) be the number of closed paths of length L visiting all K points. We obviously have \({\mathcal{N}} (L)\leq v^{L}\). Hence, the series

$$ \sum_L {\mathcal{N}} (L) \exp (-\Omega L/\xi )$$
(E.2)

is convergent if ξ<ξ c . Conversely, let L 0 be the length of the shortest closed path \({\mathcal{C}}_{0}\) visiting the K points. A closed path of length L 1+L 0 can be built from \({\mathcal{C}}_{0}\) by attaching a closed loop of length L 1 to any one of the sites in \({\mathcal{C}}_{0}\). Hence, for L≥L 0+2, \({\mathcal{N}} (L)\geq L_{0}v^{L-L_{0}}\). We deduce that the series (E.2) is divergent if ξ>ξ c .
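As a worked example, take the square lattice: D=2, so v=4. With the mean-field reference entropy (Ω=3), formula (E.1) gives absolute convergence for ξ<ξ c =3/log 4≈2.16 lattice spacings; with S 0=0 (Ω=2), the threshold is lower, ξ c =2/log 4≈1.44.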

Appendix F: Distribution of Cluster-Entropies for the Independent Spin Model

We generate B configurations of N independent spins σ i . Spin i is equal to 1 with probability p and to zero with probability 1−p (for simplicity we assume here that all the frequencies p i are equal to the same value p). The empirical connected correlations c ij computed from the B sampled configurations of spins are generally non-zero. The marginal distribution of c ij is a normal law, with zero mean and standard deviation (48). The largest values of the correlations are, for a system with N spins, of the order of

$$c_{ij}^{\mathit{MAX}}=c_B\sqrt{ 4 \log N} , $$
(F.1)

according to extreme value theory.

We have checked the validity of formulas (D.3) and (D.5) for, respectively, the cluster-entropies ΔS ij and ΔS ijk against numerics carried out on randomly sampled configurations. The agreement, for B=10^6 sampled configurations, is excellent due to the small value of c B ≃2·10^{−5}.
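This test is easy to reproduce. The sketch below (ours; the values of N, B and p are illustrative, not those of the paper) samples independent spins, maps the empirical correlations to 2-cluster entropies through the small-correlation formula (D.1), and compares the measured standard deviation with the prediction (F.3) derived below.

```python
# Monte Carlo sketch (ours) of the independent-spin test of this appendix.
import numpy as np

rng = np.random.default_rng(1)
N, B, p = 40, 10**5, 0.3
spins = (rng.random((B, N)) < p).astype(float)   # B independent configurations

pi = spins.mean(axis=0)
C = np.cov(spins.T, bias=True)                   # empirical connected c_ij
i, j = np.triu_indices(N, k=1)
alpha = ((2*pi[i] - 1) * (2*pi[j] - 1)
         / (6 * (pi[i]*(1 - pi[i]))**2 * (pi[j]*(1 - pi[j]))**2))
dS = alpha * C[i, j]**3                          # formula (D.1)

pred = np.sqrt(15) * (2*p - 1)**2 / (6 * p * (1 - p) * B**1.5)  # (F.3)
print(dS.std(), pred)                            # same order, close values
```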

F.6 Distribution of Cluster-Entropies for K=2

To derive the analytical expression of the distribution in the N→∞ limit, we use the small-correlation formula (D.1) for ΔS (1,2), and the fact that the distribution of the connected correlation is Gaussian. As a result, approximating α 1,2 with its average value α obtained by substituting p 1 and p 2 with p in (D.2), we obtain

$$H_{\mathit{IS}}(\Delta S_{(1,2)})= \frac{\exp \bigl(-\frac{(\Delta S_{(1,2)})^{2/3}}{2 (c_B)^2\alpha^{2/3}} \bigr)}{3\alpha^{1/3} \sqrt {2\pi(c_B)^2}(\Delta S_{(1,2)})^{2/3} }. $$
(F.2)

This distribution is a stretched exponential at infinity, and diverges at the origin. Its standard deviation is

$$\sigma_{\Delta S_{(1,2)}} =\sqrt{15}\alpha (c_B)^3=\frac{\sqrt{15} (2p-1)^2}{6 p(1-p) B^{3/2}}.$$
(F.3)

This analytical prediction is in very good agreement with the distribution of cluster-entropies found from numerical data for large values of B, e.g. B=10^6 for p=0.024. For finite N, the largest correlations are given by (F.1), and cluster-entropies cannot exceed ΔS (1,2)^MAX, of the order of

$$\Delta S_{(1,2)}^{\mathit{MAX}} \simeq (4 \log N )^{3/2}\, \frac{\sigma_{\Delta S_{(i,j)}}}{\sqrt{15}} . $$
(F.4)

F.7 Distribution of the Cluster-Entropies for K=3

The leading order contribution to the entropy of a 3-cluster is given by (D.3). We want to calculate the distribution of ΔS (1,2,3) when the connected correlations c ij are random Gaussian variables of zero mean and variance (c B )^2. We neglect the correlations between c 12,c 13,c 23, which is legitimate for large B. Let us call x=ΔS (1,2,3)/(α(c B )^5), with α given by (D.4). We can easily evaluate the variance of each of the three terms of the sum in (D.3) as the product of the variances of the three factors in the product, based on the approximation that the connected correlations c ij are independent stochastic variables. We obtain (see note 11)

$$ \sigma_{\Delta S_{ijk}} =\frac{ 3\sqrt{3} (2p-1)^2 (c_B)^5}{2 p^6(1-p)^6} =\frac{3\sqrt{3} (2p-1)^2}{2p(1-p)B^{5/2}} .$$
(F.5)

Let P(x) be the probability density of x. Though we have not been able to find a closed expression for P(x), the asymptotic behavior of P for large or small arguments can be characterized analytically.

F.7.3 Large x Behavior

The Mellin transform of P [38] is

$$ \int_0^\infty dx P(x)x^\lambda= \biggl(\frac{2}{\pi}\biggr)^{3/2} \int _0^\infty dc_{12} dc_{13}dc_{23} e^{-F(c_{12},c_{13},c_{23})}$$
(F.6)

where

$$F(c_{12},c_{13},c_{23}) = \frac{1}{2}\bigl(c_{12}^2+c_{13}^2+c_{23}^2\bigr) - \lambda\,\log\bigl[c_{12}\,c_{13}\,c_{23}\,\bigl(c_{12}c_{13}+c_{13}c_{23}+c_{23}c_{12}\bigr)\bigr] .$$
(F.7)

The tail of P(x) at large x can be studied by considering large values of λ. We expect the dominant contribution to the multiple integral on the right-hand side of (F.6) to come from large correlations. The location of the main contribution to the integral is the value of (c 12,c 13,c 23) which minimizes F; the corresponding saddle-point value of the logarithm of (F.6) is F^*(λ)≡−F(c^*). As F is invariant under any permutation of its arguments, we look for a stationary point where c 12=c 13=c 23≡c^*. A straightforward calculation shows that

$$c^*(\lambda)=\sqrt{\frac{5}{3} \lambda}, \qquad F^*(\lambda)=\frac{5}{2} \lambda\log \lambda+ \lambda \biggl(\log3 +\frac{5}{2} \log\frac{5}{3} -\frac{5}{2} \biggr) .$$
(F.8)

We now use the saddle-point method again, this time to estimate the integral on the left hand side of (F.6). We obtain

$$ \max_x \bigl[ \log P(x) + \lambda \log x \bigr]= F^*(\lambda) ,$$
(F.9)

which is true when λ is very large. Hence, F (λ) is the Legendre transform of logP(x). Solving (F.9) gives

$$ \log P(x) \simeq- \frac{3}{2} \biggl(\frac{x}{3}\biggr)^{2/5}$$
(F.10)

at large x. The distribution of the cluster entropies ΔS (1,2,3) thus follows a stretched exponential with exponent \(\frac{2}{5}\). This decay is much slower than an exponential, and leads to large tails as can be seen from Fig. 7.

F.7.4 Small x Behavior

In order for the rescaled entropy x to be small, at least one of the three correlations must be small, according to (D.3). Without loss of generality, we may assume that c 12 is the smallest of the three correlations. As c 12 appears once with power one, and twice with power two in (D.3), we approximate \(x \simeq c_{12}c_{13}^{2}c_{23}^{2}\). The Mellin transform of P is, for negative λ,

$$ \int_0^\infty dx P(x)x^\lambda\simeq3 \biggl(\, \int_0^\infty dc\frac{2}{\sqrt{2\pi}} c^\lambda e^{-c^2/2} \biggr) \biggl(\, \int _c^\infty dc \frac{2}{\sqrt{2\pi}}c^{2\lambda} e^{-c^2/2} \biggr)^2 .$$
(F.11)

The largest pole is located at \(\lambda=-\frac{1}{2}\), and is of order 2. According to standard results on the inversion of Mellin transforms [38], we obtain a precise characterization of the divergence of the probability density at small x,

$$P(x) \simeq C \frac{(-\log x)}{\sqrt{x}} ,$$
(F.12)

where C is a constant.

F.8 Distribution of Cluster-Entropies for Generic K≥4

In general, for K≥3, the leading contribution to \(\Delta S_{(i_{1},i_{2},\ldots i_{K})}\) (D.7) contains the sum of K×(K−1)!/2 terms, each one being the product of K random variables, (K−1) of which are raised to the power two and one to the power one. The factor K comes from the fact that there are K ways of choosing the single link in a circuit with K spins. The factor (K−1)!/2 is the number of non-equivalent circuits going through K spins. We define the rescaled entropy x through

$$x =|\Delta S_{(i_1,i_2,\ldots i_K)}|\times \frac{2 (p(1-p))^{2K}}{\sqrt{\frac{K!}{2}} (2p-1)^2 (c_B)^{2K-1}}.$$
(F.13)

The approach followed in Sect. F.7 to calculate the asymptotic behavior of the probability density P of x for K=3 can be extended without difficulty to any value of K>3. We find that P(x) diverges when x→0, with

$$ P(x) = C \frac{(-\log x)^{K-2}}{\sqrt{x}}$$
(F.14)

where C is a constant. Hence the shape of the distribution of x is, up to logarithmic terms, independent of K. By contrast, the tail of the distribution at large x is very sensitive to K,

$$ \log P(x) \simeq-\frac{K}{2(K-\frac{1}{2})^2} \biggl(\frac{x}{K}\biggr)^{2/(2K-1)}.$$
(F.15)

As in the K=3 case, the distribution of the cluster entropies ΔS follows a stretched exponential. The exponent of the stretched exponential decreases with K. The variance of the distribution can be easily evaluated, with the result

$$ \sigma_{\Delta S_{(i_1,i_2,\ldots i_K)}}= \frac{\sqrt{K!/2} (\sqrt{3})^{K-1} (2p-1)^2 (c_B)^{2K-1}}{ 2 (p(1-p))^{2K}} .$$
(F.16)

Appendix G: Inverse Susceptibility Matrix for the Unidimensional Ising Model

Hereafter, we want to invert the matrix χ, whose elements are given in (60). The matrix is of dimension \(\frac{1}{2} N(N-1)\), and each element is labelled by two pairs of indices (i,j) and (k,l), with i<j and k<l. Each index pair (i,j) can be represented by a site of coordinates i and j on the half-grid of Fig. 28(a). We now show that the non-zero entries of the inverse susceptibility matrix, (χ −1) ij,kl , are in one-to-one correspondence with the sites (i,j) and (k,l) that are either identical, or nearest neighbors, or diagonally opposed on the elementary mesh of the half-grid (Fig. 28(b, c, d)). Depending on the value of the difference j−i, the number of those sites is equal to 9, 8, or 6.

Fig. 28

Half grid representing the index (i,j) of the entries of the inverse susceptibility matrix, with i<j (a). Black circles locate the nearest-neighbors and the diagonally opposed sites (k,l) of (i,j) (cross), with i=4 and j=9 (b), 6 (c), 5 (d)

We start with the case ji≥3 (Fig. 28(b)). By symmetry, the nine unknown matrix elements (χ −1) ij,kl take only three independent values, denoted by γ for (k,l)=(i,j), β for (k,l) and (i,j) nearest neighbors, and α for (k,l)=(i±1,j±1). We now write the matrix inversion identity,

$$ \sum_{k<l} \bigl(\chi^{-1}\bigr)_{ij,kl} \chi_{kl,mn} = \delta_{i,m}\delta_{j,n} ,$$
(G.1)

for various values of (m,n). Let d=ji. For m=i,n=j, constraint (G.1) gives

$$\gamma\bigl( 1- x^{2d}\bigr) + 2\beta \bigl( 2x-x^{2d-1}-x^{2d+1}\bigr)+\alpha \bigl( 4x^2-x^{2d-2}-2x^{2d}-x^{2d+2}\bigr)= 1,$$
(G.2)

which should hold for all d≥3. Collecting the d-dependent terms (proportional to x^{2d−2}) and the d-independent ones, we deduce two coupled equations for the three unknown variables:

$$\gamma x^2 + 2\beta x\bigl(1+x^2\bigr) + \alpha\bigl(1+x^2\bigr)^2 = 0 ,$$
(G.3)
$$\gamma + 4\beta x + 4\alpha x^2 = 1 .$$
(G.4)

For m=i+1,n=j, constraint (G.1) is equivalent to

$$\gamma\bigl(x - x^{2d-1}\bigr) + \beta\bigl(1 + 3x^2 - 2x^{2d-2} - 2x^{2d}\bigr) + \alpha\bigl(2x + 2x^3 - x^{2d-3} - 2x^{2d-1} - x^{2d+1}\bigr) = 0 .$$
(G.5)

The d-dependent term in the equation above cancels by virtue of (G.3). We are left with an additional equation over α,β,γ:

$$\gamma x + \beta \bigl(1+ 3x^2\bigr)+2\alpha x\bigl(1+x^2\bigr) =0 . $$
(G.6)

By symmetry of the matrices χ,χ −1, no new constraint is obtained when the values of m,n are further varied. Solving (G.3), (G.4), (G.6) we obtain

$$\alpha= \frac{x^2}{(1-x^2)^2}, \qquad \beta=- \frac{x(1+x^2)}{(1-x^2)^2} ,\qquad \gamma=\frac{(1+x^2)^2}{(1-x^2)^2} .$$
(G.7)

The analysis of the other cases, j=i+2 (Fig. 28(c)) and j=i+1 (Fig. 28(d)), can be done along the same lines. We do not present the calculations in detail, and simply report the results. The case j=i+2 is very similar to the previous one. There are 8 coefficients to be calculated, taking three independent values, α′,β′,γ′. It turns out that

$$\alpha'=\alpha,\qquad \beta'=\beta,\qquad \gamma'=\gamma.$$
(G.8)

As for the last case, j=i+1, we call α″ the values of the entries of χ −1 with (k,l)=(i−1,j−1),(i−1,j+1),(i+1,j+1), β″ the values of the entries with (k,l)=(i−1,j),(i,j+1), and γ″ the diagonal element corresponding to (k,l)=(i,j). After some elementary algebra, we find

$$\alpha''=\alpha,\qquad \beta''= \beta,\qquad \gamma''=\frac{1+x^2+x^4}{(1-x^2)^2} .$$
(G.9)

All those results are reported in (61).

Cite this article

Cocco, S., Monasson, R. Adaptive Cluster Expansion for the Inverse Ising Problem: Convergence, Algorithm and Tests. J Stat Phys 147, 252–314 (2012). https://doi.org/10.1007/s10955-012-0463-4
