In fact, whether
\(X_1\) and
\(X_2\) have a causal relation also depends on their environment
\(W\), which was first made precise by the common cause principle of Reichenbach (
1956). This principle makes it possible to infer causal relations from statistical relations. Specifically, it follows from the non-correlation
$$\begin{aligned} E[X_1 X_2|W]=E[X_1|W]E[X_2|W] \end{aligned}$$
(1)
or the conditional independence
$$\begin{aligned} p(x_1,x_2|w)=p(x_1|w)p(x_2|w) \end{aligned}$$
(2)
that we can infer that there must exist one of the three causal relationships
\(X_1 \leftarrow W \rightarrow X_2, \ X_1\rightarrow W \rightarrow X_2, \ X_1\leftarrow W \leftarrow X_2\), though we cannot identify which one specifically. We may identify Eq. (
1) or even Eq. (
2) from samples of variables
\(x_1,x_2, w\) when they are binary variables. However, the task becomes increasingly difficult when the variables take multiple or even continuous values, for which a kernel-based approach has been proposed in Fukumizu et al. (
2008). Even worse, the environment typically consists of a set of features
\(W_1, \ldots, W_k,\) which makes the task considerably more difficult. Alternatively, the Rubin Causal Model was first proposed by Rubin in 1974 and subsequently studied for many years (Rubin and Rubin
2011), which considers the so-called average causal effect (ACE) by computing
\(E[X_2|X_1,W]\) or its differences with
\(X_1,W\) taking different values.
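As a rough numerical sketch (with synthetic data and made-up parameters, not taken from the cited works), both the conditional-independence relation of Eq. (2) and the quantity \(E[X_2|X_1,W]\) behind the ACE can be estimated from empirical frequencies when all variables are binary:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Synthetic common-cause data: W influences both X1 and X2, so
# Eq. (2) should hold: p(x1, x2 | w) = p(x1 | w) p(x2 | w).
# The conditional probabilities 0.8/0.3 and 0.7/0.2 are arbitrary.
w = rng.integers(0, 2, n)
x1 = (rng.random(n) < np.where(w == 1, 0.8, 0.3)).astype(int)
x2 = (rng.random(n) < np.where(w == 1, 0.7, 0.2)).astype(int)

for wv in (0, 1):
    m = w == wv
    p12 = ((x1[m] == 1) & (x2[m] == 1)).mean()  # p(x1=1, x2=1 | w)
    p1, p2 = x1[m].mean(), x2[m].mean()         # p(x1=1 | w), p(x2=1 | w)
    print(f"w={wv}: |p12 - p1*p2| = {abs(p12 - p1 * p2):.4f}")  # close to 0

# ACE-style quantity E[X2 | X1, W] as empirical cell means; since X2
# depends only on W here, the cells with the same w are nearly equal.
for wv in (0, 1):
    for xv in (0, 1):
        cell = x2[(w == wv) & (x1 == xv)]
        print(f"E[X2 | X1={xv}, W={wv}] ~ {cell.mean():.3f}")
```

Because the synthetic \(X_2\) depends only on \(W\), the two cell means with the same \(w\) coincide up to sampling noise, illustrating a vanishing average causal effect of \(X_1\) on \(X_2\) once \(W\) is conditioned on.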
Pearl (
1986) has shown that the following decomposable distribution
$$\begin{aligned} p(x_1,x_2,x_3,w)=p(w)p(x_1|w)p(x_2|w)p(x_3|w) \end{aligned}$$
(3)
of dichotomous variables
\(x_1,x_2,x_3,w\) can be identified by examining whether the observable three-variable distribution
$$\begin{aligned} p(x_1,x_2,x_3)=\sum _w p(x_1,x_2,x_3,w) \end{aligned}$$
(4)
satisfies a necessary and sufficient condition on seven joint-occurrence probabilities of one, two, and three dichotomous variables, where these joint-occurrence probabilities are estimated from samples of
\(x_1,x_2,x_3\). Moreover, a necessary but not sufficient condition for
\(p(x_1,x_2,x_3)\) to be star-decomposable (as illustrated in Fig.
1a, b and to be further described in "
Methods") is that all correlation coefficients
\(\rho _{ji}, \ i,j \in \{1,2,3\}\) obey the following triangle inequalities:
$$\begin{aligned} \rho _{jk}\ge \rho _{ji} \rho _{ik}, \quad \text{with} \quad \frac{\rho _{jk} \rho _{ik}}{\rho _{ji}}\ge 0, \quad i \ne j\ne k. \end{aligned}$$
(5)
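As a quick sanity check (a sketch, not the original derivation, and taking all correlations nonnegative for simplicity), the necessary condition of Eq. (5) can be tested on three pairwise correlation coefficients; the numbers below are made up, generated from a hypothetical star structure \(\rho _{jk} = a_j a_k\):

```python
def satisfies_triangle(r12, r13, r23, tol=1e-9):
    """Necessary condition of Eq. (5): each correlation dominates the
    product of the other two, and the product of all three correlations
    is nonnegative (so each ratio rho_jk * rho_ik / rho_ji is, too)."""
    sign_ok = r12 * r13 * r23 >= -tol
    dom_ok = (r23 >= r12 * r13 - tol and
              r13 >= r12 * r23 - tol and
              r12 >= r13 * r23 - tol)
    return sign_ok and dom_ok

# Correlations induced by a star with loadings a = (0.9, 0.8, 0.7),
# i.e. rho_jk = a_j * a_k, must pass the test.
print(satisfies_triangle(0.72, 0.63, 0.56))   # True
# Two strong correlations with one weak one cannot come from a star.
print(satisfies_triangle(0.9, 0.9, 0.1))      # False
```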
Furthermore, for a tree-decomposable distribution (as illustrated in Fig.
1c and to be further described in "
Methods") of dichotomous variables, it is also shown in Pearl (
1986) that the topology of this tree can be uncovered uniquely from the observed correlation coefficients between pairs of variables, based on the following TETRAD conditions (Spearman
1904; Anderson and Rubin
1956):
$$\begin{aligned} T_\mathrm{e}^{(ijkl)}= \rho _{ij}\rho _{kl}- \rho _{il} \rho _{jk}=0, \quad i \ne j\ne k\ne l. \end{aligned}$$
(6)
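The TETRAD condition of Eq. (6) is straightforward to check numerically. In the sketch below, the correlation matrix is made up, generated from a hypothetical one-factor structure \(\rho _{ij} = a_i a_j\), for which every tetrad difference vanishes up to rounding:

```python
def tetrad_difference(R, i, j, k, l):
    """Tetrad difference of Eq. (6): rho_ij * rho_kl - rho_il * rho_jk."""
    return R[i][j] * R[k][l] - R[i][l] * R[j][k]

# Loadings of a hypothetical star/one-factor model on four variables;
# off-diagonal correlations are products of the corresponding loadings.
a = [0.9, 0.8, 0.7, 0.6]
R = [[a[i] * a[j] if i != j else 1.0 for j in range(4)] for i in range(4)]

print(tetrad_difference(R, 0, 1, 2, 3))  # ~0.0
```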
Subsequently, Xu (
1986) and Xu and Pearl (
1987) further proceeded to study the distribution Eq. (
3) of Gaussian variables
\(x_1,x_2,x_3,w\) with three new results as follows:
1.
The analysing tool used in Pearl (
1986) stems from Eqs. (
3) and (
4) on dichotomous variables (i.e., Eq. 24 in Pearl
1986), which considers the products of conditional independence indirectly in a linear mixture and leads to a set of constraint equations that are solved to obtain a necessary and sufficient condition. In contrast, a new tool is suggested in Xu (
1986) and Xu and Pearl (
1987), which stems from
$$\begin{aligned} p(x_1,x_2,x_3|w)=p(x_1|w)p(x_2|w)p(x_3|w) \end{aligned}$$
(7)
that directly considers the product of conditional independence for inferring the star structure or topology of causality, and subsequently identifies the parameters of the involved distributions by
$$\begin{aligned} p(x_1,x_2,x_3)=\int p(x_1,x_2,x_3,w)\,\text{d}w. \end{aligned}$$
(8)
2.
Instead of following Pearl (
1986), which considers joint probabilities to form constraint equations from Eq. (
4), Eq. (
7) is turned into one or several equations on different orders of statistics. Particularly, for Eq. (
7) with Gaussian variables
\(x_1,x_2,x_3,w,\) the block decomposition of the covariance matrix (Gigi
1977) is adopted, with equalities and inequalities on the second-order statistics as constraints, which are further simplified into Eq. (
5).
3.
Specifically, the necessary and sufficient condition for
\(p(x_1,x_2,x_3)\) of Gaussian variables to be star-decomposable is simply that the triangle inequalities of Eq. (
5) hold; i.e., the star-causality by Eq. (
3) and the latent structure by Eq. (
4) can be recovered from merely the second-order statistics, i.e., the correlation coefficients
\(\rho _{ji}, \ i,j \in \{1,2,3\}\).
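To make point 3 concrete, here is a minimal sketch (with made-up correlations) of recovering the star parameters from second-order statistics alone: under \(\rho _{jk} = a_j a_k\), each squared loading is \(a_i^2 = \rho _{ij}\rho _{ik}/\rho _{jk}\), and the triangle inequalities of Eq. (5) are exactly what keeps these ratios in \([0,1]\):

```python
import math

def star_loadings(r12, r13, r23):
    """Recover the loading magnitudes |a_i| of the star model
    rho_jk = a_j * a_k from the three pairwise correlations,
    assuming Eq. (5) holds so each ratio is a valid square."""
    a1 = math.sqrt(r12 * r13 / r23)
    a2 = math.sqrt(r12 * r23 / r13)
    a3 = math.sqrt(r13 * r23 / r12)
    return a1, a2, a3

# Correlations generated from loadings (0.9, 0.8, 0.7) are recovered.
print(star_loadings(0.72, 0.63, 0.56))  # ~(0.9, 0.8, 0.7)
```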
When all the variables are Gaussians, the latent structure by
$$p(x_1,x_2,\ldots ,x_n)=\int p(x_1,x_2,\ldots ,x_n,w)\,\text{d}w$$
(9)
with the star-causality by
$$p(x_1,x_2,\ldots ,x_n|w)=\prod _{i=1}^{n} p(x_i|w)$$
(10)
is actually equivalent to the classical factor analysis with only one factor. Pioneered by Spearman (
1904), whether the factor analysis model (as illustrated in Fig.
1d and to be further described in the next section) is identifiable has been a classical topic for more than 100 years, from perspectives that are more or less similar to constraints on the second-order statistics obtained from Eq. (
9). The well-known TETRAD equations or differences were discovered already in Spearman (
1904) and have been used for constructing causal structures not just in Pearl (
1986) but also by others (Spirtes and Glymour
2000; Bartholomew
1995; Bollen and Ting
2000). Moreover, Theorem 4.2 in Anderson and Rubin (
1956) also gave a necessary and sufficient condition for identifying whether a covariance matrix can be that of a factor analysis model with one factor and three observed variables, which is actually equivalent to Eq. (
5) but expressed in a different format.