LiNGAM: Non-Gaussian Methods for Estimating Causal Structures

  • Invited paper
Behaviormetrika

Abstract

Many empirical sciences need to study the causal mechanisms underlying the phenomena they observe. Structural equation modeling is a general framework for multivariate analysis and provides a powerful method for studying such mechanisms. In many cases, however, classical structural equation modeling cannot estimate the causal directions between variables, because it explicitly or implicitly assumes Gaussian data and typically uses only the covariance structure. Data in many applications are non-Gaussian, so the data distribution may carry more information than the covariance matrix can capture. Many methods have therefore been proposed recently that exploit the non-Gaussian structure of data to estimate the causal directions between variables. This paper gives an overview of these developments in causal inference, focusing in particular on the non-Gaussian methods known as LiNGAM.
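
The asymmetry described in the abstract can be illustrated with a small simulation. The sketch below is not the LiNGAM estimation procedure itself; it is a toy two-variable check, assuming a linear model y = b*x + e with independent noise, and all names in it (residual_dependence, simulate, the coefficient 0.8, the sample size) are hypothetical choices made for illustration. It regresses each variable on the other and measures whether the squared residual is still correlated with the squared regressor. With Gaussian noise both directions look the same, which is why covariance alone cannot tell them apart. With uniform (non-Gaussian) noise, only the true direction leaves a residual that appears independent of the regressor.

```python
import numpy as np


def residual_dependence(cause, effect):
    """Return |corr(cause^2, residual^2)| after least-squares regression
    of `effect` on `cause`. The residual is always uncorrelated with the
    regressor, so a higher-order statistic is needed to see dependence."""
    cause = cause - cause.mean()
    effect = effect - effect.mean()
    slope = np.dot(cause, effect) / np.dot(cause, cause)
    resid = effect - slope * cause
    return abs(np.corrcoef(cause**2, resid**2)[0, 1])


def simulate(noise, n=20000, b=0.8, seed=0):
    """Draw x and e from `noise`, set y = b*x + e, and score both directions.
    Returns (score for x -> y, score for y -> x)."""
    rng = np.random.default_rng(seed)
    x = noise(rng, n)
    y = b * x + noise(rng, n)
    return residual_dependence(x, y), residual_dependence(y, x)


def uniform(rng, n):
    return rng.uniform(-1.0, 1.0, n)   # non-Gaussian noise


def gaussian(rng, n):
    return rng.normal(0.0, 1.0, n)     # Gaussian noise


fwd_u, bwd_u = simulate(uniform)
fwd_g, bwd_g = simulate(gaussian)

print(f"uniform noise:  x->y {fwd_u:.3f}   y->x {bwd_u:.3f}")
print(f"gaussian noise: x->y {fwd_g:.3f}   y->x {bwd_g:.3f}")
```

In the uniform case the score for the true direction x -> y should be close to zero while the score for y -> x is clearly larger; in the Gaussian case both scores should be close to zero, so the direction is not identifiable from this kind of check. The squared-correlation score is only one simple dependence measure, chosen here to keep the example short.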

Author information

Corresponding author

Correspondence to Shohei Shimizu.

About this article

Cite this article

Shimizu, S. LiNGAM: Non-Gaussian Methods for Estimating Causal Structures. Behaviormetrika 41, 65–98 (2014). https://doi.org/10.2333/bhmk.41.65

  • DOI: https://doi.org/10.2333/bhmk.41.65
