Skip to main content
Erschienen in: Soft Computing 24/2017

03.08.2016 | Methodologies and Application

An efficient algorithm for large-scale causal discovery

verfasst von: Yinghan Hong, Zhusong Liu, Guizhen Mai

Erschienen in: Soft Computing | Ausgabe 24/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Causal discovery is a fundamental problem in scientific research. Although many researchers are committed to finding causal relationships from observational data, large-scale causal discovery remains a tremendous challenge. In this paper, a new approach for large-scale causal discovery is proposed, based on a split-and-merge strategy. The method first splits a given dataset into small subdatasets using a graph-partitioning method and then develops a effective algorithm to infer the causality of each subdataset. The entire causal structure with respect to the given dataset is achieved by combining all the causalities of each subdataset. The experimental results show that the proposed approach is effective and scalable for large-scale causal discovery problems.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Cai R, Zhang Z, Hao Z (2013) Sada: a general framework to support robust causation discovery. In: Proceedings of the 30th international conference on machine learning, pp 208–216 Cai R, Zhang Z, Hao Z (2013) Sada: a general framework to support robust causation discovery. In: Proceedings of the 30th international conference on machine learning, pp 208–216
Zurück zum Zitat Chickering DM (2003) Optimal structure identification with greedy search. J Mach Learn Res 3:507–554MATHMathSciNet Chickering DM (2003) Optimal structure identification with greedy search. J Mach Learn Res 3:507–554MATHMathSciNet
Zurück zum Zitat Daniusis P, Janzing D, Mooij J, Zscheischler J, Steudel B, Zhang K, Schölkopf B (2012) Inferring deterministic causal relations. arXiv preprint arXiv:1203.3475 Daniusis P, Janzing D, Mooij J, Zscheischler J, Steudel B, Zhang K, Schölkopf B (2012) Inferring deterministic causal relations. arXiv preprint arXiv:​1203.​3475
Zurück zum Zitat Fortier N, Sheppard J, Strasser S (2014) Abductive inference in Bayesian networks using distributed overlapping swarm intelligence. Soft Comput 19(4):981–1001CrossRef Fortier N, Sheppard J, Strasser S (2014) Abductive inference in Bayesian networks using distributed overlapping swarm intelligence. Soft Comput 19(4):981–1001CrossRef
Zurück zum Zitat Geiger D, Heckerman D (1994) Learning gaussian networks. In: Proceedings of the tenth international conference on uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc, pp 235–243 Geiger D, Heckerman D (1994) Learning gaussian networks. In: Proceedings of the tenth international conference on uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc, pp 235–243
Zurück zum Zitat Gullberg M, Noreus K, Brattsand G, Friedrich B, Shingler V (1990) Purification and characterization of a 19-kilodalton intracellular protein. An activation-regulated putative protein kinase c substrate of t lymphocytes. J Biol Chem 265(29):17499–17505 Gullberg M, Noreus K, Brattsand G, Friedrich B, Shingler V (1990) Purification and characterization of a 19-kilodalton intracellular protein. An activation-regulated putative protein kinase c substrate of t lymphocytes. J Biol Chem 265(29):17499–17505
Zurück zum Zitat Hadley SW, Pelizzari C, Chen GTY (1996) Registration of localization images by maximization of mutual information. In: Proceedings of annual meeting of the American association of physicists in medicine Hadley SW, Pelizzari C, Chen GTY (1996) Registration of localization images by maximization of mutual information. In: Proceedings of annual meeting of the American association of physicists in medicine
Zurück zum Zitat Hao Z, Zhang H, Cai R, Wen W, Li Z (2015) Causal discovery on high dimensional data. Appl Intell 42(3):594–607CrossRef Hao Z, Zhang H, Cai R, Wen W, Li Z (2015) Causal discovery on high dimensional data. Appl Intell 42(3):594–607CrossRef
Zurück zum Zitat Heckerman D, Meek C, Cooper G (1999) A bayesian approach to causal discovery. Comput Causation Discov 19:141–166MathSciNet Heckerman D, Meek C, Cooper G (1999) A bayesian approach to causal discovery. Comput Causation Discov 19:141–166MathSciNet
Zurück zum Zitat Herskovits E (1991) Computer-based probabilistic-network construction. Ph.D thesis, Stanford University, USA Herskovits E (1991) Computer-based probabilistic-network construction. Ph.D thesis, Stanford University, USA
Zurück zum Zitat Hoyer PO, Janzing D, Mooij JM, Peters J, Schölkopf B (2009) Nonlinear causal discovery with additive noise models. In: Advances in neural information processing systems. MIT press, Massachusetts, pp 689–696 Hoyer PO, Janzing D, Mooij JM, Peters J, Schölkopf B (2009) Nonlinear causal discovery with additive noise models. In: Advances in neural information processing systems. MIT press, Massachusetts, pp 689–696
Zurück zum Zitat Janzing D, Mooij J, Zhang K, Lemeire J, Zscheischler J, Daniušis P, Steudel B, Schölkopf B (2012) Information-geometric approach to inferring causal directions. Artif Intell 182:1–31CrossRefMATHMathSciNet Janzing D, Mooij J, Zhang K, Lemeire J, Zscheischler J, Daniušis P, Steudel B, Schölkopf B (2012) Information-geometric approach to inferring causal directions. Artif Intell 182:1–31CrossRefMATHMathSciNet
Zurück zum Zitat Kelly L, Clark J, Gilliland G (2002) Comprehensive genotypic analysis of leukemia: clinical and therapeutic implications. Curr Opin Oncol 14(1):10–18 Kelly L, Clark J, Gilliland G (2002) Comprehensive genotypic analysis of leukemia: clinical and therapeutic implications. Curr Opin Oncol 14(1):10–18
Zurück zum Zitat Kwak N, Choi C-H (2002) Input feature selection by mutual information based on parzen window. Pattern Anal Mach Intell IEEE Trans 24(12):1667–1671CrossRef Kwak N, Choi C-H (2002) Input feature selection by mutual information based on parzen window. Pattern Anal Mach Intell IEEE Trans 24(12):1667–1671CrossRef
Zurück zum Zitat Liu Z, Yan H, Li Z (2015a) Server-aided anonymous attribute-based authentication in cloud computing. Future Gener Comput Syst 24:61–66CrossRef Liu Z, Yan H, Li Z (2015a) Server-aided anonymous attribute-based authentication in cloud computing. Future Gener Comput Syst 24:61–66CrossRef
Zurück zum Zitat Liu Z, Yan H, Lin Z, Xu L (2015b) An improved cloud data sharing scheme with hierarchical attribute structure. J Univers Comput Sci 21(3):454–472 Liu Z, Yan H, Lin Z, Xu L (2015b) An improved cloud data sharing scheme with hierarchical attribute structure. J Univers Comput Sci 21(3):454–472
Zurück zum Zitat Ma S, Li J, Liu L, Le TD (2016) Mining combined causes in large data sets. Knowl Based Syst 92:104–111CrossRef Ma S, Li J, Liu L, Le TD (2016) Mining combined causes in large data sets. Knowl Based Syst 92:104–111CrossRef
Zurück zum Zitat Meek C (1997) Graphical models: selecting causal and statistical models. Ph.D thesis, Carnegie Mellon University Meek C (1997) Graphical models: selecting causal and statistical models. Ph.D thesis, Carnegie Mellon University
Zurück zum Zitat Peters J, Janzing D, Schölkopf B (2010) Identifying cause and effect on discrete data using additive noise models. In: International conference on artificial intelligence and statistics, pp 597–604 Peters J, Janzing D, Schölkopf B (2010) Identifying cause and effect on discrete data using additive noise models. In: International conference on artificial intelligence and statistics, pp 597–604
Zurück zum Zitat Peters J, Janzing D, Schölkopf B (2011) Causal inference on discrete data using additive noise models. Pattern Anal Mach Intell IEEE Trans 33(12):2436–2450CrossRef Peters J, Janzing D, Schölkopf B (2011) Causal inference on discrete data using additive noise models. Pattern Anal Mach Intell IEEE Trans 33(12):2436–2450CrossRef
Zurück zum Zitat Peters J, Mooij JM, Janzing D, Schölkopf B (2014) Causal discovery with continuous additive noise models. J Mach Learn Res 15(1):2009–2053MATHMathSciNet Peters J, Mooij JM, Janzing D, Schölkopf B (2014) Causal discovery with continuous additive noise models. J Mach Learn Res 15(1):2009–2053MATHMathSciNet
Zurück zum Zitat Rasmussen CE, Williams C (2006) Gaussian processes for machine learning. MIT Press, CambridgeMATH Rasmussen CE, Williams C (2006) Gaussian processes for machine learning. MIT Press, CambridgeMATH
Zurück zum Zitat Shimizu S, Hoyer PO, Hyvärinen A, Kerminen A (2006) A linear non-gaussian acyclic model for causal discovery. J Mach Learn Res 7:2003–2030MATHMathSciNet Shimizu S, Hoyer PO, Hyvärinen A, Kerminen A (2006) A linear non-gaussian acyclic model for causal discovery. J Mach Learn Res 7:2003–2030MATHMathSciNet
Zurück zum Zitat Spirtes P, Glymour CN, Richard S (2000) Causation, prediction and search, vol 81. MIT press, CambridgeMATH Spirtes P, Glymour CN, Richard S (2000) Causation, prediction and search, vol 81. MIT press, CambridgeMATH
Zurück zum Zitat Tang L-J, Jiang J-H, Wu H-L, Shen G-L, Yu R-Q (2009) Variable selection using probability density function similarity for support vector machine classification of high-dimensional microarray data. Talanta 79(2):260–267CrossRef Tang L-J, Jiang J-H, Wu H-L, Shen G-L, Yu R-Q (2009) Variable selection using probability density function similarity for support vector machine classification of high-dimensional microarray data. Talanta 79(2):260–267CrossRef
Zurück zum Zitat Wang X, Gotoh O (2009) Accurate molecular classification of cancer using simple rules. BMC Med Genom 2(1):64CrossRef Wang X, Gotoh O (2009) Accurate molecular classification of cancer using simple rules. BMC Med Genom 2(1):64CrossRef
Zurück zum Zitat Wen X, Shao L, Xue Y, Fang W (2015) A rapid learning algorithm for vehicle classification. Inf Sci 295:395–406CrossRef Wen X, Shao L, Xue Y, Fang W (2015) A rapid learning algorithm for vehicle classification. Inf Sci 295:395–406CrossRef
Zurück zum Zitat Zhang K, Hyvärinen A (2008) Distinguishing causes from effects using nonlinear acyclic causal models. In: Journal of machine learning research, workshop and conference proceedings (NIPS 2008 causality workshop), vol 6, pp 157–164 Zhang K, Hyvärinen A (2008) Distinguishing causes from effects using nonlinear acyclic causal models. In: Journal of machine learning research, workshop and conference proceedings (NIPS 2008 causality workshop), vol 6, pp 157–164
Zurück zum Zitat Zhang K, Peters J, Janzing D, Schölkopf B (2012) Kernel-based conditional independence test and application in causal discovery. arXiv preprint arXiv:1202.3775 Zhang K, Peters J, Janzing D, Schölkopf B (2012) Kernel-based conditional independence test and application in causal discovery. arXiv preprint arXiv:​1202.​3775
Metadaten
Titel
An efficient algorithm for large-scale causal discovery
verfasst von
Yinghan Hong
Zhusong Liu
Guizhen Mai
Publikationsdatum
03.08.2016
Verlag
Springer Berlin Heidelberg
Erschienen in
Soft Computing / Ausgabe 24/2017
Print ISSN: 1432-7643
Elektronische ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-016-2281-0

Weitere Artikel der Ausgabe 24/2017

Soft Computing 24/2017 Zur Ausgabe