ABSTRACT
Data mining with Bayesian Network learning has two important characteristics: first, under certain conditions the learned edges between variables correspond to causal influences; and second, for every variable T in the network, a special subset of variables (the Markov Blanket), identifiable from the network, is the minimal variable set required to predict T. However, all known algorithms that learn a complete BN fail to scale beyond a few hundred variables, while all known sound algorithms that learn a local region of the network require a number of training instances exponential in the size of the learned region. The contribution of this paper is two-fold. We introduce a novel local algorithm that returns all variables with direct edges to and from a target variable T, as well as a local algorithm that returns the Markov Blanket of T. Both algorithms (i) are sound, (ii) run efficiently on datasets with thousands of variables, and (iii) significantly outperform previous state-of-the-art algorithms at approximating the true neighborhood, using only a fraction of the training sample those methods require. A fundamental difference between our approach and existing ones is that the required sample depends on the connectivity of the generating graph rather than the size of the local region; this yields up to exponential savings in sample size relative to previously known algorithms. The results presented here are promising not only for the discovery of local causal structure and for variable selection in classification, but also for the induction of complete BNs.
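The local-discovery idea the abstract describes can be made concrete with a minimal sketch. The code below is not the paper's algorithm; it is a simplified grow-shrink style procedure (in the spirit of Margaritis and Thrun's GS) that estimates the Markov blanket of a target variable by repeated conditional independence tests on discrete data. The pooled G-test, the row-of-dicts data layout, and the names `g_test` and `markov_blanket` are all assumptions chosen for illustration.

```python
# Minimal grow-shrink sketch of local Markov blanket discovery on
# discrete data. NOT the paper's algorithm; an illustrative assumption.
import math
from collections import Counter

from scipy.stats import chi2  # chi-square survival function for p-values


def g_test(rows, x, y, cond, alpha=0.05):
    """Return True if x and y look conditionally independent given `cond`,
    using a G-test pooled over the strata of the conditioning variables."""
    g, dof = 0.0, 0
    strata = {}
    for r in rows:  # group rows by the joint value of the conditioning set
        strata.setdefault(tuple(r[c] for c in cond), []).append(r)
    for stratum in strata.values():
        n = len(stratum)
        joint = Counter((r[x], r[y]) for r in stratum)
        mx = Counter(r[x] for r in stratum)
        my = Counter(r[y] for r in stratum)
        for (a, b), obs in joint.items():  # only observed cells, so obs > 0
            expected = mx[a] * my[b] / n
            g += 2.0 * obs * math.log(obs / expected)
        dof += max((len(mx) - 1) * (len(my) - 1), 0)
    if dof == 0:
        return True  # degenerate strata: no evidence of dependence
    return chi2.sf(g, dof) >= alpha


def markov_blanket(rows, target, variables, alpha=0.05):
    """Grow-shrink estimate of the Markov blanket of `target`."""
    mb = []
    changed = True
    while changed:  # grow: admit variables still dependent on target given mb
        changed = False
        for v in variables:
            if v != target and v not in mb and not g_test(rows, target, v, mb, alpha):
                mb.append(v)
                changed = True
    for v in list(mb):  # shrink: drop false positives admitted early on
        rest = [w for w in mb if w != v]
        if g_test(rows, target, v, rest, alpha):
            mb.remove(v)
    return mb


if __name__ == "__main__":
    import random

    random.seed(0)
    rows = []
    for _ in range(3000):
        a = random.randint(0, 1)                   # parent of T
        t = a if random.random() < 0.8 else 1 - a  # target variable
        d = random.randint(0, 1)                   # spouse: co-parent of C
        c = (t or d) if random.random() < 0.8 else random.randint(0, 1)  # child
        b = random.randint(0, 1)                   # irrelevant noise
        rows.append({"A": a, "B": b, "C": c, "D": d, "T": t})
    # Expected blanket of T: parent A, child C, spouse D; noise B excluded.
    print(sorted(markov_blanket(rows, "T", ["A", "B", "C", "D"])))
```

Note that in the shrink phase the conditioning set is the entire current blanket estimate, so the reliability of each test degrades as the blanket grows; the sample-efficiency claim in the abstract is precisely about keeping such conditioning sets small, bounded by the connectivity of the generating graph rather than the size of the local region.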