ABSTRACT
Data mining with Bayesian Network learning has two important characteristics: first, under certain conditions the learned edges between variables correspond to causal influences; and second, for every variable T in the network, a special subset of variables (the Markov Blanket), identifiable from the network, is the minimal variable set required to predict T. However, all known algorithms that learn a complete BN fail to scale beyond a few hundred variables, while all known sound algorithms that learn a local region of the network require a number of training instances exponential in the size of the learned region. The contribution of this paper is two-fold. We introduce a novel local algorithm that returns all variables with direct edges to and from a target variable T, as well as a local algorithm that returns the Markov Blanket of T. Both algorithms (i) are sound, (ii) run efficiently on datasets with thousands of variables, and (iii) significantly outperform previous state-of-the-art algorithms at approximating the true neighborhood, using only a fraction of the training sample those methods require. A fundamental difference between our approach and existing ones is that the required sample depends on the connectivity of the generating graph rather than the size of the local region; this yields up to exponential savings in sample size relative to previously known algorithms. The results presented here are promising not only for the discovery of local causal structure and for variable selection in classification, but also for the induction of complete BNs.
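The local-discovery idea the abstract describes can be made concrete with a minimal sketch. The code below is not the paper's algorithm; it is a simplified grow-shrink style procedure (in the spirit of Margaritis and Thrun's GS) that estimates the Markov blanket of a target variable by repeated conditional independence tests on discrete data. The pooled G-test, the row-of-dicts data layout, and the names `g_test` and `markov_blanket` are all assumptions chosen for illustration.

```python
# Minimal grow-shrink sketch of local Markov blanket discovery on
# discrete data. NOT the paper's algorithm; an illustrative assumption.
import math
from collections import Counter

from scipy.stats import chi2  # chi-square survival function for p-values


def g_test(rows, x, y, cond, alpha=0.05):
    """Return True if x and y look conditionally independent given `cond`,
    using a G-test pooled over the strata of the conditioning variables."""
    g, dof = 0.0, 0
    strata = {}
    for r in rows:  # group rows by the joint value of the conditioning set
        strata.setdefault(tuple(r[c] for c in cond), []).append(r)
    for stratum in strata.values():
        n = len(stratum)
        joint = Counter((r[x], r[y]) for r in stratum)
        mx = Counter(r[x] for r in stratum)
        my = Counter(r[y] for r in stratum)
        for (a, b), obs in joint.items():  # only observed cells, so obs > 0
            expected = mx[a] * my[b] / n
            g += 2.0 * obs * math.log(obs / expected)
        dof += max((len(mx) - 1) * (len(my) - 1), 0)
    if dof == 0:
        return True  # degenerate strata: no evidence of dependence
    return chi2.sf(g, dof) >= alpha


def markov_blanket(rows, target, variables, alpha=0.05):
    """Grow-shrink estimate of the Markov blanket of `target`."""
    mb = []
    changed = True
    while changed:  # grow: admit variables still dependent on target given mb
        changed = False
        for v in variables:
            if v != target and v not in mb and not g_test(rows, target, v, mb, alpha):
                mb.append(v)
                changed = True
    for v in list(mb):  # shrink: drop false positives admitted early on
        rest = [w for w in mb if w != v]
        if g_test(rows, target, v, rest, alpha):
            mb.remove(v)
    return mb


if __name__ == "__main__":
    import random

    random.seed(0)
    rows = []
    for _ in range(3000):
        a = random.randint(0, 1)                   # parent of T
        t = a if random.random() < 0.8 else 1 - a  # target variable
        d = random.randint(0, 1)                   # spouse: co-parent of C
        c = (t or d) if random.random() < 0.8 else random.randint(0, 1)  # child
        b = random.randint(0, 1)                   # irrelevant noise
        rows.append({"A": a, "B": b, "C": c, "D": d, "T": t})
    # Expected blanket of T: parent A, child C, spouse D; noise B excluded.
    print(sorted(markov_blanket(rows, "T", ["A", "B", "C", "D"])))
```

Note that in the shrink phase the conditioning set is the entire current blanket estimate, so the reliability of each test degrades as the blanket grows; the sample-efficiency claim in the abstract is precisely about keeping such conditioning sets small, bounded by the connectivity of the generating graph rather than the size of the local region.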