
2014 | Book

Probabilistic Graphical Models

7th European Workshop, PGM 2014, Utrecht, The Netherlands, September 17-19, 2014. Proceedings

Edited by: Linda C. van der Gaag, Ad J. Feelders

Publisher: Springer International Publishing

Book series: Lecture Notes in Computer Science


About this book

This book constitutes the refereed proceedings of the 7th European Workshop on Probabilistic Graphical Models, PGM 2014, held in Utrecht, The Netherlands, in September 2014. The 38 revised full papers presented in this book were carefully reviewed and selected from 44 submissions. The papers cover all aspects of graphical models for probabilistic reasoning, decision making, and learning.

Table of Contents

Frontmatter
Structural Sensitivity for the Knowledge Engineering of Bayesian Networks

Whether a Bayesian Network (BN) is constructed through expert elicitation, from data, or a combination of both, evaluation of the resultant BN is a crucial part of the knowledge engineering process. One kind of evaluation is to analyze how sensitive the network is to changes in inputs, a form of sensitivity analysis commonly called “sensitivity to findings”. The properties of d-separation can be used to determine whether or not evidence (or findings) about one variable may influence belief in a target variable, given the BN structure only. Once the network is parameterised, it is also possible to measure this influence, for example with mutual information or variance. Given such a metric of change, when evaluating a BN, it is common to rank nodes for either a maximum such effect or the average such effect. However, this ranking tends to reflect the structural properties of the network: the longer the path from a node to the target node, the lower the influence, while the influence increases with the number of such paths. This raises the question: how useful is the ranking computed with the parameterised network, over and above what could be discerned from the structure alone? We propose a metric, Distance Weighted Influence, that ranks the influence of nodes based on the structure of the network alone. We show that this ranking provides useful feedback on the structure in the early stages of the knowledge engineering process; after parameterisation, the interest from an evaluation perspective is in how much the ranking has changed. We illustrate the practical use of this on real-world networks from the literature.

David Albrecht, Ann E. Nicholson, Chris Whittle
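As a minimal illustration of the parameterised influence measure mentioned above (not the authors' Distance Weighted Influence metric), mutual information between an evidence variable and a target can be computed directly from their joint distribution; the distributions below are hypothetical toy examples:

```python
import math

def mutual_information(joint):
    """Mutual information I(X;Y) in bits from a joint distribution.

    `joint[(x, y)]` maps value pairs to probabilities summing to 1.
    """
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    mi = 0.0
    for (x, y), p in joint.items():
        if p > 0:
            mi += p * math.log2(p / (px[x] * py[y]))
    return mi

# Independent variables carry zero information about each other ...
indep = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
# ... while a deterministic copy carries one full bit.
copy = {(0, 0): 0.5, (1, 1): 0.5}
print(mutual_information(indep))  # 0.0
print(mutual_information(copy))   # 1.0
```

A ranking like the one discussed above would order candidate evidence nodes by such a score with respect to the target.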
A Pairwise Class Interaction Framework for Multilabel Classification

We present a general framework for multidimensional classification that captures the pairwise interactions between class variables. The pairwise class interactions are encoded using a collection of base classifiers (Phase 1), for which the class predictions are combined in a Markov random field that is subsequently used for multi-label inference (Phase 2); thus, the framework can be positioned between ensemble methods and label transformation-based approaches. Our proposal leads to a general framework supporting a wide range of base classifiers in the first phase as well as different inference methods in the second phase. We describe the basic framework and its main properties, including detailed experimental results based on a range of publicly available databases. By comparing the performance with other multilabel classifiers we see that the proposed classifier either outperforms or is competitive with the tested straw-men methods. We also analyse the scalability of our approach and discuss potential drawbacks and directions for future work.

Jacinto Arias, José A. Gámez, Thomas D. Nielsen, José M. Puerta
From Information to Evidence in a Bayesian Network

Evidence in a Bayesian network comes from information based on the observation of one or more variables. A review of the terminology leads to the assessment that two main types of non-deterministic evidence have been defined, namely likelihood evidence and probabilistic evidence, but the distinction between fixed and non-fixed probabilistic evidence is not clear, and neither terminology nor concepts have been clearly defined. In particular, the term soft evidence is confusing. The article presents definitions and concepts related to the use of non-deterministic evidence in Bayesian networks, in terms of specification and propagation. Several examples help to understand how an initial piece of information can be specified as a finding in a Bayesian network.

Ali Ben Mrad, Véronique Delcroix, Sylvain Piechowiak, Philip Leicester
Learning Gated Bayesian Networks for Algorithmic Trading

Gated Bayesian networks (GBNs) are a recently introduced extension of Bayesian networks that aims to model dynamical systems consisting of several distinct phases. In this paper, we present an algorithm for semi-automatic learning of GBNs. We use the algorithm to learn GBNs that output buy and sell decisions for use in algorithmic trading systems. We show how using the learnt GBNs can substantially lower risks towards invested capital, while at the same time generating similar or better rewards, compared to the benchmark buy-and-hold investment strategy.

Marcus Bendtsen, Jose M. Peña
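The buy-and-hold benchmark against which the learnt GBNs are compared is simple to state precisely. The sketch below, on a hypothetical price series, computes its total return together with maximum drawdown, one common risk measure (the paper's own risk measure may differ):

```python
def buy_and_hold_return(prices):
    """Total return of buying at the first price and selling at the last."""
    return prices[-1] / prices[0] - 1.0

def max_drawdown(prices):
    """Largest peak-to-trough loss, a common risk measure for comparison."""
    peak, worst = prices[0], 0.0
    for p in prices:
        peak = max(peak, p)
        worst = max(worst, (peak - p) / peak)
    return worst

prices = [100.0, 110.0, 99.0, 121.0, 115.5]
print(round(buy_and_hold_return(prices), 6))  # 0.155
print(round(max_drawdown(prices), 6))         # 0.1 (the 110 -> 99 drop)
```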
Local Sensitivity of Bayesian Networks to Multiple Simultaneous Parameter Shifts

The robustness of the performance of a Bayesian network to shifts in its parameters can be studied with a sensitivity analysis. For reasons of computational efficiency such an analysis is often limited to studying shifts in only one or two parameters at a time. The concept of sensitivity value, an important notion in sensitivity analysis, captures the effect of local changes in a single parameter. In this paper we generalise this concept to an n-way sensitivity value in order to capture the local effect of multiple simultaneous parameter changes. Moreover, we demonstrate that an n-way sensitivity value can be computed efficiently, even for large n. An n-way sensitivity value is direction dependent and its maximum, minimum, and direction of maximal change can be easily determined. The direction of maximal change can, for example, be exploited in network tuning. To this end, we introduce the concept of sliced sensitivity function for an n-way sensitivity function restricted to parameter shifts in a fixed direction. We moreover argue that such a function can be computed efficiently.

Janneke H. Bolt, Silja Renooij
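For a single parameter x, an output probability of a Bayesian network is known to be a quotient of two linear functions of x, and the one-way sensitivity value is the absolute derivative of this function at the original parameter value. A minimal sketch with hypothetical coefficients (the n-way generalisation of the paper is not reproduced):

```python
def sensitivity_value(c1, c2, c3, c4, x0):
    """|f'(x0)| for a one-way sensitivity function
    f(x) = (c1*x + c2) / (c3*x + c4),
    the established functional form of an output probability in a
    single network parameter x.  Coefficients here are hypothetical.
    """
    return abs((c1 * c4 - c2 * c3) / (c3 * x0 + c4) ** 2)

# With c3 = 0 the function is linear and the sensitivity value is |c1/c4|.
print(sensitivity_value(0.2, 0.3, 0.0, 1.0, 0.5))  # 0.2
```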
Bayesian Network Inference Using Marginal Trees

Variable Elimination (VE) answers a query posed to a Bayesian network (BN) by manipulating the conditional probability tables of the BN. Each successive query is answered in the same manner. In this paper, we present an inference algorithm that is aimed at maximizing the reuse of past computation but does not involve precomputation. Compared to VE and a variant of VE incorporating precomputation, our approach fares favourably in preliminary experimental results.

Cory J. Butz, Jhonatan de S. Oliveira, Anders L. Madsen
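The table manipulations that Variable Elimination performs, factor multiplication and summing a variable out, can be sketched on a two-node network; the factor representation below is an illustrative choice (complete tables assumed), not the authors' implementation:

```python
from itertools import product

def multiply(f, g):
    """Pointwise product of two factors.

    A factor is (vars, table) where table maps assignment tuples
    (ordered as in vars) to probabilities; tables are assumed complete.
    """
    fv, ft = f
    gv, gt = g
    out_vars = list(fv) + [v for v in gv if v not in fv]
    doms = {}  # each variable's domain, collected from the tables
    for vars_, table in ((fv, ft), (gv, gt)):
        for assign in table:
            for v, val in zip(vars_, assign):
                doms.setdefault(v, set()).add(val)
    out = {}
    for assign in product(*(sorted(doms[v]) for v in out_vars)):
        ctx = dict(zip(out_vars, assign))
        fa = tuple(ctx[v] for v in fv)
        ga = tuple(ctx[v] for v in gv)
        out[assign] = ft[fa] * gt[ga]
    return (tuple(out_vars), out)

def sum_out(f, var):
    """Eliminate `var` from factor f by summing over its values."""
    fv, ft = f
    idx = fv.index(var)
    out = {}
    for assign, p in ft.items():
        key = assign[:idx] + assign[idx + 1:]
        out[key] = out.get(key, 0.0) + p
    return (fv[:idx] + fv[idx + 1:], out)

# Tiny chain A -> B:  P(B) = sum_A P(A) P(B|A)
pa = (("A",), {(0,): 0.6, (1,): 0.4})
pba = (("A", "B"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
_, pb = sum_out(multiply(pa, pba), "A")
print({k: round(v, 6) for k, v in pb.items()})  # {(0,): 0.62, (1,): 0.38}
```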
On SPI-Lazy Evaluation of Influence Diagrams

Influence Diagrams are an effective modelling framework for the analysis of Bayesian decision making under uncertainty. Improving the performance of their evaluation is of crucial importance, as real-world decision problems become ever more complex. Lazy Evaluation is an algorithm used to evaluate Influence Diagrams based on message passing in a strong junction tree. This paper proposes the use of Symbolic Probabilistic Inference as an alternative to Variable Elimination for computing the clique-to-clique messages in Lazy Evaluation of Influence Diagrams.

Rafael Cabañas, Andrés Cano, Manuel Gómez-Olmedo, Anders L. Madsen
Extended Probability Trees for Probabilistic Graphical Models

This paper proposes a flexible framework to work with probabilistic potentials in Probabilistic Graphical Models. The so-called Extended Probability Trees allow the representation of multiplicative and additive factorisations within the structure, along with context-specific independencies, with the aim of providing a way of representing and managing complex distributions. This work gives the details of the structure and develops the basic operations on potentials necessary to perform inference. The three basic operations, namely restriction, combination and marginalisation, are defined so they can take advantage of the defined factorisations within the structure, following a lazy methodology.

Andrés Cano, Manuel Gómez-Olmedo, Serafín Moral, Cora B. Pérez-Ariza
Mixture of Polynomials Probability Distributions for Grouped Sample Data

This paper describes techniques for developing a mixture of polynomials (MOP) probability distribution from a frequency distribution (also termed grouped data) summarized from a large dataset. To accomplish this task, a temporary dataset is produced from the grouped data and the parameters for the MOP function are estimated using a B-spline interpolation technique. Guidance is provided regarding the composition of the temporary dataset, and the selection of split points and order of the MOP approximation. Good results are obtained when using grouped data as compared to the underlying dataset, and this can be a major advantage when using a decision support system to obtain information for estimating probability density functions for random variables of interest.

Barry R. Cobb
Trading off Speed and Accuracy in Multilabel Classification

In previous work, we devised an approach for multilabel classification based on an ensemble of Bayesian networks. It was characterized by efficient structural learning and by high accuracy. Its shortcoming was the high computational complexity of the MAP inference necessary to identify the most probable joint configuration of all classes. In this work, we switch from the ensemble approach to the single-model approach. This allows important computational savings: the reduction of inference times is exponential in the difference between the treewidth of the single model and the number of classes. We moreover adopt a more sophisticated approach for the structural learning of the class subgraph. The proposed single models outperform alternative approaches for multilabel classification such as binary relevance and ensembles of classifier chains.

Giorgio Corani, Alessandro Antonucci, Denis D. Mauá, Sandra Gabaglio
Robustifying the Viterbi Algorithm

We present an efficient algorithm for estimating hidden state sequences in imprecise hidden Markov models (iHMMs), based on observed output sequences. The main difference with classical HMMs is that the local models of an iHMM are not represented by a single mass function, but rather by a set of mass functions. We consider as estimates for the hidden state sequence those sequences that are maximal. In this way, we generalise the problem of finding a state sequence with highest posterior probability, as is commonly considered in HMMs, and solved efficiently by the Viterbi algorithm. An important feature of our approach is that there may be multiple maximal state sequences, typically for iHMMs that are highly imprecise. We show experimentally that the time complexity of our algorithm tends to be linear in this number of maximal sequences, and investigate how this number depends on the local models.

Cedric De Boom, Jasper De Bock, Arthur Van Camp, Gert de Cooman
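For reference, the classical (precise) Viterbi algorithm that this work generalises to imprecise HMMs can be sketched as follows; the weather/umbrella model is a hypothetical toy example:

```python
def viterbi(obs, states, start, trans, emit):
    """Most probable hidden state sequence for a precise HMM.

    start[s], trans[s][t], emit[s][o] are ordinary probabilities.
    """
    # best[s] = probability of the best path ending in s; back tracks choices
    best = {s: start[s] * emit[s][obs[0]] for s in states}
    back = []
    for o in obs[1:]:
        prev, best, ptr = best, {}, {}
        for t in states:
            s_best = max(states, key=lambda s: prev[s] * trans[s][t])
            ptr[t] = s_best
            best[t] = prev[s_best] * trans[s_best][t] * emit[t][o]
        back.append(ptr)
    last = max(states, key=lambda s: best[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Toy weather model: hidden {rain, sun}, observed umbrella use (u/n).
states = ("rain", "sun")
start = {"rain": 0.5, "sun": 0.5}
trans = {"rain": {"rain": 0.7, "sun": 0.3}, "sun": {"rain": 0.3, "sun": 0.7}}
emit = {"rain": {"u": 0.9, "n": 0.1}, "sun": {"u": 0.2, "n": 0.8}}
print(viterbi(("u", "u", "n"), states, start, trans, emit))
# ['rain', 'rain', 'sun']
```

In the imprecise setting of the paper, each local model becomes a set of such mass functions, and there may be several maximal sequences instead of a single optimum.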
Extended Tree Augmented Naive Classifier

This work proposes an extended version of the well-known tree-augmented naive Bayes (TAN) classifier where the structure learning step is performed without requiring features to be connected to the class. Based on a modification of Edmonds’ algorithm, our structure learning procedure explores a superset of the structures that are considered by TAN, yet achieves global optimality of the learning score function in a very efficient way (quadratic in the number of features, the same complexity as learning TANs). A range of experiments show that we obtain models with better accuracy than TAN and comparable to the accuracy of the state-of-the-art classifier averaged one-dependence estimator.

Cassio P. de Campos, Marco Cuccu, Giorgio Corani, Marco Zaffalon
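The classical TAN structure step that this work extends builds a maximum-weight spanning tree over the features (weighted by conditional mutual information with the class). A minimal undirected sketch using Kruskal's algorithm, with hypothetical edge weights (the paper's modified Edmonds' algorithm for directed structures is not reproduced):

```python
def maximum_spanning_tree(n, weights):
    """Kruskal's algorithm on weights[(i, j)] -> score over n nodes,
    returning the edges of a maximum-weight spanning tree."""
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    tree = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Hypothetical pairwise scores between four features.
weights = {(0, 1): 0.9, (1, 2): 0.5, (0, 2): 0.2, (2, 3): 0.7, (1, 3): 0.1}
print(maximum_spanning_tree(4, weights))  # [(0, 1), (2, 3), (1, 2)]
```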
Evaluation of Rules for Coping with Insufficient Data in Constraint-Based Search Algorithms

A fundamental step in the PC causal discovery algorithm consists of testing for (conditional) independence. When the number of data records is very small, a classical statistical independence test is typically unable to reject the (null) independence hypothesis. In this paper, we compare two conflicting pieces of advice in the literature that, in case of too few data records, recommend (1) assuming dependence and (2) assuming independence. Our results show that assuming independence is a safer strategy in minimizing the structural distance between the causal structure that has generated the data and the discovered structure. We also propose a simple improvement on the PC algorithm that we call blacklisting. We demonstrate that blacklisting can lead to orders of magnitude savings in computation by avoiding unnecessary independence tests.

Martijn de Jongh, Marek J. Druzdzel
Supervised Classification Using Hybrid Probabilistic Decision Graphs

In this paper we analyse the use of probabilistic decision graphs in supervised classification problems. We enhance existing models with the ability to operate in hybrid domains, where discrete and continuous variables coexist. Our proposal is based on the use of mixtures of truncated basis functions. We first introduce a new type of probabilistic graphical model, namely probabilistic decision graphs with mixture of truncated basis functions distribution, and then present an initial experimental evaluation where our proposal is compared with state-of-the-art Bayesian classifiers, showing promising behaviour.

Antonio Fernández, Rafael Rumí, José del Sagrado, Antonio Salmerón
Towards a Bayesian Decision Theoretic Analysis of Contextual Effect Modifiers

Relevance measures based on parametric and structural properties of Bayesian networks can be utilized to characterize predictors and their interactions. The advantage of the Bayesian framework is that it allows a detailed view of parametric and structural aspects of relevance for domain experts. We discuss two particularly challenging scenarios from psycho-genetic studies, (1) the analysis of weak effects, and (2) the analysis of contextual relevance, where a factor has a negligible main effect and it modifies an effect of another factor only in a given subpopulation. To cope with this challenge, we investigate the formalization of expert intuitions and preferences from the exploratory data analysis phase. We propose formal losses for these two scenarios. We introduce and evaluate a Bayesian effect size measure using an artificial data set related to a genetic association study, and real data from a psycho-genetic study.

Gabor Hullam, Peter Antal
Discrete Bayesian Network Interpretation of the Cox’s Proportional Hazards Model

Cox’s Proportional Hazards (CPH) model is quite likely the most popular modeling technique in survival analysis. While the CPH model is able to represent relationships between a collection of risks and their common effect, Bayesian networks have become an attractive alternative with far broader applications. Our paper focuses on a Bayesian network interpretation of the CPH model. We provide a method of encoding knowledge from existing CPH models in the process of knowledge engineering for Bayesian networks. We compare the accuracy of the resulting Bayesian network to the CPH model, the Kaplan-Meier estimate, and a Bayesian network learned from data using the EM algorithm. Bayesian networks constructed from the CPH model lead to much higher accuracy than the other approaches, especially when the number of data records is very small.

Jidapa Kraisangka, Marek J. Druzdzel
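The Kaplan-Meier estimate used as one of the comparison baselines has a simple product-limit form. A self-contained sketch on hypothetical follow-up data:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve S(t) at each observed event time.

    times[i] is follow-up time; events[i] is 1 for an event (death)
    and 0 for censoring.  Returns a list of (time, survival) steps.
    """
    data = sorted(zip(times, events))
    n = len(data)
    at_risk, s, curve = n, 1.0, []
    i = 0
    while i < n:
        t = data[i][0]
        deaths = removed = 0
        while i < n and data[i][0] == t:  # group ties at the same time
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            s *= 1.0 - deaths / at_risk
            curve.append((t, s))
        at_risk -= removed
    return curve

# One censored subject at time 2 (event flag 0); the rest are events.
print([(t, round(s, 4)) for t, s in kaplan_meier([1, 2, 2, 3, 4],
                                                 [1, 1, 0, 1, 1])])
# [(1, 0.8), (2, 0.6), (3, 0.3), (4, 0.0)]
```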
Minimizing Relative Entropy in Hierarchical Predictive Coding

The recent Hierarchical Predictive Coding theory is a very influential theory in neuroscience that postulates that the brain continuously makes (Bayesian) predictions about sensory inputs using a generative model. The Bayesian inferences (making predictions about sensory states, estimating errors between prediction and observation, and lowering the prediction error by revising hypotheses) are assumed to allow for efficient approximate inferences in the brain. We investigate this assumption by making the conceptual ideas of how the brain may minimize prediction error computationally precise and by studying the computational complexity of these computational problems. We show that each problem is intractable in general and discuss the parameterized complexity of the problems.

Johan Kwisthout
Treewidth and the Computational Complexity of MAP Approximations

The problem of finding the most probable explanation to a designated set of variables (the MAP problem) is a notoriously intractable problem in Bayesian networks, both to compute exactly and to approximate. It is known, both from theoretical considerations and from practical experiences, that low treewidth is typically an essential prerequisite to efficient exact computations in Bayesian networks. In this paper we investigate whether the same holds for approximating MAP. We define four notions of approximating MAP (by value, structure, rank, and expectation) and argue that all of them are intractable in general. We prove that efficient value-, structure-, and rank-approximations of MAP instances with high treewidth will violate the Exponential Time Hypothesis. In contrast, we hint that expectation-approximation can be done efficiently, even in MAP instances with high treewidth, if the most probable explanation has a high probability.

Johan Kwisthout
Bayesian Networks with Function Nodes

This paper introduces the notion of Bayesian networks augmented with function nodes. Two types of function nodes are considered. A real-valued function node represents a real value either used to parameterise one or more conditional probability distributions of the Bayesian network or a real value computed after a successful belief update or Monte Carlo simulation. On the other hand, a discrete function node represents a discrete marginal distribution. The paper includes four real-world examples that illustrate how function nodes have improved the flexibility and efficiency of utilizing Bayesian networks for reasoning with uncertainty in different domains.

Anders L. Madsen, Frank Jensen, Martin Karlsen, Nicolaj Soendberg-Jeppesen
A New Method for Vertical Parallelisation of TAN Learning Based on Balanced Incomplete Block Designs

The framework of Bayesian networks is a widely popular formalism for performing belief update under uncertainty. Structure restricted Bayesian network models such as the Naive Bayes Model and Tree-Augmented Naive Bayes (TAN) Model have shown impressive performance for solving classification tasks. However, if the number of variables or the amount of data is large, then learning a TAN model from data can be a time consuming task. In this paper, we introduce a new method for parallel learning of a TAN model from large data sets. The method is based on computing the mutual information scores between pairs of variables given the class variable in parallel. The computations are organised in parallel using balanced incomplete block designs. The results of a preliminary empirical evaluation of the proposed method on large data sets show that a significant performance improvement is possible through parallelisation using the method presented in this paper.

Anders L. Madsen, Frank Jensen, Antonio Salmerón, Martin Karlsen, Helge Langseth, Thomas D. Nielsen
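The per-pair quantity that is parallelised here, the empirical conditional mutual information between two features given the class, can be sketched as follows (the organisation of the computations via balanced incomplete block designs is not shown):

```python
import math
from collections import Counter

def conditional_mi(xi, xj, c):
    """Empirical conditional mutual information I(Xi; Xj | C) in bits,
    estimated from three aligned columns of discrete observations."""
    n = len(c)
    n_c = Counter(c)
    n_ic = Counter(zip(xi, c))
    n_jc = Counter(zip(xj, c))
    n_ijc = Counter(zip(xi, xj, c))
    mi = 0.0
    for (a, b, cc), nabc in n_ijc.items():
        mi += (nabc / n) * math.log2(
            nabc * n_c[cc] / (n_ic[(a, cc)] * n_jc[(b, cc)]))
    return mi

# Hypothetical sample: within each class the features determine each other.
xi = [0, 0, 1, 1, 0, 1]
xj = [0, 0, 1, 1, 1, 0]
c = [0, 0, 0, 0, 1, 1]
print(round(conditional_mi(xi, xj, c), 6))  # 1.0
```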
Equivalences between Maximum a Posteriori Inference in Bayesian Networks and Maximum Expected Utility Computation in Influence Diagrams

Two important tasks in probabilistic reasoning are the computation of the maximum posterior probability of a given subset of the variables in a Bayesian network (MAP), and the computation of the maximum expected utility of a strategy in an influence diagram (MEU). Despite their similarities, research on both problems has largely been conducted independently, with algorithmic solutions and insights designed for one problem not (trivially) transferable to the other. In this work, we show constructively that these two problems are equivalent in the sense that any algorithm designed for one problem can be used to solve the other with small overhead. These equivalences extend the toolbox of either problem, and shall foster new insights into their solution.

Denis D. Mauá
Speeding Up k-Neighborhood Local Search in Limited Memory Influence Diagrams

Limited memory influence diagrams are graph-based models that describe decision problems with limited information, as in the case of teams and agents with imperfect recall. Solving a (limited memory) influence diagram is an NP-hard problem, often approached through local search. In this paper we investigate algorithms for k-neighborhood local search. We show that finding a k-neighbor that improves on the current solution is W[1]-hard and hence unlikely to be polynomial-time tractable. We then develop fast schemes to perform approximate k-local search; experiments show that our methods improve on current local search algorithms both with respect to time and to accuracy.

Denis D. Mauá, Fabio G. Cozman
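A generic (exact, exhaustive) k-neighborhood local search over binary decision variables can be sketched as below; the score function is a hypothetical example showing why a 2-neighborhood can escape a local optimum of a 1-neighborhood search (the paper's fast approximate schemes are not reproduced):

```python
from itertools import combinations, product

def k_neighborhood_search(x0, score, k):
    """Greedy local search: repeatedly move to the best assignment that
    differs from the current one in at most k binary positions."""
    x = list(x0)
    improved = True
    while improved:
        improved = False
        best, best_score = None, score(x)
        for idxs in combinations(range(len(x)), k):
            for vals in product([0, 1], repeat=k):
                y = list(x)
                for i, v in zip(idxs, vals):
                    y[i] = v
                if score(y) > best_score:
                    best, best_score = y, score(y)
        if best is not None:
            x, improved = best, True
    return x

# Hypothetical score with an interaction between the two variables:
# a single flip from [0, 0] always loses, flipping both wins.
def score(x):
    return (3 if x[0] == x[1] == 1 else 0) + (1 if x[0] + x[1] == 0 else 0)

print(k_neighborhood_search([0, 0], score, 1))  # [0, 0] (stuck)
print(k_neighborhood_search([0, 0], score, 2))  # [1, 1]
```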
Inhibited Effects in CP-Logic

An important goal of statistical relational learning formalisms is to develop representations that are compact and expressive but also easy to read and maintain. This can be achieved by exploiting the modularity of rule-based structures and is related to the noisy-or structure, where parents independently influence a joint effect. Typically, these rules are combined in an additive manner, where a new rule increases the probability of the effect. In this paper, we present a new language feature for CP-logic that allows negation in the head of rules to express the inhibition of an effect in a modular manner. This is a generalization of the inhibited noisy-or structure that can deal with cycles and, foremost, is non-conflicting. We introduce syntax and semantics for this feature and show how it is a natural counterpart to the standard noisy-or. Experimentally, we illustrate that in practice there is no additional cost when performing inference compared to a noisy-or structure.

Wannes Meert, Joost Vennekens
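The standard noisy-or, and one simple way inhibitors can multiplicatively block an effect, can be sketched as follows; the inhibited model below is an illustrative simplification, not the CP-logic semantics defined in the paper:

```python
def noisy_or(causes, strengths):
    """P(effect) under a noisy-or: each active cause i independently
    triggers the effect with probability strengths[i]."""
    p_not = 1.0
    for active, s in zip(causes, strengths):
        if active:
            p_not *= 1.0 - s
    return 1.0 - p_not

def inhibited_noisy_or(causes, strengths, inhibitors, inh_strengths):
    """Illustrative inhibited noisy-or: each active inhibitor
    independently blocks the effect with its own strength."""
    p = noisy_or(causes, strengths)
    for active, s in zip(inhibitors, inh_strengths):
        if active:
            p *= 1.0 - s
    return p

print(round(noisy_or([1, 1], [0.8, 0.5]), 6))                        # 0.9
print(round(inhibited_noisy_or([1, 1], [0.8, 0.5], [1], [0.3]), 6))  # 0.63
```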
Learning Parameters in Canonical Models Using Weighted Least Squares

We propose a novel approach to learning parameters of canonical models from small data sets using a concept employed in regression analysis: the weighted least squares method. We assess the performance of our method experimentally and show that it typically outperforms simple methods used in the literature in terms of the accuracy of the learned conditional probability distributions.

Krzysztof Nowak, Marek J. Druzdzel
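The weighted least squares idea borrowed from regression analysis can be illustrated on a simple linear model with hypothetical data (the paper applies the method to canonical-model parameters, not to a line fit):

```python
def weighted_least_squares(x, y, w):
    """Fit y ~ a*x + b minimising the weighted squared error
    sum_i w[i] * (y[i] - a*x[i] - b)**2 via the normal equations."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted means
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    var = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    a = cov / var
    return a, my - a * mx

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]          # exactly y = 2x + 1
a, b = weighted_least_squares(x, y, [1.0, 1.0, 1.0, 4.0])
print(round(a, 6), round(b, 6))   # 2.0 1.0
```

Since the data are exactly linear, the fit recovers the line regardless of the weights; with noisy data, the weights let more reliable observations dominate the fit.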
Learning Marginal AMP Chain Graphs under Faithfulness

Marginal AMP chain graphs are a recently introduced family of models based on graphs that may have undirected, directed and bidirected edges. They unify and generalize the AMP and the multivariate regression interpretations of chain graphs. In this paper, we present a constraint-based algorithm for learning a marginal AMP chain graph from a probability distribution which is faithful to it. We also show that the extension of Meek’s conjecture to marginal AMP chain graphs does not hold, which compromises the development of efficient and correct score+search learning algorithms under assumptions weaker than faithfulness.

Jose M. Peña
Learning Maximum Weighted (k+1)-Order Decomposable Graphs by Integer Linear Programming

This work is focused on learning maximum weighted graphs subject to three structural constraints: (1) the graph is decomposable, (2) it has a maximum clique size of k + 1, and (3) it is coarser than a given maximum k-order decomposable graph. After proving that the problem is NP-hard, we give a formulation of the problem based on integer linear programming. The approach has shown competitive experimental results in artificial domains. The proposed formulation has important applications in the field of probabilistic graphical models, such as learning decomposable models based on decomposable scores (e.g. log-likelihood, BDe, MDL).

Aritz Pérez, Christian Blum, Jose A. Lozano
Multi-label Classification for Tree and Directed Acyclic Graphs Hierarchies

Hierarchical Multi-label Classification (HMC) is the task of assigning a set of classes to a single instance, with the peculiarity that the classes are ordered in a predefined structure. We propose a novel HMC method for tree and Directed Acyclic Graph (DAG) hierarchies. Using the combined predictions of local classifiers and a weighting scheme according to the level in the hierarchy, we select the “best” single path for tree hierarchies, and multiple paths for DAG hierarchies. We developed a method that returns paths from the root down to a leaf node (Mandatory Leaf Node Prediction or MLNP) and an extension for Non-Mandatory Leaf Node Prediction (NMLNP). For NMLNP we compared several pruning approaches, varying the pruning direction, pruning time and pruning condition. Additionally, we propose a new evaluation metric for hierarchical classifiers that avoids the bias of current measures, which favor conservative approaches when using NMLNP. The proposed approach was experimentally evaluated with 10 tree and 8 DAG hierarchical datasets in the domain of protein function prediction. We conclude that our method works better for deep, DAG hierarchies and that, in general, NMLNP improves on MLNP.

Mallinali Ramírez-Corona, L. Enrique Sucar, Eduardo F. Morales
Min-BDeu and Max-BDeu Scores for Learning Bayesian Networks

This work presents two new score functions based on the Bayesian Dirichlet equivalent uniform (BDeu) score for learning Bayesian network structures. They consider the sensitivity of BDeu to varying parameters of the Dirichlet prior. The scores take on the most adversarial and the most beneficial priors among those within a contamination set around the symmetric one. We build these scores in such a way that they are decomposable and can be computed efficiently. Because of that, they can be integrated into any state-of-the-art structure learning method that explores the space of directed acyclic graphs and allows decomposable scores. Empirical results suggest that our scores outperform the standard BDeu score in terms of the likelihood of unseen data and in terms of edge discovery with respect to the true network, at least when the training sample size is small. We discuss the relation between these new scores and the accuracy of inferred models. Moreover, our new criteria can be used to identify the amount of data after which learning is saturated, that is, after which additional data are of little help to improve the resulting model.

Mauro Scanagatta, Cassio P. de Campos, Marco Zaffalon
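The standard BDeu score that both new scores perturb has a closed form in terms of log-gamma functions. A sketch of the local score for one variable, with hypothetical counts:

```python
from math import lgamma

def bdeu_local(counts, r, ess):
    """Log BDeu local score of one variable for one parent set.

    counts: list over parent configurations, each a length-r list of
    state counts N_jk.  r: number of states of the variable.
    ess: equivalent sample size of the symmetric Dirichlet prior.
    """
    q = len(counts)
    a_j = ess / q          # prior mass per parent configuration
    a_jk = ess / (q * r)   # prior mass per (configuration, state) cell
    score = 0.0
    for njk in counts:
        nj = sum(njk)
        score += lgamma(a_j) - lgamma(a_j + nj)
        for n in njk:
            score += lgamma(a_jk + n) - lgamma(a_jk)
    return score

# Counts of a binary variable under two parent configurations.
print(bdeu_local([[8, 2], [1, 9]], r=2, ess=1.0))
```

The two scores proposed above would, roughly speaking, take the worst and best such value over a set of priors around the symmetric one.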
Causal Discovery from Databases with Discrete and Continuous Variables

Bayesian Constraint-based Causal Discovery (BCCD) is a state-of-the-art method for robust causal discovery in the presence of latent variables. It combines probabilistic estimation of Bayesian networks over subsets of variables with a causal logic to infer causal statements. Currently BCCD is limited to discrete or Gaussian variables. Most of the real-world data, however, contain a mixture of discrete and continuous variables. We here extend BCCD to be able to handle combinations of discrete and continuous variables, under the assumption that the relations between the variables are monotonic. To this end, we propose a novel method for the efficient computation of BIC scores for hybrid Bayesian networks. We demonstrate the accuracy and efficiency of our approach for causal discovery on simulated data as well as on real-world data from the ADHD-200 competition.

Elena Sokolova, Perry Groot, Tom Claassen, Tom Heskes
On Expressiveness of the AMP Chain Graph Interpretation

In this paper we study the expressiveness of the Andersson-Madigan-Perlman interpretation of chain graphs. It is well known that all independence models that can be represented by Bayesian networks also can be perfectly represented by chain graphs of the Andersson-Madigan-Perlman interpretation but it has so far not been studied how much more expressive this second class of models is. In this paper we calculate the exact number of representable independence models for the two classes, and the ratio between them, for up to five nodes. For more than five nodes the explosive growth of chain graph models does however make such enumeration infeasible. Hence we instead present, and prove the correctness of, a Markov chain Monte Carlo approach for sampling chain graph models uniformly for the Andersson-Madigan-Perlman interpretation. This allows us to approximate the ratio between the numbers of independence models representable by the two classes as well as the average number of chain graphs per chain graph model for up to 20 nodes. The results show that the ratio between the numbers of representable independence models for the two classes grows exponentially as the number of nodes increases. This indicates that only a very small fraction of all independence models representable by chain graphs of the Andersson-Madigan-Perlman interpretation also can be represented by Bayesian networks.

Dag Sonntag
Learning Bayesian Network Structures When Discrete and Continuous Variables Are Present

In any database record, some fields are discrete and others continuous. We consider learning Bayesian network structures when discrete and continuous variables are present. Thus far, most previous results have assumed that all the variables are either discrete or continuous. We propose to compute a new Bayesian score for each subset of discrete and continuous variables, and to obtain a structure that maximizes the posterior probability given examples. We evaluate the proposed algorithm and show experimentally that the error probability and Kullback-Leibler divergence diminish as the sample size n grows, whereas the computation increases linearly in the logarithm of the number of bins in the histograms that approximate the density.

Joe Suzuki
Learning Neighborhoods of High Confidence in Constraint-Based Causal Discovery

Constraint-based causal discovery algorithms use conditional independence tests to identify the skeleton and invariant orientations of a causal network. Two major disadvantages of constraint-based methods are that (a) they are sensitive to error propagation and (b) the results of the conditional independence tests are binarized by being compared to a hard threshold; thus, the resulting networks are not easily evaluated in terms of reliability. We present PROPeR, a method for estimating posterior probabilities of pairwise relations (adjacencies and non-adjacencies) of a network skeleton as a function of the corresponding p-values. This novel approach has no significant computational overhead and can scale up to the same number of variables as the constraint-based algorithm of choice. We also present BiND, an algorithm that identifies neighborhoods of high structural confidence on causal networks learnt with constraint-based algorithms. The algorithm uses PROPeR to estimate the confidence of all pairwise relations. Maximal neighborhoods of the skeleton with minimum confidence above a user-defined threshold are then identified using the Bron-Kerbosch algorithm for identifying maximal cliques. In our empirical evaluation, we demonstrate that (a) the posterior probability estimates for pairwise relations are reasonable and comparable with estimates obtained using more expensive Bayesian methods and (b) BiND identifies sub-networks with higher structural precision and recall than the output of the constraint-based algorithm.

Sofia Triantafillou, Ioannis Tsamardinos, Anna Roumpelaki
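The Bron-Kerbosch algorithm that BiND relies on enumerates all maximal cliques of an undirected graph by recursively growing a clique R from candidate vertices P while excluding already-processed vertices X. A minimal sketch of the basic (unpivoted) variant, with an adjacency-set representation:

```python
def bron_kerbosch(R, P, X, adj, cliques):
    """Append every maximal clique of the graph to `cliques`.

    R: vertices in the current clique; P: candidates that extend R;
    X: vertices already covered; adj: dict mapping vertex -> neighbor set."""
    if not P and not X:
        cliques.append(set(R))  # R cannot be extended: it is maximal
        return
    for v in list(P):
        # v's extension may only use v's neighbors
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, cliques)
        P.remove(v)  # every clique through v has now been reported
        X.add(v)
```

In BiND the graph's vertices would be the pairwise relations that passed the confidence threshold; here the example graph is arbitrary.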
Causal Independence Models for Continuous Time Bayesian Networks

The theory of causal independence is frequently used to facilitate the assessment of the probabilistic parameters of probability distributions of Bayesian networks. Continuous time Bayesian networks are a related type of graphical probabilistic models that describe stochastic processes that evolve over continuous time. Similar to Bayesian networks, the number of parameters of continuous time Bayesian networks grows exponentially in the number of parents of nodes. In this paper, we study causal independence in continuous time Bayesian networks. This new theory can be used to significantly reduce the number of parameters that need to be estimated for such models as well. We show that the noisy-OR model has a natural interpretation in the parameters of these models, and can be exploited during inference. Furthermore, we generalise this model to include synergistic and anti-synergistic effects, leading to noisy-OR synergy models.

Maarten van der Heijden, Arjen Hommersom
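The classic (atemporal) noisy-OR model the abstract builds on assumes each active cause independently fails to produce the effect, so the effect is absent only if every active cause fails. A minimal sketch of this standard leak-free form (the paper's contribution is its continuous-time generalisation, which this does not capture):

```python
def noisy_or(active_parents, q):
    """P(effect = 1 | parents), where q[i] is the probability that
    active parent i alone causes the effect; failures are independent."""
    p_all_fail = 1.0
    for i in active_parents:
        p_all_fail *= (1.0 - q[i])
    return 1.0 - p_all_fail
```

The parameter count is linear in the number of parents, which is exactly the saving causal independence buys over a full conditional probability table.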
Expressive Power of Binary Relevance and Chain Classifiers Based on Bayesian Networks for Multi-label Classification

Bayesian network classifiers are widely used in machine learning because they intuitively represent causal relations. Multi-label classification problems require each instance to be assigned a subset of a defined set of h labels. This problem is equivalent to finding a multi-valued decision function that predicts a vector of h binary classes. In this paper we obtain the decision boundaries of two widely used Bayesian network approaches for building multi-label classifiers: multi-label Bayesian network classifiers built using the binary relevance method, and Bayesian network chain classifiers. We extend our previous single-label results to multi-label chain classifiers, and we prove that, as expected, chain classifiers provide a more expressive model than the binary relevance method.

Gherardo Varando, Concha Bielza, Pedro Larrañaga
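The structural difference between the two approaches can be sketched with generic base classifiers (plain callables here, standing in for the paper's Bayesian network classifiers): binary relevance predicts each label from the features alone, while a chain classifier feeds earlier label predictions into later ones, which is the source of its extra expressive power.

```python
def binary_relevance_predict(x, classifiers):
    """Binary relevance: each of the h labels is predicted from the
    feature list x alone, independently of the other labels."""
    return [clf(x) for clf in classifiers]


def chain_predict(x, classifiers):
    """Classifier chain: label j is predicted from x augmented with the
    predictions already made for labels 1..j-1."""
    preds = []
    for clf in classifiers:
        preds.append(clf(x + preds))
    return preds
```

Because label j may condition on labels 1..j-1, a chain can represent inter-label dependencies that no set of independent per-label classifiers can.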
An Approximate Tensor-Based Inference Method Applied to the Game of Minesweeper

We propose an approximate probabilistic inference method based on the CP tensor decomposition and apply it to the well-known computer game of Minesweeper. In the method we view the conditional probability tables of the exactly-ℓ-out-of-k functions as tensors and approximate them by a sum of rank-one tensors. The number of summands is min{ℓ + 1, k − ℓ + 1}, which is lower than their exact symmetric tensor rank, namely k. The accuracy of the approximation can be tuned by a single scalar parameter. The computer game serves as a prototype for applications of inference mechanisms in Bayesian networks, which are not always tractable due to the dimensionality of the problem, but where the tensor decomposition may significantly help.

Jiří Vomlel, Petr Tichavský
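The exactly-ℓ-out-of-k function viewed as a tensor is simply a k-way table over {0,1} whose entry is 1 precisely when ℓ of the k binary arguments are active. A minimal sketch of its construction (the function name is illustrative; the paper's rank-one approximation of this tensor is not reproduced here):

```python
from itertools import product

import numpy as np


def exactly_l_out_of_k(l, k):
    """The exactly-l-out-of-k function as a k-way tensor over {0,1}:
    entry (x_1, ..., x_k) is 1 iff exactly l of the x_i equal 1."""
    T = np.zeros((2,) * k)
    for idx in product((0, 1), repeat=k):
        if sum(idx) == l:
            T[idx] = 1.0
    return T
```

The tensor has 2^k entries, of which C(k, ℓ) are nonzero, which is why compact rank-one-sum representations pay off during inference.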
Compression of Bayesian Networks with NIN-AND Tree Modeling

We propose to compress Bayesian networks (BNs), reducing their space complexity to linear, so that they can be widely deployed on low-resource platforms such as smart mobile devices. We present a novel method that compresses each conditional probability table (CPT) of an arbitrary binary BN into a Non-impeding noisy-And Tree (NAT) model, which achieves this goal in the binary case. Experiments demonstrate that the accuracy is reasonably high in the general case and higher still on the real BNs tested. We show the advantages of the method over alternatives in expressiveness, generality, space reduction and online efficiency.

Yang Xiang, Qing Liu
A Study of Recently Discovered Equalities about Latent Tree Models Using Inverse Edges

Interesting equalities have recently been discovered about latent tree models. They relate distributions of two or three observed variables with joint distributions of four or more observed variables, and with model parameters that depend on latent variables. The equations are derived by using matrix and tensor decompositions. This paper sheds new light on the equalities by offering an alternative derivation in terms of variable elimination and structure manipulations. The key technique is the introduction of inverse edges.

Nevin L. Zhang, Xiaofei Wang, Peixian Chen
An Extended MPL-C Model for Bayesian Network Parameter Learning with Exterior Constraints

Lack of relevant data is a major challenge for learning Bayesian networks (BNs) in real-world applications. Knowledge engineering techniques attempt to address this by incorporating domain knowledge from experts. This paper focuses on learning node probability tables using both expert judgment and limited data. To reduce the massive burden of eliciting individual probability table entries (parameters), it is often easier to elicit constraints on the parameters from experts. Constraints can be interior (between entries of the same probability table column) or exterior (between entries of different columns). In this paper we introduce the first auxiliary BN method (called MPL-EC) to tackle parameter learning with exterior constraints. The MPL-EC itself is a BN, whose nodes encode the data observations, exterior constraints and parameters in the original BN. MPL-EC also addresses (i) how to estimate target parameters with both data and constraints, and (ii) how to fuse the weights from different causal relationships in a robust way. Experimental results demonstrate the superiority of MPL-EC at various sparsity levels compared to conventional parameter learning algorithms and other state-of-the-art parameter learning algorithms with constraints. Moreover, we demonstrate its successful application to learning a real-world software-defects BN with sparse data.

Yun Zhou, Norman Fenton, Martin Neil
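The flavor of an exterior constraint can be shown with a deliberately tiny sketch: raw per-column estimates of P(X=1 | ·) from sparse counts, with an expert-supplied ordering between two columns enforced by pooling when the data violate it (two-point isotonic regression). This is a generic constrained-estimation illustration, not the MPL-EC auxiliary-network method itself; the function name and count format are hypothetical.

```python
def constrained_estimates(counts_a, counts_b):
    """Estimate P(X=1|A) and P(X=1|B) from (ones, total) count pairs,
    enforcing the exterior constraint P(X=1|A) <= P(X=1|B): if the raw
    estimates violate the ordering, pool the counts of both columns."""
    pa = counts_a[0] / counts_a[1]
    pb = counts_b[0] / counts_b[1]
    if pa <= pb:
        return pa, pb  # constraint already satisfied by the data
    pooled = (counts_a[0] + counts_b[0]) / (counts_a[1] + counts_b[1])
    return pooled, pooled
```

With only 10 observations per column, a single noisy count can flip the ordering; the constraint pulls the estimates back toward what the expert knows to hold.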
Backmatter
Metadata
Title
Probabilistic Graphical Models
Edited by
Linda C. van der Gaag
Ad J. Feelders
Copyright year
2014
Publisher
Springer International Publishing
Electronic ISBN
978-3-319-11433-0
Print ISBN
978-3-319-11432-3
DOI
https://doi.org/10.1007/978-3-319-11433-0