Introduction
Existing literature on Bayesian networks in a federated setting
Our contribution
Bayesian networks
Structure learning
Parameter learning
Method
VertiBayes
Structure learning
Parameter learning
Quantity | Complexity |
---|---|
Number of scalar product protocols | \(O(m)\), where \(m\) is the number of unique parent-child value combinations for which a probability needs to be calculated |
Number of scalar product subprotocols per protocol | \(\frac{n!}{x!\,(n-x)!}\) for each \(x\), \(2 \le x \le n\), where \(n\) is the number of parties involved in the protocol |
Number of multiplications per subprotocol | \(O(p \cdot n \cdot (n-1))\), where \(p\) is the population size and \(n\) is the number of parties involved in the protocol |
Time complexity when training a model in a federated setting
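The counts in the complexity table can be made concrete with a small calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes that "for each \(x\)" means the subprotocol counts are summed over all subset sizes \(2 \le x \le n\), which gives \(2^n - n - 1\) subprotocols per protocol. The multiplication count is the leading term of the stated bound, with constant factors ignored.

```python
from math import comb


def subprotocol_count(n: int) -> int:
    """Sum C(n, x) over 2 <= x <= n (assumed reading of the table)."""
    if n < 2:
        raise ValueError("a scalar product protocol needs at least two parties")
    return sum(comb(n, x) for x in range(2, n + 1))


def multiplications_per_subprotocol(p: int, n: int) -> int:
    """Leading term p * n * (n - 1) of the O(p*n*(n-1)) bound."""
    return p * n * (n - 1)


if __name__ == "__main__":
    for parties in range(2, 9):
        print(parties, subprotocol_count(parties))
```

Under this reading the number of subprotocols grows exponentially in the number of parties, while the cost of each subprotocol grows linearly in the population size.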
Federated classification and model validation
Hyperparameters
- Is the structure taken from a predefined structure based on expert knowledge, or learned with the K2 algorithm?
- If K2 is used for structure learning, what is the maximum number of parents a node may have?
- Are continuous variables discretized using predefined bins based on expert knowledge, or using an automatic approach? Several strategies exist for automatic discretization, and each may have its own hyperparameters.
- Which validation strategy is chosen from among the options in “Federated classification and model validation”?
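The four choices above can be gathered into a single configuration object. The following is a minimal sketch, not the configuration format of any existing implementation: all class and field names are hypothetical, and the three validation options are assumed from the column labels used in the results tables (public, SCV, SVDG).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class StructureSource(Enum):
    EXPERT = "expert"  # predefined structure from expert knowledge
    K2 = "k2"          # structure learned with the K2 algorithm


class Discretization(Enum):
    PREDEFINED_BINS = "predefined_bins"  # bins from expert knowledge
    AUTOMATIC = "automatic"              # automatic binning strategy


class Validation(Enum):
    # Names assumed from the validation columns of the results tables.
    PUBLIC = "public"
    SCV = "scv"
    SVDG = "svdg"


@dataclass(frozen=True)
class Hyperparameters:
    structure: StructureSource
    discretization: Discretization
    validation: Validation
    max_parents: Optional[int] = None  # only meaningful when K2 is used

    def __post_init__(self) -> None:
        if self.structure is StructureSource.K2:
            if self.max_parents is None or self.max_parents < 1:
                raise ValueError("K2 requires max_parents >= 1")
        elif self.max_parents is not None:
            raise ValueError("max_parents only applies when K2 is used")
```

Validating in `__post_init__` keeps the dependency between the first two hyperparameters explicit: the parent limit is required for K2 and rejected for an expert-defined structure.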
Experimental setup
Structure learning
Parameter learning
Scaling in the number of data parties
Results
Structure learning
Parameter learning
Dataset | Missing data (%) | AUC: centralized, public validation | AUC: federated, public validation | AUC: federated, SCV validation | AUC: federated, SVDG validation | AIC: centralized | AIC: federated | AIC difference |
---|---|---|---|---|---|---|---|---|
Alarm (population size: 10,000) | 0 | 0.91 | 0.91 | 0.91 | 0.91 | −340571 | −318469 | −6.49% |
Alarm (population size: 10,000) | 5 | 0.88 | 0.89 | 0.88 | 0.89 | −315856 | −313612 | −0.71% |
Alarm (population size: 10,000) | 10 | 0.85 | 0.86 | 0.86 | 0.86 | −340823 | −350576 | 2.86% |
Alarm (population size: 10,000) | 30 | 0.76 | 0.76 | 0.76 | 0.76 | −444297 | −444866 | 0.13% |
Asia (population size: 10,000) | 0 | 1.00 | 1.00 | 1.00 | 1.00 | −22555 | −22559 | 0.02% |
Asia (population size: 10,000) | 5 | 0.76 | 0.76 | 0.76 | 0.76 | −23430 | −23395 | −0.15% |
Asia (population size: 10,000) | 10 | 0.69 | 0.7 | 0.7 | 0.7 | −24105 | −24090 | −0.06% |
Asia (population size: 10,000) | 30 | 0.58 | 0.59 | 0.58 | 0.59 | −25517 | −25613 | 0.37% |
Diabetes (population size: 768) | 0 | 0.8 | 0.77 | 0.95 | 0.79 | −13593 | −14407 | 5.99% |
Diabetes (population size: 768) | 5 | 0.79 | 0.74 | 0.92 | 0.76 | −13699 | −14556 | 6.25% |
Diabetes (population size: 768) | 10 | 0.75 | 0.72 | 0.90 | 0.73 | −13761 | −14408 | 4.70% |
Diabetes (population size: 768) | 30 | 0.61 | 0.57 | 0.80 | 0.6 | −14736 | −15136 | 2.71% |
Iris (population size: 150) | 0 | 0.99 | 0.98 | 1.00 | 1.00 | −1036 | −1022 | −1.33% |
Iris (population size: 150) | 5 | 0.97 | 0.96 | 0.99 | 0.97 | −1243 | −1176 | −5.37% |
Iris (population size: 150) | 10 | 0.9 | 0.89 | 0.95 | 0.92 | −1491 | −1022 | −31.49% |
Iris (population size: 150) | 30 | 0.98 | 0.99 | 1.00 | 0.99 | −1099 | −1381 | 25.67% |
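The AIC difference column appears to be the change of the federated AIC relative to the centralized AIC, expressed as a percentage. This formula is inferred from the tabulated values rather than stated in the text, so the sketch below should be read as an assumption; the function name is hypothetical. It reproduces most rows to the printed precision, while a few small-sample rows differ in the last digits, presumably because the printed AIC values are rounded.

```python
def relative_aic_difference(centralized: float, federated: float) -> float:
    """Percentage change of the federated AIC relative to the centralized AIC.

    Assumed formula: (federated - centralized) / centralized * 100.
    """
    if centralized == 0:
        raise ValueError("centralized AIC must be non-zero")
    return (federated - centralized) / centralized * 100.0


if __name__ == "__main__":
    # Alarm, 0% missing data: centralized and federated AIC from the table.
    print(round(relative_aic_difference(-340571, -318469), 2))
```

Because both AIC values are negative here, a negative percentage means the federated score is smaller in magnitude than the centralized one.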
Time complexity
Scaling in the number of data parties
Dataset | Number of parties | AUC: public validation | AUC: SCV validation | AUC: SVDG validation | AIC score | Running time (ms) |
---|---|---|---|---|---|---|
Asia (missing data: 0%) | 2 | 0.97 | 1.00 | 1.00 | −22575 | 142292 |
Asia (missing data: 0%) | 3 | 0.98 | 1.00 | 1.00 | −22564 | 144124 |
Asia (missing data: 0%) | 4 | 0.98 | 1.00 | 1.00 | −22642 | 145343 |
Asia (missing data: 0%) | 5 | 0.98 | 1.00 | 1.00 | −22600 | 144756 |
Asia (missing data: 0%) | 6 | 0.97 | 1.00 | 1.00 | −22570 | 142854 |
Asia (missing data: 0%) | 7 | 0.99 | 1.00 | 1.00 | −22530 | 144551 |
Asia (missing data: 0%) | 8 | 0.98 | 1.00 | 1.00 | −22488 | 145143 |
Asia (missing data: 10%) | 2 | 0.70 | 0.71 | 0.70 | −23888 | 426849 |
Asia (missing data: 10%) | 3 | 0.70 | 0.70 | 0.70 | −23837 | 426379 |
Asia (missing data: 10%) | 4 | 0.70 | 0.69 | 0.71 | −23837 | 425364 |
Asia (missing data: 10%) | 5 | 0.70 | 0.70 | 0.71 | −23918 | 427819 |
Asia (missing data: 10%) | 6 | 0.70 | 0.71 | 0.71 | −24042 | 427391 |
Asia (missing data: 10%) | 7 | 0.69 | 0.69 | 0.71 | −23871 | 432755 |
Asia (missing data: 10%) | 8 | 0.70 | 0.70 | 0.70 | −24073 | 412467 |