1 Introduction
2 State of the art
3 Methodology
3.1 Decision tree algorithm
3.2 Optimization strategy of the decision tree algorithm
Input: the uncertain dataset D; attribute_list, the list of all attributes contained in D
Output: an uncertain decision tree
Start:
1) Create a node N;
2) If (all tuples in the uncertain dataset D have the same class label C) then
3) return N as a leaf node labeled with class C;
4) Else if (attribute_list is empty) then
5) return N as a leaf node labeled with the majority class of the remaining tuples;
6) End if;
7) Calculate the information gain ratio of each attribute and select the attribute with the highest gain ratio as the splitting attribute of N;
8) If (the attribute is continuous or uncertain) then
9) select a split position y;
10) For (each tuple $R_j$) do
11) If (attribute ≤ y) then
12) put $R_j$ into $D_l$ with weight $w(R_j)$;
13) Else if (attribute > y) then
14) put $R_j$ into $D_r$ with weight $w(R_j)$;
15) Else (the attribute's uncertain interval $[x_1, x_2]$, with probability density $f(x)$, contains y)
16) put $R_j$ into $D_l$ with weight $w(R_j)\int_{x_1}^{y} f(x)\,dx$;
17) put $R_j$ into $D_r$ with weight $w(R_j)\int_{y}^{x_2} f(x)\,dx$;
18) End if;
19) End for;
20) Else
21) For (each discrete attribute value $A_i$, $i = 1, 2, 3, \ldots, N$) do
22) grow a branch $D_i$ directly for that value;
23) End for;
24) End if;
25) For (each $D_i$) do
26) continue to split the node for $D_i$ according to the division rules of the decision tree;
27) delete the attribute that has been used for partitioning from attribute_list after each partition;
28) End for;
29) End
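The weighting in steps 10 to 19 and the gain ratio in step 7 can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each uncertain value is uniformly distributed over its interval [x1, x2], so the integral of f(x) reduces to a length ratio, and that a tuple is stored as (x1, x2, label, weight). The function names `split_weights`, `entropy`, and `gain_ratio` are hypothetical.

```python
import math
from collections import defaultdict


def split_weights(weight, x1, x2, y):
    """Return (left_weight, right_weight) for one tuple split at y.

    Assumption: the value is uniform on [x1, x2], so the integral of
    f(x) over [x1, y] equals (y - x1) / (x2 - x1).
    """
    if x2 <= y:                       # whole interval at or below y
        return weight, 0.0
    if x1 > y:                        # whole interval above y
        return 0.0, weight
    left_mass = (y - x1) / (x2 - x1)  # here x1 <= y < x2, so x2 > x1
    return weight * left_mass, weight * (1.0 - left_mass)


def entropy(class_weights):
    """Entropy of a class distribution given as fractional weights."""
    total = sum(class_weights.values())
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total)
                for w in class_weights.values() if w > 0)


def gain_ratio(tuples, y):
    """Gain ratio of a binary split at y over (x1, x2, label, weight) tuples."""
    parent = defaultdict(float)
    left = defaultdict(float)
    right = defaultdict(float)
    for x1, x2, label, w in tuples:
        wl, wr = split_weights(w, x1, x2, y)
        parent[label] += w
        left[label] += wl
        right[label] += wr
    total = sum(parent.values())
    parts = [sum(left.values()), sum(right.values())]
    # Weighted entropy of the two children.
    children = sum(p / total * entropy(c)
                   for p, c in zip(parts, (left, right)))
    gain = entropy(parent) - children
    # Split information penalizes very unbalanced partitions.
    split_info = -sum(p / total * math.log2(p / total)
                      for p in parts if p > 0)
    return gain / split_info if split_info > 0 else 0.0
```

For example, `split_weights(1.0, 2.0, 6.0, 3.0)` returns `(0.25, 0.75)`: a quarter of the interval [2, 6] lies at or below the split position 3, so a quarter of the tuple's weight goes to the left branch. A non-uniform density would replace only the `left_mass` line with the corresponding cumulative probability.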
4 Result analysis and discussion
Metric | Optimal decision tree | Naive Bayes | Logistic regression |
---|---|---|---|
Modeling time (s) | 3.21 | 4.65 | 6.39 |
Accuracy rate (%) | 78.6 | 68.2 | 73.1 |
Error rate (%) | 0.575 | 0.681 | 0.673 |