Introduction
Literature review
Methods and techniques
Description of data set
Attributes | Type | Description |
---|---|---|
Product_Info_1-7 | Categorical | 7 normalized attributes concerning the product applied for |
Ins_Age | Numeric | Normalized age of an applicant |
Ht | Numeric | Normalized height of an applicant |
Wt | Numeric | Normalized weight of an applicant |
BMI | Numeric | Normalized Body Mass Index of an applicant |
Employment_Info_1-6 | Numeric | 6 normalized attributes concerning the employment history of an applicant |
InsuredInfo_1-6 | Numeric | 6 normalized attributes offering information about an applicant |
Insurance_History_1-9 | Numeric | 9 normalized attributes relating to the insurance history of an applicant |
Family_Hist_1-5 | Numeric | 5 normalized attributes related to an applicant’s family history |
Medical_History_1-41 | Numeric | 41 normalized variables providing information on an applicant’s medical history |
Medical_Keyword_1-48 | Numeric | 48 dummy variables relating to the presence or absence of a medical keyword associated with the application |
Response | Categorical | Target variable: an ordinal measure of risk level with 8 levels |
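The table groups related attributes with a range suffix, so each row can stand for many columns. The sketch below is a minimal illustration that assumes "Name_1-7" means the columns Name_1 through Name_7. It expands the rows exactly as listed in the table and counts the resulting predictors. The names `TABLE_ROWS` and `expand` are hypothetical helpers, not part of the data set or the study.

```python
import re

# Attribute groups exactly as listed in the table above. Reading "Name_1-7"
# as Name_1 ... Name_7 is an assumption about the naming scheme.
TABLE_ROWS = [
    "Product_Info_1-7", "Ins_Age", "Ht", "Wt", "BMI",
    "Employment_Info_1-6", "InsuredInfo_1-6", "Insurance_History_1-9",
    "Family_Hist_1-5", "Medical_History_1-41", "Medical_Keyword_1-48",
]

# A prefix ending in "_", then "lo-hi" at the end of the string.
RANGE = re.compile(r"^(?P<prefix>.+_)(?P<lo>\d+)-(?P<hi>\d+)$")


def expand(row: str) -> list[str]:
    """Expand a 'Prefix_lo-hi' group into individual column names."""
    m = RANGE.match(row)
    if m is None:
        return [row]  # a single column such as "BMI"
    lo, hi = int(m["lo"]), int(m["hi"])
    return [f"{m['prefix']}{i}" for i in range(lo, hi + 1)]


predictors = [col for row in TABLE_ROWS for col in expand(row)]
print(len(predictors))   # 126 predictor columns, plus the Response target
print(predictors[:3])    # ['Product_Info_1', 'Product_Info_2', 'Product_Info_3']
```

Counted this way, the table describes 126 predictor columns. Most of them (41 + 48) come from the medical history and medical keyword groups, which is why dimensionality reduction is worth considering.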
Data pre-processing
Data exploration using visual analytics
Dimensionality reduction
Correlation-based feature selection
Principal components analysis feature extraction
Comparison between correlation-based feature selection and principal components analysis feature extraction
Supervised learning algorithms
Multiple linear regression model
REPTree algorithm
Random Tree
Artificial neural network
Experiments and results
Data pre-processing
Missing data mechanism
Missing data imputation
Multiple imputation handles the missing values in three steps:
- Imputation: each missing value is imputed several times, once for each of the requested number of imputations, which yields that many complete data sets. The imputation is usually performed by a predictive model, such as linear regression, that replaces each missing value with a value predicted from the other variables in the data set.
- Analysis: each completed data set is analyzed separately, producing its own parameter estimates and standard errors.
- Pooling: the results of these separate analyses are combined into a single final result.
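The three steps of multiple imputation can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the procedure used in the study:

* It uses a toy two-variable data set with values missing completely at random.
* Step 1 uses stochastic linear-regression imputation: the regression prediction plus Gaussian residual noise. A fully "proper" implementation would also redraw the regression coefficients for each imputation.
* Step 2 takes the mean of y as the analysed quantity.
* Step 3 pools with Rubin's rules, a standard combining rule that the text above does not name.

All function names are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sd


def impute_once(xs, ys, rng):
    """Step 1 (imputation): replace each None in ys with a regression prediction plus noise."""
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b, sd = fit_line([x for x, _ in obs], [y for _, y in obs])
    return [y if y is not None else a + b * x + rng.gauss(0.0, sd)
            for x, y in zip(xs, ys)]


def analyze(ys):
    """Step 2 (analysis): the mean of y and its squared standard error."""
    return statistics.fmean(ys), statistics.variance(ys) / len(ys)


def pool(results):
    """Step 3 (pooling) with Rubin's rules over m (estimate, variance) pairs."""
    m = len(results)
    estimates = [q for q, _ in results]
    q_bar = statistics.fmean(estimates)               # pooled estimate
    within = statistics.fmean(u for _, u in results)  # within-imputation variance
    between = statistics.variance(estimates)          # between-imputation variance
    total = within + (1 + 1 / m) * between
    return q_bar, total ** 0.5                        # estimate and its standard error


rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys_full = [1 + 2 * x + rng.gauss(0, 1) for x in xs]
ys = [None if rng.random() < 0.3 else y for y in ys_full]  # about 30% missing at random

m = 5  # number of imputations
completed = [impute_once(xs, ys, rng) for _ in range(m)]
estimate, se = pool([analyze(c) for c in completed])
print(f"pooled mean = {estimate:.3f}, SE = {se:.3f}")
```

The random noise in step 1 is what makes the completed data sets differ. The between-imputation variance in step 3 then widens the standard error to reflect the uncertainty caused by the missing values.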
Executive dashboard
Comparison between feature selection and feature extraction
Algorithm | CFS MAE | CFS RMSE | PCA MAE | PCA RMSE |
---|---|---|---|---|
Multiple linear regression | 1.5872 | 2.0309 | 1.6396 | 2.0659 |
Artificial neural network | 1.7859 | 2.369 | 1.7261 | 2.3369 |
REPTree | 1.5285 | 2.027 | 1.6973 | 2.1607 |
Random Tree | 1.7892 | 2.7475 | 2.0305 | 2.9142 |
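The two error measures in the table can be sketched as follows. The sketch assumes, as the use of MAE and RMSE implies, that the eight ordinal Response levels are scored as the numbers 1 to 8. The example values are hypothetical and do not come from the study.

```python
import math


def _check(y_true, y_pred):
    """Reject empty or mismatched inputs."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")


def mae(y_true, y_pred):
    """Mean absolute error: the average of |y_i - yhat_i|."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the average of (y_i - yhat_i)^2."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical risk levels (1-8) and predictions, not taken from the study.
actual = [1, 8, 5, 3]
predicted = [2, 5, 5, 4]
print(mae(actual, predicted))             # 1.25
print(round(rmse(actual, predicted), 4))  # 1.6583
```

Squaring gives large errors more weight, so RMSE is never smaller than MAE. The gap between them grows when a few predictions are far off. In the table above this gap is widest for Random Tree under both CFS (1.7892 vs 2.7475) and PCA (2.0305 vs 2.9142).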