1 Introduction
1.1 Assumptions
1.2 Target metrics
1.3 Contribution
2 Related work
2.1 Cost sensitive classification
2.2 Timeliness of prediction
2.3 External data acquisition
2.4 Active learning
3 Proposed framework: NPVModel
3.1 External data acquisition
- Case A: (The basic model) No external data.
- Case B: Up-to-date batch-wise external instances.
- Case C: One-time external instance dump.
- Case D: External features for each of the in-house instances.
3.2 Machine learning model development
3.3 Predictions to dollar value
3.4 Integrating with standard techniques
3.5 Variable cost modeling
4 Experimental setup
Dataset | % Minority | Instances | Ext. Data Instances | Time stamps | Costs |
---|---|---|---|---|---|
Pendigits | 8.3 | 13,821 | simulated | simulated | simulated |
Medicare | 12.9 | 611,785 | 853,360 | simulated | simulated |
Open city data | 33.2 | 250,000 | 77 | actual | actual |
4.1 Datasets
4.2 Testing workflow
5 Results
5.1 Interpreting the results
5.2 Should one get external data?
5.3 How much external instance data should one get?
5.4 Model development strategies
5.5 Cost factor
5.6 Price negotiation for external data
5.7 Scalability
pendigits) and for larger datasets (medicare and open city data). The modified cost matrix defined in Section 3.4 is agnostic to the size of the dataset in question. This makes NPVModel a powerful and easy addition to the existing modeling techniques and considerations.
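To illustrate why a cost-matrix-based decision rule does not depend on dataset size, the following is a minimal sketch of the standard minimum-expected-cost rule for binary classification. It is not the modified cost matrix of Section 3.4, which is not reproduced here; the function name `min_cost_decisions`, the cost layout, and the dollar values are hypothetical and chosen only for illustration.

```python
import numpy as np


def min_cost_decisions(p_pos, cost):
    """Pick, per instance, the label with the lower expected cost.

    p_pos : predicted probability of the positive (minority) class.
    cost  : 2x2 array where cost[i, j] is the cost of predicting class i
            when the true class is j (0 = negative, 1 = positive).
            This layout is an assumption made for this sketch.
    """
    p_pos = np.asarray(p_pos, dtype=float)
    cost = np.asarray(cost, dtype=float)
    # Expected cost of predicting negative and positive for each instance.
    exp_cost_neg = (1.0 - p_pos) * cost[0, 0] + p_pos * cost[0, 1]
    exp_cost_pos = (1.0 - p_pos) * cost[1, 0] + p_pos * cost[1, 1]
    return (exp_cost_pos <= exp_cost_neg).astype(int)


# Hypothetical costs: a missed positive costs 10, a false alarm costs 1.
example_cost = [[0.0, 10.0],
                [1.0, 0.0]]

small = min_cost_decisions([0.05, 0.2, 0.9], example_cost)
large = min_cost_decisions([0.05, 0.2, 0.9] * 1000, example_cost)
```

Each decision is computed from that instance's probability and the cost matrix alone, so the first three decisions in `large` match those in `small` regardless of how many instances are scored together.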