Introduction
Function of DAPS and its development
The need for and purpose of DAPS
Function | Supporting literature |
---|---|
Support the translation of business goals into a data-analytic problem (that is, into questions about Y and X variables) | |
Help to define the anticipated business value, by articulating how the model is going to be used, and what value that is expected to bring | |
Help to define the type of model that is needed | |
Support the breaking down of a long-term ambition into manageable chunks, and thus scope the project | |
Developing DAPS as a problem-structuring device
Research strategy
-
Students did not specify the action or decision that the data-analytic model’s predictions were intended to support, and as a consequence, they failed to explicate how the model was envisioned to deliver the anticipated business value.
-
The definition of the \(Y\) characteristic that the model should predict was insufficiently precise (for example, in terms of prediction horizon or unit of analysis). As a consequence, the predictions might be ineffective in supporting the specified decisions or actions.
-
Students did not specify whether their application required a causal \(Y=f\left(X\right)\) model or a purely predictive (that is, correlational) model. This depends on whether the decisions or actions that the model is to support involve interventions in the \(X\)’s (as explained later).
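The distinction in the last point can be illustrated with a minimal simulation sketch. It is not taken from the projects: the confounder `z`, the coefficients, and the function names are all hypothetical. It assumes a setting where \(X\) and \(Y\) share a common cause, so that \(X\) predicts \(Y\) well in observational data although intervening in \(X\) does not change \(Y\).

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n, intervene, seed=0):
    """Generate (X, Y) pairs from a hypothetical confounded process.

    Assumption: a hidden variable z drives both X and Y, and X has no
    causal effect on Y. With intervene=True, X is set independently of z,
    which mimics a decision maker manipulating X.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        if intervene:
            x = rng.gauss(0, 1)          # X chosen by intervention
        else:
            x = z + rng.gauss(0, 0.5)    # X merely reflects z
        y = 2 * z + rng.gauss(0, 0.5)    # Y depends on z only
        xs.append(x)
        ys.append(y)
    return xs, ys


observational = ols_slope(*simulate(5000, intervene=False))
interventional = ols_slope(*simulate(5000, intervene=True))
print(f"slope in observational data:  {observational:.2f}")
print(f"slope under intervention in X: {interventional:.2f}")
```

Under these assumptions the observational slope is clearly positive, which is adequate if the model is only used to predict \(Y\) from observed \(X\)’s. The slope under intervention is close to zero, so the same correlational model would mislead a decision maker who plans to change \(X\) in order to change \(Y\).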
Sector | Projects in 1st round | Later rounds |
---|---|---|
Industry | 8 | 7 |
Energy and infrastructure | 3 | |
Finance | 12 | |
Healthcare | 9 | |
Government | 2 | |
Agrofood | 1 | |
Retail & logistics | 4 | |
Sports and entertainment | 1 | |
Total | 10 | 37 |
The components of DAPS and how they should be used
Lower part of DAPS: the data-analytic problem
Middle part of DAPS: decision framework
Upper part of DAPS: business value
Example 1: Forecast-driven logistics
How DAPS is used to facilitate project scoping
Making projects manageable by chunking
-
First building a proof of concept (subproject 1), then building a deployable model (subproject 2).
-
First modeling the relation between the \(X\)’s and \(Y\) for a subsample of the data (e.g., a limited number of sites, such as a small, a medium, and a large site) (subproject 1), then generalizing the results to the full population of interest (subproject 2).
-
First creating an algorithm or infrastructure to measure outcomes or predictors, thus obtaining data on \(X\) or \(Y\) variables (subproject 1), then using these data to build a model (subproject 2).
-
First building a model (subproject 1), then optimizing the decision framework given the reduced but remaining decision uncertainty (subproject 2).
-
The core problem is to build a predictive model that signals a problem called asymmetry in a power grid, so that technicians can intervene and take measures to avoid nuisance for customers.
-
This application is based on the assumption that nuisance experienced by customers is actually caused by asymmetry, which is, however, disputed. A secondary data-analytic problem is to test this assumption and establish to what extent nuisance is caused by asymmetry.
-
Asymmetry is not measured directly; instead, it is to be derived from raw sensor readings in the grid. The derivation of relevant characteristics from raw data is called feature engineering in data science, and the design of the algorithm that computes asymmetry values from the available data was a prerequisite for fitting the other two models.
Example 2: early warning system in a power grid
-
Subproject 1: Determine asymmetry from raw sensor data. The newly installed sensors in the transformation stations allowed the computation of asymmetry in each station every 15 min. The first subproject developed this algorithm and implemented the infrastructure needed for collecting and storing the resulting asymmetry data.
-
Subproject 2: Establish whether complaints and issues are caused by asymmetry. The asymmetry data made available through the algorithm of Subproject 1 would then be collated with customer complaints to analyze what fraction of issues experienced by end users could be related to asymmetry.
-
Subproject 3: Predict asymmetry. Provided the second subproject established that a substantial part of customer complaints is due to asymmetry, the third subproject would then develop a model that predicts asymmetry before it occurs. For training the model, the subproject used the asymmetry data created in the first subproject. The model predicts asymmetry in a transformation station (unit of analysis) every 15 min. If asymmetry is detected, this is signaled to the grid’s operational management, who will then decide how to intervene.
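The feature-engineering step of Subproject 1 can be sketched as follows. The source does not state how asymmetry is computed, so this sketch assumes a common unbalance measure: the largest deviation of a phase reading from the three-phase mean, divided by that mean. The record layout `(station_id, timestamp, phase_a, phase_b, phase_c)` and all function names are hypothetical. Only the unit of analysis (a transformation station) and the 15-minute interval come from the text.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def window_start(ts, minutes=15):
    """Floor a timestamp to the start of its 15-minute window."""
    return ts - timedelta(
        minutes=ts.minute % minutes,
        seconds=ts.second,
        microseconds=ts.microsecond,
    )


def unbalance(a, b, c):
    """Assumed asymmetry measure: max deviation from the mean, over the mean."""
    mean = (a + b + c) / 3
    if mean == 0:
        return 0.0
    return max(abs(a - mean), abs(b - mean), abs(c - mean)) / mean


def asymmetry_per_window(readings):
    """Aggregate raw readings into one asymmetry value per station and window.

    readings: iterable of (station_id, timestamp, phase_a, phase_b, phase_c).
    Returns a dict keyed by (station_id, window_start).
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for station, ts, a, b, c in readings:
        acc = sums[(station, window_start(ts))]
        acc[0] += a
        acc[1] += b
        acc[2] += c
        acc[3] += 1
    # Average each phase within the window, then compute the unbalance.
    return {
        key: unbalance(s[0] / s[3], s[1] / s[3], s[2] / s[3])
        for key, s in sums.items()
    }


readings = [
    ("S1", datetime(2024, 1, 1, 10, 2), 100.0, 100.0, 100.0),
    ("S1", datetime(2024, 1, 1, 10, 10), 100.0, 100.0, 100.0),
    ("S1", datetime(2024, 1, 1, 10, 20), 90.0, 100.0, 110.0),
]
for (station, start), value in sorted(asymmetry_per_window(readings).items()):
    print(station, start.isoformat(), round(value, 3))
```

The resulting table, one row per station per 15 minutes, is the kind of data set that Subproject 2 could collate with customer complaints and that Subproject 3 could use as the \(Y\) variable for training.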
Conclusions
Appendix: evaluation protocol
-
Each project was reviewed multiple times as it unfolded (typically four to eight times in total). Students applied DAPS in the Business Understanding phase of their projects, but the authors also followed the projects through the ensuing CRISP-DM phases to identify issues in the project definition that emerged later.
-
One or both authors, both of whom are experienced data scientists and experts in the theory of data science.
-
The evaluations were done in discussion with the students applying the proposed DAPS technique.
-
The DAPS model and the instructions used to explain the technique.
-
Applications of DAPS in the projects were evaluated on how well the functions listed in Table 1 were fulfilled, how helpful the DAPS technique had been for fulfilling them, and how good the resulting project definition was. Below, we make these three main criteria more specific.
-
Function 1: Translation of business goal into data-analytic problem. Based on theory in machine learning and data analytics such as [4, 5, 16], the validity and preciseness of the data-analytic problem definition were assessed. For example, is the problem a valid predictive, prescriptive or diagnostic problem? Are the \(Y\) variables well-defined?
-
Function 2: Definition of business value. The rationale and precision of the decision framework were assessed, as well as its relationship with project and organization KPIs. For example, how precise is the definition of the decisions or actions for which the algorithm is to be used? How convincing is the rationale for linking the decision framework to the anticipated improvement in the mentioned KPIs? Does the organization itself recognize the goals described as organization KPIs?
-
Function 3: Specification of the type of model. Based on theory in statistical learning and machine learning [4, 41‐43], it was assessed whether the type of model (causal, correlational or deductive) is suitable for the model’s purpose. For example, when the model should be able to make predictions involving interventions in the \(X\) variables, was the model specified as causal instead of correlational?
-
Function 4: Scoping the project into manageable chunks. It was assessed whether the technique facilitated the process of breaking down a large ambition into manageable subprojects.
-
Students were asked whether the DAPS model and its instructions were helpful in making the project definition.
-
Students were asked to identify elements of DAPS that they found unclear or difficult to apply, as well as suggestions for improvement.
-
All 47 projects followed the CRISP-DM stage model, where students applied DAPS in the Business Understanding stage. The authors also followed the projects through the ensuing stages (Data Understanding, Data Preparation, Modelling, Evaluation and Deployment), identifying issues in the original project definition.
-
In case of issues, the authors discussed with the students whether these could have been prevented by improving the DAPS model or its instructions, or whether the issues were instead due to unknowns or complications in the project’s context itself.