1 Introduction
1.1 Problem Statement
1.2 Procedure
1.3 Contribution
2 Background: Evaluation Methods in PAIS
2.1 Concepts and Definitions
2.2 Related Work
2.3 Classification of Evaluation Methods
-
Behavior-based This category includes evaluation methods that collect data from users in order to analyze their behavior during interaction with the investigated artifacts, for instance, to identify whether users interact with the artifacts in the planned manner. Representative methods are observational techniques, thinking-aloud protocols, and eye tracking. Moreover, behavior-based methods analyze user-centered performance, for instance, through log file analysis of recorded user interactions with the artifacts or analysis of the time users need to solve tasks with a prototype.
-
Opinion-based Opinion-based methods evaluate users' opinions of the investigated artifacts, e.g., through questionnaires and interviews. These methods can help elicit suggestions for improvement from users or assess users' satisfaction with the investigated artifacts.
-
Predictive Evaluation methods in this category aim at assessing the context of use of the investigated artifacts depending on different requirements (e.g., domains, systems, and users). Typical examples are inspections, walkthroughs, use cases, and scenarios. For example, an inspection can be applied to analyze the investigated artifacts with regard to usability heuristics. Such evaluation methods can be used very early in the development process (e.g., to investigate which tasks the users want to perform and whether these tasks could be carried out with the investigated artifacts). Moreover, these methods assess the investigated artifacts with regard to their feasibility (e.g., a prototypical implementation of a concept) and their performance, in order to evaluate the artifacts under realistic conditions (e.g., the execution time of an algorithm).
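As a small illustration of the user-centered performance analysis named under behavior-based methods, task completion times can be derived from an interaction log. This is only a sketch: the log format, field names, and values below are hypothetical, not taken from any system discussed here.

```python
from datetime import datetime

# Hypothetical interaction log: (user, task, event, timestamp) records,
# as they might be exported from a PAIS prototype.
log = [
    ("u1", "t1", "start", "2024-01-01T10:00:00"),
    ("u1", "t1", "end",   "2024-01-01T10:04:30"),
    ("u2", "t1", "start", "2024-01-01T10:01:00"),
    ("u2", "t1", "end",   "2024-01-01T10:07:00"),
]

def task_durations(log):
    """Compute per-(user, task) completion times in seconds."""
    starts, durations = {}, {}
    for user, task, event, ts in log:
        t = datetime.fromisoformat(ts)
        if event == "start":
            starts[(user, task)] = t
        elif event == "end":
            durations[(user, task)] = (t - starts[(user, task)]).total_seconds()
    return durations

print(task_durations(log))
# {('u1', 't1'): 270.0, ('u2', 't1'): 360.0}
```

Such per-user durations could then feed the comparisons behavior-based studies make, e.g., between two prototype variants.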
3 Methodology
-
RQ1: Which evaluation methods are typically used? This question aims to identify which evaluation methods are currently utilized to assess and evaluate artifacts for human orientation in general, but also for security and visualization in PAIS.
-
RQ2: Which evaluation methods are of future interest? The aim is to detect further methods that can be of interest for the evaluation of artifacts with focus on human orientation in general, but also on security and on visualization.
-
RQ3: How can these evaluation methods be classified into the categories Behavior-based, Opinion-based, and Predictive based on the purpose of evaluation? We analyze if typical and future evaluation methods can be categorized into Behavior-based, Opinion-based, and Predictive in order to examine the applicability of this classification.
4 Literature Review
4.1 Procedure
4.1.1 Literature Search
4.1.2 Literature Selection
Aspects | Keywords | Total hits | Selected papers |
---|---|---|---|
Human orientation | Human orientation, work experience, experience, resource, allocation, capabilities, organizational model, actor, human agent, human resources, work distribution, skills, capabilities, competencies, attitudes, experience, process-aware information systems, and workflow systems | 607 | 59 |
Security | Workflow security and business process security | 670 | 67 |
Visualization | Layout algorithm for business process, process model editor, worklist visualization, process visualization, workflow visualization, RBAC visualization, as well as event logs and business process visualization | 1799 | 151 |
4.1.3 Data Extraction and Synthesis
4.1.4 Categorization
4.2 Results of the Literature Review
Category | Evaluation methods |
---|---|
Behavior-based | Thinking aloud, observation, and video/audio record analysis |
Opinion-based | Questionnaire, interview, and focus group (includes group discussion) |
Predictive | Application (includes case, example, scenario, storyboard, and use case), contextual inquiry method, expert panel, formalization, function tests, implementation (includes prototypical implementation), inspection (includes heuristics and reviews), simulation, and performance measures (includes measurements of the artifacts like complexity measures, precision measures, generalization measures, robustness measures, precision and recall metrics, and execution time) |
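The precision and recall metrics listed under performance measures can be sketched as follows; the example sets are illustrative, not drawn from the reviewed papers.

```python
def precision_recall(relevant, retrieved):
    """Precision = |relevant ∩ retrieved| / |retrieved|;
    recall = |relevant ∩ retrieved| / |relevant|."""
    hits = len(relevant & retrieved)
    return hits / len(retrieved), hits / len(relevant)

relevant = {"a", "b", "c", "d"}   # ground-truth items an artifact should find
retrieved = {"b", "c", "e"}       # items the artifact actually returned
p, r = precision_recall(relevant, retrieved)
print(p, r)  # 0.6666666666666666 0.5
```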
Type | Artifacts |
---|---|
Executable | Algorithm, implementation, prototypical implementation, and system |
Theoretical | Algorithm, architecture, concept, environment, framework, guidelines, literature, mechanism, methodology, model, pattern, requirements, strategy, and theory |
Category | Human orientation | Security | Visualization | Total |
---|---|---|---|---|
Behavior-based | 11 | 2 | 9 | 22 |
Opinion-based | 17 | 4 | 18 | 39 |
Predictive | 95 | 102 | 244 | 441 |

Type | Human orientation | Security | Visualization | Total |
---|---|---|---|---|
Executable | 19 | 10 | 116 | 145 |
Theoretical | 47 | 71 | 103 | 221 |
5 Expert Survey
5.1 Sample
5.2 Procedure
5.3 Results of the Expert Survey
Category | Evaluation methods |
---|---|
Behavior-based | Eye tracking, observation, performance analysis of user activities, and thinking aloud |
Opinion-based | Interview, questionnaire, and expert session |
Predictive | Card sorting, conformance checking, data sensitivity analysis, discourse analysis, (expert) inspection, focus group, heuristic evaluation/heuristics, model checking, performance measures, policy formalization, prototype (including wizard of oz), quality metrics, review, simulation, soundness, case/use case/scenario, and walkthrough |
Type | Artifacts |
---|---|
Executable | Encryption algorithms, process mining algorithms, authentication, execution monitor, information system, process/runtime engine, (hi-fidelity) prototype, prototypical implementation, user interface, software mockup, and user interface/worklist |
Theoretical | Access control policy, conceptual model (data/process), data to visual representation mapping, domain-specific (modeling) language (DSL/DSML) for the specification of process-related security properties, initial sketches, knowledge map, organizational model, paper mockup, mockup, paper prototype, platform-specific model (PSM) with process-related security properties, process logs, process model, process priority model, quality framework, requirements description, security requirements documents, scenarios, security ontology, task to visualization mapping, usage control policy, use cases, use case descriptions, and use cases/functional descriptions |
Category | Evaluation methods |
---|---|
Behavior-based | Emotion tracking, eye tracking, insight-based evaluation, neuroscience methods, neuroscientific analysis, and observation |
Opinion-based | Questionnaire |
Predictive | Card sorting, collaborative ratings, consistency checking, data sensitivity, dataflow correctness, discourse analysis, performance measures, review, semiotic analysis, simulation, user access rights evaluation, and walkthrough |
6 Focus Group
6.1 Sample
6.2 Procedure
6.3 Results of the Focus Group
7 Summary of Evaluation Methods
(Category: B = Behavior-based, O = Opinion-based, P = Predictive. Artifact: E = Executable, T = Theoretical. ☒ = evaluation method identified for this aspect, ☐ = not identified.)

Evaluation methods | Category | Artifact | Hum | Sec | Vis |
---|---|---|---|---|---|
Application (includes Case, Example, Scenario, Storyboard, and Use Case) | P | E | ☒ | ☒ | ☒ |
 | | T | ☒ | ☒ | ☒ |
Card sorting | P | T | ☒ | ☐ | ☐ |
Contextual inquiry | B | T | ☒ | ☐ | ☐ |
Conformance checking | P | E | ☐ | ☒ | ☐ |
Data sensitivity analysis | P | T | ☐ | ☒ | ☐ |
Discourse analysis | P | T | ☐ | ☒ | ☐ |
Correctness (includes Formalization, Model checking, and Soundness) | P | T | ☒ | ☒ | ☐ |
Expert panel/session | O | E | ☒ | ☐ | ☐ |
 | | T | ☒ | ☐ | ☐ |
Eye tracking | B | E | ☒ | ☐ | ☒ |
 | | T | ☒ | ☐ | ☒ |
Focus group (includes Group Discussion) | O | E | ☐ | ☐ | ☒ |
 | | T | ☒ | ☒ | ☒ |
 | P | T | ☐ | ☐ | ☒ |
Functionality test | P | T | ☒ | ☐ | ☐ |
Implementation (includes Prototype) | P | T | ☒ | ☒ | ☒ |
Interview | O | E | ☒ | ☒ | ☒ |
 | | T | ☒ | ☒ | ☒ |
Inspection (includes Heuristics and Review) | P | E | ☒ | ☐ | ☒ |
 | | T | ☒ | ☒ | ☒ |
Observation | B | E | ☒ | ☒ | ☒ |
 | | T | ☒ | ☒ | ☒ |
Performance analysis of user activities | B | E | ☒ | ☐ | ☒ |
Performance measures/testing of systems | P | E | ☒ | ☒ | ☒ |
Questionnaire | O | E | ☒ | ☒ | ☒ |
 | | T | ☐ | ☒ | ☒ |
Quality metrics | P | T | ☐ | ☒ | ☐ |
Simulation | P | E | ☒ | ☒ | ☐ |
 | | T | ☒ | ☐ | ☐ |
Thinking aloud | B | E | ☐ | ☐ | ☒ |
 | | T | ☒ | ☒ | ☒ |
Video/audio recording | B | E | ☒ | ☐ | ☒ |
 | | T | ☒ | ☐ | ☐ |
Walkthrough | P | E | ☒ | ☒ | ☒ |
 | | T | ☒ | ☒ | ☒ |
8 Discussion
8.1 Results
8.1.1 Result 1: Focus on Predictive Evaluation Methods
8.1.2 Result 2: Ten Widely Used Evaluation Methods
-
performance measures/testing of systems and questionnaires for executable artifacts
-
implementations, inspections, focus groups, and thinking aloud for theoretical artifacts
-
applications, interviews, observations, and walkthroughs for theoretical as well as for executable artifacts