Crisis management command units are confronted with major incidents or disasters: they have to deal with the situation at hand and coordinate the individual forces of the emergency and rescue services. While it is fortunate that major incidents requiring command units are rare, the consequence is that command unit members gain little experience with real incidents. Appropriate training, and its evaluation to ensure high quality, is therefore essential. The following sections provide brief background information on command units, their training, and approaches to evaluating that training.
2.1 Command Units
The command unit or the command staff, here used as synonyms, has its origin in the military and represents a team that supports the leader or incident commander (Heath 1998; Lamers 2016). Nowadays, command units are present in some civilian areas, such as the fire services. The command unit is defined as a consulting and supporting council that assists the decision-making incident commander through specific roles, structures, and information flows (Hofinger and Heimann 2016). Command units can be regarded as emergency management teams dealing with incidents that are too large for local forces such as single fire brigades or rescue services. As such, these units have to process information quickly and come to decisions in a short time (Heath 1998; Wybo and Kowalski 1998). Thus, command units consist of different groups of experts and senior emergency managers. Their work can be described as a “flexible yet robust decision environment that uses both centralised and delegatory decision processes” (Heath 1998, p. 141). In Germany, the command unit consists of higher and senior service firefighters and experienced leaders of the volunteer fire brigades who have several years of service and have completed a specific training course (Feuerwehr-Dienstvorschrift (FwDV) 2 2012; Hofinger and Heimann 2016). Unlike the lower management levels, the command unit works as rear leading support, which means it is usually not at the site of the incident (Lamers 2016). Rather, it is located in specially prepared command unit rooms or a command center, assisting and coordinating the work of the formation leaders.

Though command units have little experience in real incidents—according to Lamers (2016), in Germany such incidents statistically occur about every 25 years—command units are preemptively deployed during scheduled major public events such as the Football World Cup. Therefore, these units have to practice in command post exercises.
2.2 Command Unit Structure and Training
The main phases of disaster and crisis management are generally known as prevention/mitigation, preparedness, response, and recovery (Heath 1998; Grunnan and Fridheim 2017). In this context, for a command unit to work efficiently it must create a specific organizational structure that allows it to deal with the emergency situation, to anticipate upcoming situational changes, and to coordinate the emergency teams on site (Heath 1998; Wybo and Kowalski 1998; Lamers 2016). In Germany, command unit members are divided into subject areas, whereby the subject areas (S) 1–4 and the incident commander are mandatory and subject areas 5 and 6 and the command unit leader are optional. The subject areas have different responsibilities, such as coordination of all personnel activities (S1), analysis of the situation (S2), action planning (S3), supply and sustenance (S4), press and media relations (S5), and information and communications systems (S6). Additionally, certain members coordinate all information sent to the command unit, and other members communicate with other organizations such as the police or technical rescue services (Lamers 2016). The German system is similar to the Incident Command System (ICS), as both have the same origin in NATO staffs (Lamers 2016). Therefore, similar principles and structures are applied: the “ICS Operations Section” is comparable to subject area S3, “Planning” is mainly found in S2 (and parts of S3), and “Logistics” is mostly covered in S1 and S4. There are differences, however, in that in Germany there is no command staff subordinate to the incident commander (their tasks are largely assigned to individual subject areas) and that the tasks of the “ICS Finance/Administration Section” are either handled by the entire staff or—in case of large-scale disasters—by a separate crisis committee dealing with all administrative measures related to the incident (Lamers 2016).
To successfully accomplish their tasks, command unit members need to be very well trained. Their training consists of classical teaching elements and exercises; an excellent overview with a focus on exercises can be found in Grunnan and Fridheim (2017). In Germany, a prerequisite for attending the command unit training is successfully completed training for formation leaders (Lamers 2016). During the command unit training, several teaching methods are used, such as knowledge transfer during lectures, tabletop exercises, teamwork tasks, and command post exercises (Hofinger and Heimann 2016).
According to the literature on command unit work, critical success factors for teams include processing, coordinating, and integrating complex information as well as developing a shared comprehension within the entire team (Wybo and Kowalski 1998; Hagemann et al. 2012; Thieme and Hofinger 2012). Taken together, these attributes imply that command units must have “shared mental models” (SMM). The shared mental model is a commonly discussed concept in team decision making and is defined as “knowledge structures held by members of a team that enable them to form accurate explanations and expectations for the task, and […] to coordinate their actions and adapt their behavior to demands of the task and other team members” (Cannon-Bowers et al. 2013, p. 228). To generate these shared understandings of complex situations during major incidents, the command unit members need to communicate efficiently (Hagemann, Kluge, and Ritzmann 2012), which implies that they must share information with their colleagues and process the information they receive. Due to the significance of SMMs, generating them is a competence worth developing during command unit training.
Thieme and Hofinger (2012) emphasize that information exchange might be seen as an antecedent of SMM, an idea that can also be found in the team learning beliefs and behaviors (TLBB) model developed by van den Bossche et al. (2006). They describe the information exchange within a team as the “team learning behavior” that promotes “mutually shared cognition.” Here, mutually shared cognition is defined as the mutual understanding and shared perception of a problem or task (van den Bossche et al. 2006). To generate a mutually shared cognition, team members need to “construct” (share) and “co-construct” (validate) information across the team and engage in “constructive conflicts” (discuss and amend information) about disagreements (van den Bossche et al. 2006), assuring a shared understanding of the incident and the upcoming actions. Following these definitions, SMM and mutually shared cognition describe closely related constructs. Further, the TLBB model was also confirmed for police and fire service teams by Boon et al. (2013), which suggests that the processes of construction, co-construction, and constructive conflict could be applicable in fire service teams for developing mutually shared cognitions. This assumption is encouraged by Thieme and Hofinger (2012), who describe a procedure for developing and sustaining a shared mental model in command units. In this procedure, the command unit leader takes in all relevant information from each command unit member (construction) and recapitulates the gathered facts (co-construction). While the command unit leader is summarizing the information, each member is encouraged to confirm, adapt, or add to the condensed information (constructive conflict) (Thieme and Hofinger 2012).
2.3 Evaluation of Training
Evaluation is generally defined as the systematic assessment of an intervention’s merit, worth, or significance (Scriven 1999). An evaluation can check whether the intervention is worth the resources invested, achieves its goals, or causes unintended consequences (Scriven 1999). Furthermore, evaluations that apply quantitative and qualitative research methods offer the opportunity to gather information about the appropriateness of the methods used and about how to improve the intervention (Kirkpatrick and Kirkpatrick 2006). Thus, the choice of evaluation criteria is crucial for evaluating the effectiveness of a training (Arthur et al. 2003). Several theories and models exist for training evaluation, but the four-level model developed by Kirkpatrick (1979) is still the most popular and commonly used model for training evaluation criteria (Salas and Cannon-Bowers 2001; Arthur et al. 2003; Grohmann and Kauffeld 2013).
2.3.1 The Four-Level Model of Training Evaluation
The four-level model of Kirkpatrick (1979) represents a hierarchical system that indicates training effectiveness through four categories of evaluation: reaction, learning, behavior, and results (Kirkpatrick 1979). The first level (level I) of evaluation represents the reaction of the participants (Kirkpatrick 1979), which can be described as the trainees’ attitudinal and affective response to the training (Arthur et al. 2003; Kirkpatrick and Kirkpatrick 2006). A favorable reaction is important, because otherwise the trainees will lack learning motivation, which is the prerequisite for level II, learning (Kirkpatrick and Kirkpatrick 2006). The assessment of trainees’ reactions is operationalized with self-assessment reaction sheets using standardized questions and written comments immediately after the training (Kirkpatrick and Kirkpatrick 2006). Moreover, reaction questionnaires can be divided into affective and utility questionnaires (Blanchard and Thacker 2013), whereby affective questionnaires reflect the enjoyment of or general feelings about the training, and utility questionnaires express the training’s value (Blanchard and Thacker 2013; Ritzmann et al. 2014). Following Schulte and Thielsch (2019), the present study focuses on utility measures, as they provide more suitable leverage points for improvements (Blanchard and Thacker 2013). Furthermore, meta-analytical evidence suggests that utility reactions correlate more strongly with learning and behavior outcomes than affective reaction measures do (Alliger et al. 1997).
The learning level of the model (level II) needs to be assessed to ensure the training was more than just a pleasant experience. Trainers or evaluation managers should obtain data about acquired knowledge, developed or improved skills, and changed attitudes, since these are prerequisites for a change in behavior, which represents level III (Kirkpatrick and Kirkpatrick 2006). The present study uses self-assessment questionnaires for learning outcomes. Kirkpatrick and Kirkpatrick (2006) recommend an immediate examination of level II to assure a high response rate and to allow conclusions about the entire amount of learned content, as a delayed assessment excludes content that was once learned but has already been forgotten (Blanchard and Thacker 2013).
The evaluation criteria for behavior, level III, ask whether a change in on-the-job behavior occurred as a result of attending the training (Kirkpatrick and Kirkpatrick 2006). Because applying the learned content requires some time for adjustment, the assessment of level III cannot be done promptly after the training (Kirkpatrick 1979; Blanchard and Thacker 2013). The same holds true for level IV, the results. For evaluating the results, evaluation managers want to determine what impact the training had at an organizational level (Arthur et al. 2003; Kirkpatrick and Kirkpatrick 2006). Naturally, much more time is needed to detect effects on these macro criteria (Arthur et al. 2003). Further, levels III and IV need more sophisticated assessment methods such as 360-degree performance appraisals or utility analysis estimates (Arthur et al. 2003; Kirkpatrick and Kirkpatrick 2006).
2.3.2 Process and Outcome Evaluation
A second classification of training evaluation data can be made by separating process evaluations from outcome evaluations (Blanchard and Thacker 2013). Learners’ processes are covered on level I of the Kirkpatrick model; thus this level is essential for identifying parts of a training that might have gone wrong. One can derive benefits from comparing whether the intended training program matches up well with the implemented training program. Accordingly, process data require an evaluation focus on the trainer, the training techniques, and the learning objectives (Blanchard and Thacker 2013). Outcome measures provide important information on whether the training achieved its intended goals. When only outcome data are assessed, it is possible to judge whether the training accomplished its objectives, but it may be difficult to identify the underlying causes (Blanchard and Thacker 2013). Thus, a combination of both data types is desirable in a training evaluation.
2.4 Evaluation in the Context of Emergency Services Education
Evaluation is of prime importance in ensuring that trainings educating emergency and rescue services personnel are of optimal quality. There are two different foci of evaluation: First, an evaluation can involve trainees’ reflections on how they performed during trainings. Second, an evaluation can assess the training itself, including its structure, exercises, and instructor behavior. A typical approach in trainings is to schedule time for reports and discussion on what went well in the exercises and where participants can optimize their behavior (Berlin and Carlström 2014; Grunnan and Fridheim 2017); without such time to reflect, participants’ learning can be hindered, particularly if evaluations are given long after a training was conducted (Berlin and Carlström 2014). Thus, with respect to crisis management exercises, Grunnan and Fridheim (2017, p. 80) stress that evaluative reflections are important and “should always be part of an exercise.”
Aside from a lack of reflection time, several other factors can impede learning in fire service trainings, such as lack of structure, inappropriate instructor behavior, unrealistic training scenarios, or difficulties in knowledge transfer (Berlin and Carlström 2014). To evaluate training quality, the four-level model of Kirkpatrick (1979) could be applied. Several general tools are available for an evaluation of team training (such as the Q4TE, Grohmann and Kauffeld 2013, or the TEI, Ritzmann et al. 2014). However, such instruments only allow for a global screening and cannot provide the trainers of crisis teams with detailed feedback on the perception of context-specific aspects and possibilities for improvement. At the moment, to the best of our knowledge, only one validated instrument in the context of firefighter education has been published: the Feedback Instrument for Rescue forces Education (FIRE; in German: Feedback Instrument zur Rettungskräfte Entwicklung) (Schulte and Thielsch 2019), which is an evaluation tool for the training of group and platoon leaders.
The FIRE questionnaire was created in Germany in cooperation with the State Fire Service Institute North Rhine-Westphalia (in German: Institut der Feuerwehr Nordrhein-Westfalen, IdF NRW), and it is based on a series of three consecutive studies: As there was an absence of published previous work on fire service training evaluation, the first study was of qualitative design, whereby the authors conducted interviews with the trainers of firefighter leaders and with the trainees, ascertaining crucial attributes of excellent firefighter training (Schulte and Thielsch 2019). By consulting with professionals and topic experts, the authors were able to gather data on this topic despite the missing theoretical work (Wroblewski and Leitner 2009). Furthermore, including the perspectives of trainers and trainees from the beginning increases the accuracy of self-ratings on performance (Blanchard and Thacker 2013) as well as the acceptance of the developed evaluation tool (Wroblewski and Leitner 2009). Based on this first study’s 64 semistructured interviews, the authors deduced a number of factors related to excellent teaching of firefighter leaders. These factors were then used to build an initial set of evaluation questions (Schulte and Thielsch 2019). As the authors recognized several similarities between the teaching methods in fire service training and university courses (for example, lectures, group discussions, and group work with presentations), the initial questionnaire was supplemented with items from existing and validated instruments for evaluation in higher education. In the subsequent second study, the resulting initial 116 items were tested for comprehensibility, completeness, and relevance by seven trainers and 26 trainees. Afterwards (with n = 263 trainees), an exploratory factor analysis (EFA) was conducted with the remaining 65 items to reveal the underlying structure of factors (Schulte and Thielsch 2019). In a final third study (with n = 45 trainers and n = 380 trainees), the obtained structure was validated using confirmatory factor analysis (CFA) and associations with related scales. The resulting core version of the FIRE questionnaire for group and platoon leader trainings consists of 21 items assessing six main factors: trainer’s behavior, structure, overextension, group, self-rated competence, and transfer (Schulte and Thielsch 2019).
With respect to Kirkpatrick’s model, Schulte and Thielsch (2019) focused on the first two levels—reaction and learning. As each level of Kirkpatrick’s model builds on the previous one (Kirkpatrick and Kirkpatrick 2006), the results for the first evaluation levels must assure (for level I) that participants had favorable reactions to the training and (for level II) that they actually learned during the training before evaluations can examine levels III or IV, the behavior or result outcomes. Furthermore, for learned content to manifest in subsequent behavior or organizational results, not only the learning outcomes but also the on-the-job environment is important (Arthur et al. 2003). Considering the multifactorial influences on levels III and IV as well as the accompanying challenges in assessing these levels, the current evaluation study on the fire service command unit will likewise focus on levels I and II. Further, by integrating Kirkpatrick’s four-level model and the process and outcome model of Blanchard and Thacker (2013), the FIRE questionnaire captures process data on the reaction level and outcome data on the learning level (Schulte and Thielsch 2019), an approach that is continued in the present study. Specifically, the constructs trainer’s behavior, structure, overextension, and group represent process data, and self-rated competence and transfer constitute outcome data.
The FIRE questionnaire has been successfully adapted and tested in the context of firefighter basic trainings at municipal and district levels (Thielsch et al. 2019). Furthermore, additional questionnaires and scales have been created for more specific evaluation purposes: For example, a questionnaire was developed to evaluate the quality of examinations of firefighters from the viewpoint of the candidates (Thielsch et al. 2018), and a four-item short questionnaire was developed to evaluate mission exercises in trainings of group and platoon leaders (Röseler et al. 2020). The latter questionnaire is based on items regarding mission exercises created in the first study of Schulte and Thielsch (2019) that were not included in the main FIRE questionnaire but were instead validated as a separate scale (Röseler et al. 2020).
2.5 Adaption of the FIRE Questionnaire for Command Unit Trainings
Considering the literature on command unit training, the teaching methods applied for training command units seem generally identical to the ones described in Schulte and Thielsch (2019) for group and platoon leader trainings. This impression was confirmed in a discussion with the deputy head of the department “Crisis Management and Research” of the IdF NRW in Germany. He explained that the teaching methods used in the trainings are comparable, while the contents differ due to the different audiences. Proceeding from these similarities, we decided to adapt the FIRE process scales for trainer’s behavior, structure, overextension, and group by only changing the names from group/platoon leader to command unit member. This applies to the command post exercises as well, which use teaching methods similar to those used for the mission exercises of group and platoon leaders. As mentioned earlier, the command post exercises are very important for the command units. Therefore, the present study aims to validate one additional scale of the FIRE, the mission exercise scale (Röseler et al. 2020). For the questions on this scale, the term “mission exercise” was changed to “command post exercise.”
With respect to outcome measures, the FIRE transfer scale was slightly amended in wording to better fit the command unit training content. However, contrary to group and platoon leaders, the command unit leads from the rear. Thus, major changes were necessary in the self-rated competence scale of the FIRE, as the main key competences in the command unit are adequate processing of information, coordination, and communication (Wybo and Kowalski 1998; Hofinger and Heimann 2016). In particular, the competence scale was amended based on the evidence presented above on team learning and shared mental models.
2.5.1 Amendment of the FIRE Competence Scale
Having considered the relevance of SMM (and its antecedent, information exchange) in command unit duties, we assume that such mental models develop during command unit trainings and conclude that they should also be explicitly taught; this, in turn, means that evaluations should assess whether SMM were successfully developed during trainings. However, measuring the degree of SMM is challenging, and, so far, no consistently used methodology exists for doing so (Mohammed et al. 2010). In addition, while measuring SMM directly cannot be realized with a self-assessment evaluation form, it is possible to examine the acquired competence in construction, co-construction, and constructive conflict during training (van den Bossche et al. 2006; Boon et al. 2013). Therefore, based on the work of van den Bossche et al. (2006) and Boon et al. (2013), the competence dimension of the FIRE scale was extended by three items measuring construction, co-construction, and constructive conflict. In doing so, the nine items of van den Bossche et al. (2006) were condensed to three items, namely “Through my participation in the course, I learned to better communicate the information relevant to my colleagues,” “My participation in the course has made it easier for me to process information received from my colleagues,” and “Through my participation in the course, I am able to critically check the information provided by my colleagues for my tasks.”
2.5.2 Amendment of Scale Names
Since the scale for self-rated competence differs from the original FIRE scale and a number of changes in item wording were conducted, the name of the tool was also amended to avoid any possibility of confusion. So, the main evaluation questionnaire for command units is titled FIRE-CU (Feedback Instrument for Rescue forces Education – Command Unit). The additional amended mission exercise scale is named FIRE-CPX (Feedback Instrument for Rescue forces Education – Command Post eXercise scale). See Table 1 for the final FIRE-CU and FIRE-CPX items to be validated in the present study.
Table 1 Final FIRE-CU and FIRE-CPX items

FIRE-CU
Trainer’s behavior
- The trainers condensed difficult topics concisely
- I think the trainers gave useful feedback
- The trainers motivated me to participate actively in the course
- I think the trainers were interested in the participants’ learning success
Overextension
- I was overexerted by the amount of subject matter
- The speed of impartation was too high
- The course content was too difficult for me
Structure
- I think the course was well structured
- I was always able to follow the structure of the course
- I think the course gave a good overview of the subject area
Group
- The other trainees participated actively
- The participants supported each other
- I think there was a strong solidarity within the course
Competence
- Through my participation in the course, I learned to better communicate the information relevant to my colleagues
- After the training, it is easier to make decisions in critical situations
- After this training, I know my personal limitations better than before
- After this training, I think I am more capable of staying calm in stressful situations
- My participation in the course has made it easier for me to process information received from my colleagues
- Through my participation in the course, I am able to critically check the information provided by my colleagues for my tasks
Transfer
- I feel very well prepared for the next mission I will perform as a command unit member
- By participating in the exercises during the course, I gained the necessary self-assurance to perform missions as a command unit member
- I can use the acquired knowledge for my future assignment as a command unit member

FIRE-CPX
- I learned a lot during the command post exercises
- The trainers provided useful feedback during the command post exercise
- During the command post exercises, I was able to apply my newly acquired knowledge
- The command post exercises’ level of difficulty was appropriate
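To illustrate how responses to the items in Table 1 might be aggregated for analysis, the following sketch computes mean scale scores from a single trainee's ratings. The item keys, the 1–5 response format, and the grouping into scales are our assumptions for illustration only; the validated scoring procedure is that of Schulte and Thielsch (2019).

```python
# Hypothetical illustration: aggregate FIRE-CU item ratings into scale means.
# Item keys and the 1-5 response format are assumptions for this sketch,
# not the validated scoring procedure.

FIRE_CU_SCALES = {
    "trainers_behavior": ["tb1", "tb2", "tb3", "tb4"],
    "overextension":     ["ov1", "ov2", "ov3"],  # negatively keyed: high = problematic
    "structure":         ["st1", "st2", "st3"],
    "group":             ["gr1", "gr2", "gr3"],
    "competence":        ["co1", "co2", "co3", "co4", "co5", "co6"],
    "transfer":          ["tr1", "tr2", "tr3"],
}

def scale_means(responses: dict) -> dict:
    """Mean rating per scale; scales without any answered items are omitted."""
    means = {}
    for scale, items in FIRE_CU_SCALES.items():
        values = [responses[i] for i in items if i in responses]
        if values:
            means[scale] = sum(values) / len(values)
    return means

# Example: a trainee who rated only the structure and transfer items.
print(scale_means({"st1": 5, "st2": 4, "st3": 4, "tr1": 3, "tr2": 4, "tr3": 5}))
```

Keeping the negatively keyed overextension scale as a separate score (rather than reverse-coding it into an overall total) mirrors the factor structure reported for the original FIRE.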
2.6 Validation of Evaluation Instruments and Application in the Present Study
The aim of the present research is to validate the FIRE-CU and FIRE-CPX in the evaluation of trainings for the highest management level of the fire services, the command unit. Validity is usually described as the most important characteristic of psychometric tests and instruments (Clark and Watson 1995; Irwing and Hughes 2018), whereby assessing the validity of a test or instrument implies determining its accuracy and appropriateness (American Educational Research Association et al. 2014; Irwing and Hughes 2018). Validity is referred to as a unitary concept, which means that instead of there being distinct types of validity (American Educational Research Association et al. 2014), there are different types of evidence for validity that require a series of investigations (Clark and Watson 1995). For example, assessing the evidence for content validity involves determining whether the test’s content relates well to the construct it is intended to measure (American Educational Research Association et al. 2014). Content validity can be ensured by developing the construct definition and the items in cooperation with experts in the field (American Educational Research Association et al. 2014), as was done in the development phase of the FIRE as described above.
Furthermore, testing the factorial structure by using confirmatory factor analysis (CFA) provides evidence of whether there is a sufficient model fit between the empirical data and the theoretically assumed model (Thompson 2004). Following the results of Schulte and Thielsch (2019) for the original FIRE questionnaire, a six-dimensional structure is also assumed for the FIRE-CU: The original FIRE process items show a four-dimensional structure reflecting trainer’s behavior, overextension, structure, and group; the original FIRE outcome items show a two-dimensional structure reflecting self-rated competence and transfer. The additional module for mission exercises was assessed with a one-factor model (Röseler et al. 2020), which is also expected for the FIRE-CPX scale.
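The hypothesized measurement model described above can be written compactly in lavaan/semopy-style model syntax, as is common for CFA software. The sketch below is only an illustration of the assumed structure: the item labels are placeholders, and the item counts follow Table 1.

```
# Hypothesized FIRE-CU measurement model (lavaan-style syntax; item labels
# are placeholders). Four process factors and two outcome factors:
trainers_behavior =~ tb1 + tb2 + tb3 + tb4
overextension     =~ ov1 + ov2 + ov3
structure         =~ st1 + st2 + st3
group             =~ gr1 + gr2 + gr3
competence        =~ co1 + co2 + co3 + co4 + co5 + co6
transfer          =~ tr1 + tr2 + tr3

# One-factor model assumed for the FIRE-CPX module:
cpx =~ px1 + px2 + px3 + px4
```

Fitting such a model to the empirical data then yields the fit indices (for example CFI or RMSEA) by which the assumed six-dimensional and one-dimensional structures can be judged.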
The construct validity of the FIRE-CU and FIRE-CPX questionnaires is investigated in this study with convergent, divergent, and concurrent scales (Nunnally 1978). Convergent measures were chosen based on the relatedness of fire service evaluations to teaching evaluations in higher education, as there are no other comparable specific evaluation instruments for fire service training. Therefore, positive relationships are expected for the convergent scales. Evidence for divergent validity is assessed with a mood scale and the correlation between participants’ age and their evaluations. With regard to mood, we note that mood may not only influence quality assessments as a bias variable; it can also be a result of training quality, as learning success can have a positive effect on the trainee’s mood. In this context, small or medium correlations therefore do not argue against validity; larger correlations, however, would call the construct validity of the scales into question. In addition, there should be little or no correlation between the age of the trainees and their assessment of the training. Further evidence for construct validity can be provided with test-criterion relationships, whereby the instrument is assumed to predict a certain criterion (American Educational Research Association et al. 2014). An instrument is said to have predictive validity if it can forecast criteria measured at a later point in time, whereas it has concurrent validity if it predicts criteria obtained at the same time (American Educational Research Association et al. 2014). The present study assesses all scales immediately after the trainings, reducing the effort for the participants and thereby concentrating on concurrent validity measures for the FIRE-CU and FIRE-CPX. These are realized by an overall assessment of the entire training and a grade on a six-point scale. In sum, the present study focuses on testing the applicability, factorial structure, reliability, and validity of the FIRE-CU and FIRE-CPX in command unit trainings.
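For the reliability part of such a validation, the internal consistency of each scale is commonly estimated with Cronbach's alpha. The following minimal sketch is our illustration of that statistic, not the analysis code of the study; it uses only the Python standard library.

```python
from statistics import variance  # sample variance (n - 1 denominator)

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents' item-score rows.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals),
    where k is the number of items on the scale.
    """
    k = len(scores[0])
    items = list(zip(*scores))                       # one column per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Three perfectly correlated items give an alpha of 1.0
# (up to floating-point rounding):
print(cronbach_alpha([[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]))
```

Because alpha depends on item covariances, each of the six hypothesized FIRE-CU scales (and the FIRE-CPX scale) would be checked separately rather than pooling all items.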