Findings
This study aimed at developing and validating a generic diagnostic tool for assessing primary education pupils’ prior knowledge of technological systems. To this end, the Evidence Centered Design approach (Mislevy et al., 2003; Oliveri et al., 2019) was utilised. To properly validate the development of the diagnostic assessment tool, the design decisions (i.e., warrants) should be substantiated with theoretical as well as empirical evidence (i.e., backings).
Study 1 addressed the decisions related to the design of the assessment tasks, based on an electrical (i.e., BW device) and a mechanical (i.e., SMT device) system. More specifically, it examined whether pupils could combine the systems’ components in various ways, allowing them to demonstrate partial knowledge and thereby generate a wide variety of work products. To this end, primary education pupils carried out both assessment tasks, which generated 272 individual work products per device. Results for both devices indicate that pupils interrelated the devices’ components in various ways, resulting in 145 different BW and 112 different SMT work products. This demonstrates that both tasks allowed pupils to apply various aspects of knowledge about the interrelationship of the devices’ components. All in all, these empirical findings corroborate the theoretical backings. More specifically, the assessment tasks enabled pupils to generate the necessary variety of work products (Davey et al., 2015) despite the restrictions in experiencing the consequences of trial-and-error behaviour (Klahr & Robinson, 1981; Philpot et al., 2017), and allowed them to make their tacit knowledge explicit (Levy, 2012; Zuzovsky, 1999).
Study 2 addressed the design decision that generic scoring rules can be utilised to infer pupils’ prior knowledge about technological systems from their generated work products. Since pupils’ prior knowledge may differ considerably per device (CITO, 2015; Molnár et al., 2013), the theoretical backings favoured the development and utilisation of generic, device-transcending scoring rules (Nitko, 1996). Based on Fischer and Bidell’s (2007) dynamic skill development framework, seven generic levels were operationalised in level-specific scoring rules (see Table 1). To examine the suitability of the generic scoring rules, two different types of expert groups were asked to qualify a representative sample of work products. Experts in the field of technology education (N = 9) rank-ordered, based on pair-wise comparisons, a representative selection of work products in terms of the quality of the construction (i.e., device functionality). Researchers in the field of dynamic skill development (N = 6) interpreted and substantiated the level of the work products based on their experience with Fischer’s framework. The semi-structured interviews yielded valuable insights and concrete suggestions, which were used to refine the task-specific scoring rules according to the principles of the generic scale (see Table 4). After applying the refined scoring rules, results for both devices showed a high, significant positive correlation with the ranking values that resulted from the independent judgements of the technology education experts. The correlation between test and retest scores was high, and pupils’ results on items indicating specific combinations of components fitted an extended non-parametric Rasch model. All in all, these empirical findings align with the theoretical backings, meaning that the diagnostic tool assesses construct-relevant pupil characteristics (i.e., prior knowledge about technological systems; Kane, 2004; Oliveri et al., 2019).
Study 3 addressed whether the tasks indeed generated the differences in skill levels that were expected given the pupils’ ages. To this end, the work products generated in Study 1 were scored according to the refined scoring rules. The outcomes were in accordance with the distribution of levels that was expected from previous studies with the Fischer scale (Schwartz, 2009). The findings confirm those of studies indicating that pupils in primary education find it difficult to understand technological systems (Assaraf & Orion, 2010; Ginns et al., 2005; Koski & de Vries, 2013; Svensson, Zetterqvist, & Ingerman, 2012). A plausible explanation could be that pupils’ ability to apply inductive reasoning strategies (i.e., Fischer level 7) is not yet sufficiently developed in primary education (Molnár et al., 2013).
By comparing pupils’ levels on the tasks with their scores on reading comprehension and mathematics, it was also explored whether such scores might serve as an indication of pupils’ prior knowledge about technological systems. Prior research, for example, indicated that primary education pupils’ math and reading ability scores are strong predictors of their academic achievement (Safadi & Yerushalmi, 2014; Wagensveld et al., 2014). To examine this, the levels of the work products from Study 1 were related to pupils’ math and reading ability scores obtained from national standardised tests. In contrast to the study of Safadi and Yerushalmi (2014), a negligible effect of math and reading ability scores on task achievement was obtained. A possible explanation might lie in the nature of the assessment task. Whereas Safadi and Yerushalmi assessed pupils’ understanding with multiple-choice questions, this study made use of performance assessments. By doing so, pupils were enabled to make use of their tacit knowledge (i.e., design decision 1), rather than relying solely on verbalisations (Cianciolo et al., 2006; Wagner & Sternberg, 1985). All in all, these empirical findings indicate that construct irrelevance (i.e., assessing unintended/confounding pupil characteristics, see Kane, 2004; Roelofs, 2019) can be excluded.
The finding that the tasks reveal aspects of pupils’ prior knowledge that are not reflected in their scores on mathematics and reading comprehension underscores the importance of using such tasks in primary education. On the one hand, it indicates that engineering activities, preferably within integrated STEM, are necessary to promote pupils’ understanding of technological systems. On the other hand, it can reveal capacities of certain pupils that remain hidden in current assessment practice.
Limitations
Although the obtained findings are promising, it is important to take the study’s limitations into account when generalising their implications to other educational practices and research. A major limitation follows from the tool’s purpose: enabling teachers to obtain information about their pupils’ prior knowledge that can help them prepare their lessons. The design decisions following from that purpose limit the tool’s application for formative use. By restricting the evaluation of trials, the tool does not enable pupils to show their problem-solving ability, i.e., the ability to infer a system’s structure through interaction. See, for instance, the PISA 2012 assessment of creative problem solving for such tasks (OECD, 2014). The use of a generic scale may suggest that the tool measures a generic ability. However, the generic scale only makes it possible to compare a pupil’s prior knowledge of different systems; the level resulting from a work product only indicates prior knowledge about the technical system that the task represents.
Other limitations reside in the methodology used in this study. First, whereas the ECD validation approach has proven its value, this has, to the best of our knowledge, mainly been the case for so-called high-stakes assessments such as standardised tests. Its utilisation for diagnostic assessment purposes is a largely unexplored area, and perhaps other validation approaches might be better suited to this end. To gain a broader perspective on the matter, the reader might, for example, also be interested in design and validation approaches that place a stronger emphasis on formative educational practices (e.g., Black & Wiliam, 2018; Pellegrino et al., 2016). Second, as indicated by Study 1, it remains to be seen whether the current study captured the full range of work products pupils might generate. Should that range prove larger, this might have implications for the generic scoring rules; it thus remains to be seen whether the current scoring rules are also suitable for a larger variation in generated work products. Third, as indicated by Study 2, the limited number of experts in the field of dynamic skill development reported that they found it difficult to apply the scoring rules to the BW device. Although an initial agreement about the generic scoring rules was reached after constructive discussions, further empirical backings (e.g., a replication study with other technological devices) are required to substantiate this design decision. Moreover, although the timeframe and the experts’ availability did not allow it here, it would be preferable to organise (multiple) calibration sessions in which the experts discuss the scoring rules with one another (O’Connell et al., 2016).
In addition, although work products are valuable sources of assessment evidence, it can be questioned whether a full understanding of pupils’ mental models (i.e., their understanding of concepts and principles) can be inferred from them (Garmine & Pearson, 2006). As indicated by others, one should be aware that every assessment tool (e.g., its purpose, task, scoring, and outcomes) has its own merits and pitfalls; one might therefore consider utilising a) multiple assessments with the same tool and b) different types of assessment tools (Gerritsen-van Leeuwenkamp et al., 2017; Van der Schaaf et al., 2019). Lastly, even though pupils from different schools and grade classes participated (Study 1 and Study 3), it remains to be seen whether this specific sample properly reflects the entire population. This might have implications for the pupil characteristics (i.e., math and reading ability) that were included in this study and their effect on pupils’ understanding of technological systems. It might, for example, also be plausible to assume that a pupil’s motivation affects his or her task engagement and academic achievement (Hornstra et al., 2020; Schunk, Meece, & Pintrich, 2012).
Implications for educational practices and research
To conclude, this study’s theoretical underpinning and its empirical findings support the validity of the generic diagnostic assessment tool. It is a first step in supporting teachers in assessing learning outcomes related to primary technology education (Garmine & Pearson, 2006) and, hopefully, in warranting a more structural embedding of technology education in primary education curricula (Dochy et al., 1996; McFadden & Williams, 2020). As indicated by the study’s limitations, the diagnostic assessment tool requires more research to validate its utilisation. One potential direction could be to replicate the current study with devices that are based on the current design decisions but differ in their underlying physical principles. By doing so, future studies could examine whether the current design is robust enough to warrant its utilisation in other contexts. Another potential direction could be to utilise triangulation techniques to examine whether tools aimed at assessing the same construct (i.e., understanding of technological systems) yield comparable results (Catrysse et al., 2016). More specifically, it would be valuable if pupils’ verbalisations of their actions were measured either during (i.e., think aloud) or after (i.e., stimulated recall) their task performance and related to the scoring of their generated work products.

For educational practices, it is important to gain more insight into the tool’s ecological validity (Kane, 2004). That is, can primary education teachers actually utilise the diagnostic tool to diagnose and enhance their pupils’ prior knowledge about technological systems? Prior research indicates that teachers find it difficult to apply such formative teaching approaches (Heitink et al., 2016). Reasons for this could be that they often lack a) a clear understanding of these approaches (Robinson et al., 2014) and b) concrete ‘how-to’ examples indicating how such approaches can be utilised (Box et al., 2015). A potential first direction for addressing this is to organise training (Forbes et al., 2015; Lynch et al., 2019) or calibration sessions (O’Connell et al., 2016; Verhavert et al., 2019) in which teachers learn how to utilise the diagnostic tool. Once teachers are familiar with administering the diagnostic assessment tool and analysing the obtained results, (more) support could be provided regarding the adaptive enhancement of pupils’ understanding of technological systems (Black & Wiliam, 2018; Van de Pol et al., 2010).