
Computers & Education

Volume 91, 15 December 2015, Pages 92-105

Computer-generated log-file analyses as a window into students' minds? A showcase study based on the PISA 2012 assessment of problem solving

https://doi.org/10.1016/j.compedu.2015.10.018

Highlights

  • Computer-based assessments enable analyses of log-files that contain students' actions.

  • We analyzed PISA 2012 problem solving log-files of over 16,000 students.

  • We related students' strategic approach to performance on an individual and country level.

  • We identified several groups of students with different levels of (non-)mastery.

  • We discuss implications of log-file analyses for scientists, teachers, and policy makers.

Abstract

This paper aims at showcasing the potential of log-file analyses by capitalizing on the computer-based assessment of complex problem solving (CPS) in the 2012 cycle of the Programme for International Student Assessment (PISA). We analyzed log-file data from the CPS item Climate Control encompassing N = 16,219 students from 44 countries and economies. In Research Question 1, we related the application of an optimal exploration strategy (i.e., vary-one-thing-at-a-time, VOTAT) to performance on this specific problem and to overall level of CPS proficiency, both on the individual and on the country level. In Research Question 2, we identified several groups of students with different levels of non-mastery on a fine-grained level. Results indicated that (1) the VOTAT strategy was strongly related to performance on the Climate Control item as well as to overall problem solving proficiency, and (2) there were different levels of non-mastery that ranged from applying no systematic strategic behavior to actually applying VOTAT but still failing to solve the item. Against the backdrop of these results, we discuss implications and future potentials of log-file analyses in educational large-scale assessments for researchers, teachers, and policy makers.

Introduction

Ever since the advent of computers in psychological and educational assessment, the possibility of analyzing behavioral processes and sequences of actions through information captured in computer-generated log-files has been praised (Bunderson, Inouye, & Olsen, 1989). An asset of log-files is that they are a straightforward source of information containing data on every single behavioral action of each student (Bunderson et al., 1989, Greiff et al., 2014). Yet, as often as this potential has been praised, it has just as rarely been exploited. In fact, almost 20 years after the initial excitement, Williamson, Bejar, and Mislevy (2006) stated that surprisingly little use had been made of computer-generated log-files and that the exploitation of the information contained in them had been lagging behind initial expectations.

The question of how log-files can be exploited to better understand students' levels of proficiency once again gained momentum when the computer-based assessment of complex problem solving (CPS; labeled creative problem solving in PISA; OECD, 2014a) was included in the 2012 cycle of the Programme for International Student Assessment (PISA). PISA is among the most widely recognized international educational large-scale assessments (OECD, 2009b). Importantly, its 2012 cycle provides, for the first time ever, log-files containing the problem solving behavior of representative samples of 15-year-old students from over 40 countries and economies, and these log-files have now become available for further analyses.

The rationale behind including CPS in the PISA cycle in the first place is that CPS is considered a central 21st century skill of high importance for several outcomes including academic achievement (Sonnleitner et al., 2013, Wüstenberg et al., 2012; cf. also Wirth & Klieme (2003)). It encompasses a set of higher-order thinking skills that require students to engage in strategic planning, to think ahead, to carry out multi-step sequences of actions, to react to a dynamically changing system, to test hypotheses, and, if necessary, to adequately correct these hypotheses (Raven, 2000, Wüstenberg et al., 2012). In other words, it depends substantially on complex behavioral patterns and sequences of actions (Funke, 2010, Greiff et al., 2014).

Thus, CPS is an obvious candidate for in-depth analyses of the specific behaviors that underlie successful and unsuccessful performance. Information on exactly where and how students succeed or fail in their problem solving efforts is expected to equip researchers, teachers, and policy makers alike with important information about students' proficiency and about how to support them in optimizing their cognitive potential (cf. OECD, 2012). And yet, the exploitation of this rich resource through dedicated log-file analyses is still in its infancy. The study at hand was conducted to remedy this shortcoming by showcasing the potential of theoretically motivated log-file analyses, capitalizing on the PISA 2012 assessment of CPS. This assessment comprised several items in which students worked on novel and dynamic problem situations in computer-based assessment environments (OECD, 2014a). For our initial efforts to understand students' behaviors when working on CPS tasks, we chose one specific item from the dynamic CPS unit Climate Control, which, according to the CPS framework (OECD, 2014a), offers a particularly good reflection of the underlying concept of CPS. A specific and detailed description of Climate Control is provided in section 2.2.

We targeted two research questions with regard to CPS. In the first research question, we investigated whether the specific strategic approach that students employed while solving the Climate Control item was related to their performance on this specific problem and to their overall level of CPS proficiency. In the second research question, we further exploited log-file data on a very fine-grained level in an attempt to identify several groups of students with different levels of mastery and non-mastery against the backdrop of their specific behaviors and their exploration strategies. To address these two research questions, we investigated individual log-files of the PISA 2012 cycle and used them for individual-level and country-level analyses.

Nearly three decades ago, Bunderson et al. (1989) were among the first to highlight the potential of computer-based assessments that came along with a so-far unprecedented amount of data on students' test-taking behaviors. Indeed, by analyzing test-taking behaviors stored in log-files, not only can the final outcome be measured (i.e., correct or incorrect) but also the preceding steps and actions that resulted in the specific outcome. That is, instead of just answering What has been achieved, log-files provide direct insights into How the results were produced. More specifically, important information can be gained about how a student interacted with the problem and how this resulted in specific mastery or non-mastery of the task.

With this potential in mind, it is not surprising that Bunderson et al. (1989) called on scholars to put all hands on deck to fully exploit the almost infinite amount of data as a door opener to “intelligent measurement”, which could offer advice to learners and teachers on the basis of students' profiles and serve as a window into the black box of cognitive processing. Clearly, it is impossible to observe directly what happens in students' brains while they work on problems, but researchers can make theoretically motivated inferences about the underlying cognitive mechanisms from overt behavior. Being able to track the behavioral sequences of actions that lead to a specific performance outcome constitutes an important step toward understanding what happened during the test. For instance, log-files provide information about mistakes that were made while working on a problem, and such mistakes may be associated with an incorrect understanding of the problem situation (Ifenthaler, Eseryel, & Ge, 2012). Thus, in the long run, the identification of specific test-taking behavior might be a gold mine for any kind of teaching or intervention, as it enables differentiated instruction that can provide students with individually tailored avenues for learning.

A closer look at the current practice of how log-files are exploited and how meaningful information is derived from them reveals a remarkable discrepancy with the potential assigned to them: Theory-driven research on log-files has been stagnating for years, and Bunderson et al.'s (1989) vision of intelligent measurement has not come to fruition (Williamson, Mislevy, & Bejar, 2006). The lack of studies investigating process data is – at least to a certain extent – due to the challenges and obstacles associated with analyzing the often-fuzzy relations found in large amounts of log-file data. As noted by Csapó, Ainley, Bennett, Latour, and Law (2012), it is very difficult to “make sense of the hundreds of pieces of information students may produce when engaging in a complex assessment” (p. 216). That is, even though powerful techniques for analyzing log-file data such as educational data mining have recently emerged (Romero, Ventura, Pechenizkiy, & Baker, 2011), it is often difficult to give conceptual meaning to, and to derive specific implications from, the behavioral patterns found in log-files. Further, the technical expertise required for extracting meaningful pieces of information from log-files (cf. Dumais, Jeffries, Russell, Tang, & Teevan, 2014) might further slow or even prevent widespread investigations. This is illustrated in Fig. 1, which displays a small part of an already sorted log-file containing only four interactions (e.g., clicks) of one student who worked on one single CPS item. Of note, in a CPS task, it is possible for one student to produce 60 or even more interactions while working on one item over the course of a couple of minutes of problem exploration (cf. Wüstenberg et al., 2012), thus producing a vast amount of information to be analyzed when the data from all students and all items are considered.
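To make the preprocessing step concrete, the following is a minimal sketch of how raw interaction events might be turned into one time-ordered behavioral sequence per student. The record layout and action strings are purely illustrative assumptions, not the actual PISA log format:

```python
from collections import defaultdict

# Hypothetical raw log events, loosely modeled on the kind of record a
# computer-based assessment might store: one row per interaction (e.g.,
# click), with a student id, a timestamp, and the action taken.
events = [
    {"student": "S2", "time": 3, "action": "apply_settings"},
    {"student": "S1", "time": 1, "action": "start_item"},
    {"student": "S1", "time": 2, "action": "top=+,central=0,bottom=0"},
    {"student": "S2", "time": 5, "action": "reset"},
    {"student": "S1", "time": 4, "action": "top=0,central=+,bottom=0"},
]

def sequences_by_student(events):
    """Group raw interaction events by student and order them in time,
    yielding one behavioral sequence per student."""
    grouped = defaultdict(list)
    for e in events:
        grouped[e["student"]].append(e)
    return {
        student: [e["action"] for e in sorted(rows, key=lambda e: e["time"])]
        for student, rows in grouped.items()
    }

print(sequences_by_student(events))
```

Only after such sequences have been reconstructed can theoretically motivated indicators (e.g., exploration strategies) be derived from them.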

From an empirical perspective, studies addressing the wider field of assessment of cognitive abilities have either abstained from investigating process data or have largely focused on time-on-task as the central behavioral indicator (Dodonova and Dodonov, 2013, Goldhammer et al., 2014). For instance, Goldhammer et al. (2014) explored the relation between overall performance and time-on-task separately for reading and problem solving but did not consider any further process indicators such as specific reading or problem solving strategies. Until now, only a few studies have applied other process indicators of test-taking behavior such as specific exploration strategies (e.g., Kröner et al., 2005, Wüstenberg et al., 2012). With regard to CPS, Scherer and Tiemann (2012) showed that preexisting strategic knowledge is associated with the exploration strategy that is applied in complex problems, which in turn is related to overall CPS performance. Along these lines, a specific strategy that has often been investigated in CPS research is the application of the vary-one-thing-at-a-time (VOTAT; Tschirgi, 1980) strategy (also labeled control of variables strategy; Chen & Klahr, 1999). When applying VOTAT, one variable is manipulated while all other variables are held constant to determine the effect of the varied independent variable on the dependent outcomes. VOTAT resembles the principle of isolated variation of variables, which is a core component of conducting valid scientific experiments (Chen & Klahr, 1999). Empirically, Kröner et al. (2005) as well as Wüstenberg et al. (2012) showed that application of VOTAT is strongly related to CPS performance. However, existing studies were mostly based on rather small and non-representative samples (e.g., Kröner et al., 2005, Wüstenberg et al., 2012).
In this regard, the large and representative PISA 2012 sample provides a promising avenue for log-file analyses, as these data are likely to include a broad variety of test-taking behaviors.
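The VOTAT rule described above lends itself to a simple operationalization: a student is credited with the strategy if every input variable was, at some point, varied in isolation. The sketch below assumes a simplified data format (a time-ordered list of input configurations); variable names and the scoring rule are illustrative, not the official PISA coding:

```python
def applied_votat(settings, variables):
    """Return True if the student varied every input variable in
    isolation at least once, i.e., changed exactly one variable per
    exploration step while holding all others constant."""
    isolated = set()
    for prev, curr in zip(settings, settings[1:]):
        changed = [v for v in variables if prev[v] != curr[v]]
        if len(changed) == 1:       # exactly one variable manipulated
            isolated.add(changed[0])
    return isolated == set(variables)

# A student who explores three (hypothetical) controls one at a time:
systematic = [
    {"top": 0, "central": 0, "bottom": 0},
    {"top": 1, "central": 0, "bottom": 0},   # only `top` varied
    {"top": 1, "central": 1, "bottom": 0},   # only `central` varied
    {"top": 1, "central": 1, "bottom": 1},   # only `bottom` varied
]
# A student who changes several controls at once:
unsystematic = [
    {"top": 0, "central": 0, "bottom": 0},
    {"top": 1, "central": 1, "bottom": 1},
]
print(applied_votat(systematic, ["top", "central", "bottom"]))    # True
print(applied_votat(unsystematic, ["top", "central", "bottom"]))  # False
```

The second sequence fails because varying several variables simultaneously confounds their effects, so none of their individual influences on the outcomes can be isolated.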

That is, CPS items can be considered excellent candidates for investigating the relation between behavioral sequences and overall performance, as they require students to apply multi-step behavioral sequences while actively interacting with the problem environment, thereby generating a vast amount of behavioral data stored in log-files. For instance, in the Climate Control item that was part of the PISA 2012 assessment and was also analyzed in the present study, students have to adjust controls in successive steps to figure out how the item works (a detailed description of the item is provided in section 2.2). In the next section, we introduce the concept of complex problem solving and describe the role of strategic exploration in more detail.

Complex problem solving in PISA 2012 is defined as “an individual's capacity to engage in cognitive processing to understand and resolve problem situations where a method of solution is not immediately obvious. It includes the willingness to engage with such situations in order to achieve one's potential as a constructive and reflective citizen” (OECD, 2014a, p. 30). In the first part of this definition, the OECD stresses that the verbs engage, understand, and resolve underline the idea that the assessment of problem solving measures not only the outcome of the problem solving process but also students' progress, including, for instance, the strategies they employ to solve the problem.

In this respect, the PISA 2012 definition closely resembles classical definitions of CPS. Buchner, for instance, emphasizes the importance of strategic and planned exploration in CPS by pointing out that the regularities of the complex problem environment can be revealed only by a successful exploration of the problem situation (see Buchner in Frensch & Funke, 1995). Exploring the problem is a fundamental characteristic of CPS because not all of the information necessary to obtain a solution is provided at the outset (Funke, 2010). Thus, individuals have to actively acquire and apply new knowledge (Novick & Bassok, 2005). The importance of knowledge acquisition as one core dimension of CPS also implies that routine behavior (e.g., the mere application of pre-existing knowledge) is not sufficient for finding a correct solution and that non-routine behavior, such as carrying out and adapting strategic plans, is required (cf. Funke, 2010, OECD, 2014a).

Due to its non-routine nature, CPS is often regarded as a crucial 21st century skill. Autor, Levy, and Murnane (2003) found that in nearly every job environment, computerization and automation are associated with a reduced labor input of routine tasks and an increased labor input of non-routine tasks. This implies that in today's jobs, CPS skills such as updating one's knowledge and reacting to dynamically changing environments are more crucial than they were only a few decades ago (OECD, 2014a). Recent empirical research has corroborated the notion of CPS as an important 21st century skill that is of particular relevance for educational outcomes. For instance, CPS has been found to predict school grades even beyond fluid reasoning (Greiff et al., 2013, Wüstenberg et al., 2012) and working memory (Schweizer, Wüstenberg, & Greiff, 2013), which are among the strongest predictors of academic achievement (Floyd, Evans, & McGrew, 2003). Despite this empirical importance of non-routine CPS skills, teachers often find that their students excel on routine exercises but fail to solve problems that are unlike those they have previously encountered (OECD, 2014a). To obtain an understanding of how students approach unknown problems such as CPS tasks, it is necessary to identify the behaviors that are associated with students' success or failure.

In PISA 2012, in addition to participating in the obligatory traditional domains (i.e., mathematics, science, reading), over 40 countries and economies participated in the optional computer-based assessment of CPS (OECD, 2014a). By using students' log-file data from the specific PISA 2012 CPS item Climate Control, which is presented in detail in section 2.2, we aimed to advance knowledge about the relation between strategic behavior and overall performance in CPS and to illustrate the potential of process information obtained from log-files.

The purpose of this study is to showcase how log-file analyses can inform the understanding of how students' overall performance evolves against the backdrop of overt behavior and manifest actions. On the basis of the CPS assessment of the PISA 2012 cycle, we investigate the log-files of 15-year-old students from over 40 countries and economies, and their application of the VOTAT strategy as an indicator of manifest behavior in the specific CPS item Climate Control. In doing this, we target two research questions that build upon each other.

With Research Question 1, we investigate how application of the VOTAT strategy differs across students and across countries and whether this difference is related to students' performance on the same item Climate Control (individual-level analysis), to students' overall problem solving proficiency (individual level), and to the country-specific average of overall problem solving proficiency (country level). By relating application of VOTAT to students' performance on the same item Climate Control, we replicate previous findings, based on rather small and non-representative samples, showing that overt behavior such as VOTAT and performance are strongly correlated (e.g., Kröner et al., 2005, Scherer and Tiemann, 2012, Wüstenberg et al., 2012). However, to the best of our knowledge, these are the first analyses based on a large and representative sample that also relate students' strategic approach to overall problem solving proficiency on the individual and country level, including a variety of other CPS items beyond Climate Control.
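The country-level part of this analysis amounts to aggregating a binary individual-level indicator (VOTAT applied or not) and a proficiency score per country and then correlating the two aggregates. A minimal sketch with invented toy data (the values carry no empirical meaning; the actual study uses PISA sampling weights and plausible values, which are omitted here):

```python
from statistics import mean

# Illustrative individual-level records: (country, VOTAT applied 0/1,
# overall problem solving proficiency). All values are invented.
students = [
    ("A", 1, 520), ("A", 1, 540), ("A", 0, 480),
    ("B", 0, 430), ("B", 1, 500), ("B", 0, 450),
    ("C", 1, 510), ("C", 1, 530), ("C", 1, 550),
]

def country_aggregates(students):
    """Per country: (share of students applying VOTAT, mean proficiency)."""
    by_country = {}
    for country, votat, score in students:
        by_country.setdefault(country, []).append((votat, score))
    return {
        c: (mean(v for v, _ in rows), mean(s for _, s in rows))
        for c, rows in by_country.items()
    }

def pearson(xs, ys):
    """Plain Pearson correlation between two equally long lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

agg = country_aggregates(students)
votat_rates, proficiencies = zip(*agg.values())
print(pearson(list(votat_rates), list(proficiencies)))
```

In this toy example, countries with a higher share of VOTAT users also show higher mean proficiency, so the country-level correlation is strongly positive.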

Research Question 2 is aimed at extending previous research on log-file analyses that were mostly limited to analyses of whether a certain strategy was applied or not (see Kröner et al., 2005, Wüstenberg et al., 2012). Here, we analyze students' behavioral patterns on a fine-grained level and try to identify different groups of students with qualitatively different levels of strategic mastery and non-mastery. The identification of optimal and suboptimal strategies, for instance, can provide important information on different levels of understanding (e.g., different types of mistakes) that may be useful for any kind of teaching or intervention.
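In the spirit of Research Question 2, crossing strategy use with the item outcome already yields a coarse first grouping; the labels below are our own illustrative shorthand, not categories defined by PISA:

```python
def mastery_group(applied_votat, solved):
    """Assign a student to an illustrative (non-)mastery group based on
    whether VOTAT was applied and whether the item was solved."""
    if applied_votat and solved:
        return "mastery"            # systematic exploration, correct solution
    if applied_votat and not solved:
        return "partial mastery"    # VOTAT applied, but item still failed
    if solved:
        return "unsystematic success"  # correct without a systematic strategy
    return "non-mastery"            # no systematic strategy, item failed

print(mastery_group(True, False))
```

Finer-grained groups (e.g., degrees of partially systematic exploration) would require inspecting the full behavioral sequence rather than a single binary strategy indicator.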

In summary, the overarching rationale behind our empirical efforts is not to test specific hypotheses but rather to investigate whether we can gain additional insights from log-file analyses and the behavioral sequences contained in them. These insights can help us understand how students succeed or fail in their problem solving attempts and, more generally, how they perform on assessments such as these.


Participants

The present sample is drawn from the PISA 2012 assessment, in which about 510,000 students between the ages of 15 and 16 from 65 countries or economies participated. A part of this sample, about 85,000 students from 44 out of these 65 countries and economies, participated in the optional, fully computer-based assessment of CPS. This assessment was conducted after the paper-based assessments of mathematics, reading, and science and included countries from all geographical areas such as Brazil,

Results

Descriptive statistics for categorical and continuous variables based on individual-level and country-level data are reported in Table 1. As shown in Table 1, application of VOTAT and performance on Climate Control were based on categorical data, whereas problem solving proficiency on the individual level as well as application of VOTAT and overall problem solving proficiency on the country level were based on continuous data. Frequencies across cells for categorical variables and means of

Discussion and conclusion

This study is among the first to approach log-file analyses on data collected in an international educational large-scale study to gain insights into students' overt behavior when working on problem solving items. The results for our first research question showed that applying the VOTAT strategy was strongly related to overall performance in Climate Control but also to overall problem solving proficiency. This relation was observed both on the individual level and on the country level. The

Author note

This research was conducted as part of a Thomas J. Alexander fellowship that was awarded to Samuel Greiff by the OECD. This research was further funded by a grant from the Fonds National de la Recherche Luxembourg (ATTRACT “ASKI21”).

This work should not be reported as representing the official views of the OECD or its member countries. All opinions expressed and arguments employed herein are those of the author(s).

Acknowledgments

We thank Katinka Hardt and Jonas Neubert for their help in preparing the data and their support with the analyses and Katinka Hardt also for her helpful suggestions in preparing the manuscript.

References (36)

  • J. Cohen et al. Applied multiple regression/correlation analysis for the behavioral sciences (1983)
  • B. Csapó et al. Technological issues for computer-based assessment
  • S. Dumais et al. Understanding user behaviour through data and analysis
  • R.G. Floyd et al. Relations between measures of Cattell-Horn-Carroll (CHC) cognitive abilities and mathematics achievement across the school-age years. Psychology in the Schools (2003)
  • P.A. Frensch et al. Complex problem solving: The European perspective (1995)
  • A.F. Fry et al. Processing speed, working memory, and fluid intelligence: Evidence for a developmental cascade. Psychological Science (1996)
  • J. Funke. Complex problem solving: A case for complex cognition? Cognitive Processing (2010)
  • J. Gobert et al. From log-files to assessment metrics for science inquiry using educational data mining. Journal of the Learning Sciences (2013)