Implementation
Students' posts from four consecutive offerings of the course were retrieved from the LMS, tabulated, and coded by two researchers. The coding was based on the widely used coding scheme by Visschers-Pleijers et al. (
2006), which was specifically created to code the process of PBL interactions, as well as the coding scheme by Yew and Schmidt (
2009), designed for the analysis of constructive collaborative interactions in PBL. The coding by Visschers-Pleijers et al. (
2006) covers learning-related interactions (e.g., argumentation, questions, and disagreements), procedural interactions related to organizing the group task, and off-task interactions. The coding scheme was extended based on the work of Yew and Schmidt (
2009) to include information sharing, discussion, and argumentation interactions. The coding scheme was simplified by merging similar categories, for example, “Opening question” and “Alternative question” into “Question”, and “Argument” and “Counter argument” into “Argue”. Since the focus of this study is on the students, we merged the teachers’ codes “Guiding”, “Facilitating”, and “Giving feedback” into a single “Facilitating” code, similar to Molenaar and Chiu (
2014). Codes that reflect co-regulation include Organize and Question, as well as starting discussions with Share or Discuss, which prompt others to engage, respond, or debate. Codes related to shared regulation are Evaluate, Question, Agree, or Disagree, where students evaluate their approach and discuss how it aligns with their goals. It is necessary to reiterate what Hadwin et al. (
2017) emphasized: when only conversation is coded, the boundaries between co-regulation and shared regulation may be blurred. The full coding scheme, with code descriptions, definitions, and abridged examples, is detailed in Table
1. The two researchers initially met several times to agree on the coding scheme and the definition of each category. A total of 5% of the posts were coded by both researchers. Then, the two researchers met, discussed differences, and reached a consensus. Cohen’s Kappa for interrater reliability was 0.91.
Table 1
Coding scheme for the PBL forum data used in the study, with abridged examples
Code | Definition | Abridged example |
Agree | Agreeing with the statement of another student or endorsing an opinion | Sure, the analysis of blood indices is the primary method to diagnose anemia |
Argue | Reasoning, justifying, or rationally trying to resolve differences of perspectives and issues in critical discussion | Since the blood cells are of normal size (normocytic), the anemia is expected to be due to bleeding, rather than a blood disease |
Disagree | Disagreeing with other students' opinions or statements | I don’t think that anemia can be diagnosed only based on blood count |
Discuss | Using information or learning resources related to the discussion to trigger conversation, address shared information, or talk about the information in discussion | We can use the red blood cell size as a means to help diagnose the type of anemia: microcytic (small cell size) common, macrocytic (large cell size), normocytic (normal cell size) |
Evaluate | Assessing other students' work or ideas and giving opinions or judgments for their or others’ work with reasoning | Your approach to diagnosis seems straightforward and practical especially when it is an emergency |
Facilitate | Giving feedback, guiding, and encouraging | Would you elaborate more on the relationship between anemia and chronic disease? |
Organize | Discussing how to divide the work and reflecting on the progress of the assigned tasks. Talking about the group and related issues, as well as ways to solve them | The objectives of our problem are discussion of diagnosis of anemia, differences between anemia due to bleeding, and chronic disease and laboratory tests of anemia… |
Question | Raising a question that demands new information and elaborated explanations related to a specific contribution | What other diagnostic tests do you think that we can rely on to diagnose anemia? |
Share | Providing objective information (no reasoning) that is not in the student’s own words (e.g., definitions or facts) | “Anemia is a condition in which the number of red blood cells or the hemoglobin concentration within them is lower than normal” |
Socialize | Sharing feelings about others’ work or comments or experiences related to the discussion | Thank you, I really liked how you simplified the approach here |
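The interrater reliability reported above follows the standard Cohen’s Kappa computation. The following is an illustrative Python sketch (not the authors’ actual tooling) of how agreement between two coders’ categorical labels is quantified:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two raters' categorical codes:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Proportion of items where both coders assigned the same code
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from the marginal frequencies of each code
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)
```

A value of 0.91, as in the study, indicates near-perfect agreement by conventional benchmarks.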
Data analysis
Our study follows the conceptualization of CSCL as a process that unfolds over time (Hadwin et al.,
2017). While temporal aspects have been previously explored as durations and rates and have been studied under several frameworks (Cress et al.,
2021; Häkkinen,
2013), our study embraces the view that order and sequence matter (Chiu & Reimann,
2021; Reimann,
2009; Ritter et al.,
2007). In doing so, we take a process-oriented approach to the temporal data, in contrast to the “code and count” approach (Oshima & Hoppe,
2021; Reimann,
2009; Suthers,
We concur with the methodological view of Chiu and Reimann (
2021), who explained in great detail why sequence methods are needed and why a stochastic (as opposed to deterministic) view of sequences is more faithful to dialogical analysis. This view enables the exploration of temporality and extends our understanding of the process of CSCL. Exploring such ordered sequences, accounting for the dependencies between events, and calculating the probability that one event follows another (e.g., that disagree follows argue) requires appropriate methods (Chiu & Reimann,
2021). As such, the methods used in this study, namely, sequence mining, stochastic process mining, and regression models, follow the suggested methods of Chiu and Reimann (
2021).
To answer RQ1 and RQ2, two analytical methods were selected: sequence and process mining (Chiu & Reimann,
2021; Häkkinen,
2013; Molenaar & Chiu,
2014). Sequence and process mining have been established in the literature to model the temporal patterns of learners’ behavior (Bogarín et al.,
2018; Matcha et al.,
2019; Saqr et al.,
2023). Sequence mining is particularly useful for modeling sequences of events, visualizing the timeline along which such events unfold, and offering a rich toolset for statistical analysis and clustering (Kinnebrew & Biswas,
2012; López-Pernas & Saqr,
2021). Sequence mining was performed by chronologically ordering the coded forum contributions by their associated timestamps. The time-ordered, coded interactions were used to build a state sequence object (the sequence file format) using the TraMineR R package (Gabadinho et al.,
2011). TraMineR is an open-source R package that has a rich repertoire of visualizations as well as statistical and modeling functions for sequence analysis. Building a sequence requires a time epoch, which is the unit of analysis, often referred to as a session or group of events that occur in close temporal proximity (Gabadinho et al.,
For the analysis and modeling of the temporal unfolding of interactions at the
group level (RQ1), the time epoch was a whole week of interactions, which is the full duration a group takes from start to finish of a
problem discussion. The case ID was the group ID, and therefore, every sequence object is a time-ordered sequence of interactions in a given group. The state sequence object was visualized using a distribution plot, which shows the proportion of each coded interaction that occurred at each time point and, therefore, is suitable for showing the temporal distribution of interactions across time. An index plot was also plotted for each sequence to show the sequence of coded interactions in each group (Gabadinho et al.,
2011).
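For readers unfamiliar with state sequence objects, the construction described above can be sketched in plain Python. This is an illustrative sketch only; the study itself used the TraMineR R package, and the function names below are hypothetical:

```python
from collections import defaultdict

def build_sequences(events):
    """Group coded events by case ID (here, group ID) and order them
    chronologically, mimicking the construction of a state sequence object."""
    by_case = defaultdict(list)
    for case_id, timestamp, code in events:
        by_case[case_id].append((timestamp, code))
    return {cid: [code for _, code in sorted(evs)] for cid, evs in by_case.items()}

def state_distribution(sequences, position):
    """Proportion of each coded interaction at a given position across all
    sequences -- the quantity plotted at each time point of a distribution plot."""
    states = [seq[position] for seq in sequences.values() if len(seq) > position]
    return {s: states.count(s) / len(states) for s in set(states)}
```

An index plot, by contrast, would render each case’s full ordered sequence as one row rather than aggregating proportions per position.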
For the analysis and modeling of the temporal unfolding of interactions at the
student level (RQ2), the time epoch was the session, that is, a sequence of successive interactions by the same student with less than 20 min (the 85th percentile of inactivity gaps in the dataset) between them (Fincham et al.,
2019; Saqr & López-Pernas,
2021), and the case ID was the student ID. Differential sequence mining is often used in educational settings to identify similar patterns of behavior; it was therefore performed to identify patterns of students’ sequences of interactions that are homogeneous and similar to each other, yet different from other patterns (e.g., Fincham et al.,
2019; Kinnebrew & Biswas,
2012). Differential sequence mining was performed using Agglomerative Hierarchical Clustering (AHC) based on similarities/dissimilarities computed with the Longest Common Prefix (LCP) metric; in other words, sessions that have a similar start were clustered together. LCP was chosen based on the best Average Silhouette Width (ASW) and R-squared (R²) (Gabadinho et al.,
2009).
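The session segmentation and LCP-based similarity described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions (timestamps in minutes); the actual analysis used TraMineR, whose LCP dissimilarity `lcp_distance` mirrors here:

```python
def split_sessions(timestamps, gap_minutes=20):
    """Split a student's chronologically ordered event times into sessions:
    a new session starts whenever the inactivity gap reaches the threshold
    (20 min, the 85th percentile of inactivity gaps in the study)."""
    sessions, current = [], [timestamps[0]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        if (cur - prev) >= gap_minutes:
            sessions.append(current)
            current = []
        current.append(cur)
    sessions.append(current)
    return sessions

def lcp_length(seq_a, seq_b):
    """Length of the Longest Common Prefix: sequences sharing a longer
    common start are considered more similar."""
    n = 0
    for a, b in zip(seq_a, seq_b):
        if a != b:
            break
        n += 1
    return n

def lcp_distance(seq_a, seq_b):
    """LCP dissimilarity as used for clustering: total length minus twice
    the shared prefix, so identical sequences have distance 0."""
    return len(seq_a) + len(seq_b) - 2 * lcp_length(seq_a, seq_b)
```

AHC then operates on the pairwise `lcp_distance` matrix between sessions.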
We complemented the aforementioned analysis with a stochastic process analysis that enables the visualization and estimation of transition probabilities between coded interactions: for instance, the probability that a discussion is followed by an argument, or which interactions follow an argument and with what probabilities (Chiu & Reimann,
2021; Reimann,
2009). While several process mining algorithms exist (Bogarín et al.,
2018), we have opted for a process mining approach based on Markov models implemented in the pMineR R package (Gatta et al.,
2017a) as recommended in past work (Chiu & Reimann,
2021; Reimann,
2009). Markov models are particularly useful when assessing temporal processes and transitions (Gatta et al.,
2017b). Compared to other algorithms that are based on frequencies, Markov models make stochastic decisions based on transition probabilities, in ways that are better suited to our research questions and conceptualization of the collaborative interactions as time dependent, sequential, and contingent (Hadwin et al.,
2017; Molenaar & Chiu,
2014). We used a threshold of 0.1 such that transitions that accounted for less than 10% of the total count were omitted. In pMineR, the algorithm uses First Order Markov Modeling (FOMM) to train and visualize process models. The resulting model was visualized such that interactions are nodes and edges are transitions, with arrows pointing in the direction of the transitions. The edges are labeled with the transition probabilities. We conducted this analysis both at the group level (RQ1) and at the student session level (RQ2).
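The core of a first-order Markov process map, estimating transition probabilities from coded sequences and pruning weak transitions, can be sketched as follows. This is an illustrative Python sketch, not the pMineR implementation; note that pruning here is applied to per-state transition probabilities, whereas the study pruned on transition counts:

```python
from collections import defaultdict

def fomm_transitions(sequences, threshold=0.1):
    """Estimate first-order Markov transition probabilities from coded
    sequences and drop weak edges, analogous to a pruned process map."""
    # Count observed transitions between consecutive coded interactions
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1
    # Normalize per source state and keep edges at or above the threshold
    probs = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        for dst, c in dsts.items():
            p = c / total
            if p >= threshold:
                probs[(src, dst)] = round(p, 3)
    return probs
```

Each retained `(src, dst)` pair corresponds to a labeled, directed edge in the visualized process model.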
To study the relationship between types of students’ interactions (counts or frequencies) and performance (RQ3), we used Multilevel Regression Models (MLMs). MLMs are superior to traditional regression models (e.g., ordinary least squares) when the data are nested or hierarchical (in our case, the data were nested in four different courses) because they can model this dependency across observations. As a preparatory step to make the data comparable across the four courses, the outcome variable was standardized (by subtracting the mean and dividing by the standard deviation). The remaining variables were Gaussianized using a nonparanormal transformation, an approach that fits the data to the normal distribution (Zhao et al.,
2012). To estimate the MLM, we followed the method described by Field et al. (2012) and Roback and Legler (2021), which starts by estimating the random intercept model (the unconditional means model), which contains no predictors but only the random effects (course in our case). A random intercept model is commonly estimated to investigate whether there is systematic variation across groups, that is, whether an MLM is needed to account for the nestedness of the data. The random intercept model was also used for comparison with the estimated models. Its Pseudo-R² (fixed effects) was 0, the Bayesian information criterion (BIC) was 1556.3, the Akaike information criterion (AIC) was 1546.4, and the intraclass correlation (ICC) was 0.59. Therefore, an MLM was deemed necessary to account for data clustering. Four MLM regression models were estimated using the maximum likelihood (ML) estimator. An ML estimator is recommended when comparing models with different fixed effects (Field et al.,
2012). The models were estimated using the R package lme4 (Bates et al.,
2014). The four models consisted of a full model with the frequencies of all coded interactions and three models using different subsets of the variables: a cognitive subset including the frequencies of codes indicating cognitive activities, a high-cognitive subset including the frequencies of variables representing high-cognition activities (i.e., reasoning, critiquing, or generating new ideas or knowledge), and a last subset with the frequencies of off-task activities (i.e., socializing and organizing). All models were evaluated with AIC, BIC, and ICC following the recommendation of Bates et al. (
2014). The proportion of explained variance was calculated using Pseudo-R² (Nakagawa & Schielzeth, 2013). All variables were tested for multicollinearity using the Variance Inflation Factor (VIF). Only the variables “evaluate” and “agree” had VIFs of 5.77 and 6.63, respectively; still, these were within the acceptable range (O’Brien,
2007). All models were also checked for homoscedasticity of residuals and for influential observations (Cook’s distance).
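Two of the preparatory quantities above, the standardized outcome and the ICC derived from the random-intercept variance components, follow standard definitions, sketched here for illustration (the actual models were fitted with the lme4 R package, not this code):

```python
def standardize(values):
    """z-standardize an outcome variable: subtract the mean and divide
    by the standard deviation, making grades comparable across courses."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]

def icc(var_between, var_within):
    """Intraclass correlation from random-intercept variance components:
    the share of total variance attributable to the grouping (courses).
    A high value (e.g., the study's 0.59) indicates strong clustering."""
    return var_between / (var_between + var_within)
```

For example, between-course variance of 0.59 against residual variance of 0.41 (on a standardized outcome) yields an ICC of 0.59, which justifies the multilevel specification.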
To study the relationship between students’ sequences of interactions and performance (RQ4
), we used the temporal patterns of interaction (the clusters of sequences) in an MLM regression model as independent variables and the final grades as a dependent variable. The approach is similar to studies that investigated the association of such sequences with grades using other theoretical lenses (e.g., SRL) (Matcha et al.,
2019). The model aims to identify which temporal patterns of sequences are associated with better performance, and to what extent. Additionally, the model fit indices (R² and BIC) are compared with those of the RQ3 models to see which model (i.e., frequencies vs. temporal sequences) is superior in explaining the final performance of students.
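The fit indices used for this model comparison follow their standard definitions, sketched here for reference (illustrative Python; `log_lik` is the model log-likelihood, `k` the number of estimated parameters, and `n` the number of observations):

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2*logLik (lower is better)."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion: k*ln(n) - 2*logLik; penalizes
    parameters more heavily than AIC for larger samples (lower is better)."""
    return k * math.log(n) - 2 * log_lik
```

Because both indices penalize model complexity, a lower BIC for the sequence-based model would indicate that temporal patterns explain performance better than raw frequencies do, not merely that they add parameters.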
Note that all regression models used individual-level data; therefore, all interpretations and inferences are made only at the individual level. This decision was made to avoid the interdependence that group membership imposes on the individual students forming the groups.