Using Computational Grounded Theory to Analyze Pre-service Chemistry Teachers’ Reflective Practice Regarding Technology Integration in Classrooms Within a Service-Learning–Oriented Seminar
Abstract
Integrating digital technologies in science education requires innovative methods to bridge theoretical knowledge and practical classroom application. This study examines a seminar at an Austrian university that was redesigned to address pre-service chemistry teachers’ professional knowledge regarding technology implementation in high school classes via a service-learning approach. By combining questionnaires with formative evaluation through reflective learning diaries, our study captures the multifaceted nature of professional knowledge acquisition. Central to this work is the use of natural language processing within a Computational Grounded Theory framework to analyze students’ reflective diaries. We employed techniques such as word embeddings and topic modeling to extract latent themes and patterns in student texts. We found that, despite mixed results from conventional self-report measures and deductive qualitative content analysis, students’ written reflections offered rich insights into their learning processes when investigated through the lens of natural language processing. Across the diary entries, our analysis uncovered shifts in emphasis, e.g., from broad cultural perceptions of teaching and learning to more detailed considerations of lesson planning and technology integration. These nuanced insights underscore the complementary value of natural language processing in identifying underlying patterns of reflective practice that traditional assessments may overlook. Although the study is limited by its small sample size and methodological constraints, the findings suggest that incorporating computational techniques can enhance the formative assessment of free writing in teacher education programs.
Overall, the results motivate us to advocate for the integration of computational text analysis as a promising tool in evaluating and fostering the complex interplay of professional knowledge and reflective practice in technology-enhanced science education.
Notes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Introduction
Education is messy, and students’ learning outcomes depend on a multitude of influences. Within this multitude, acquiring professional knowledge, such as subject-specific content knowledge or pedagogical content knowledge, is generally deemed a necessary condition to become a good teacher (Carlson et al., 2019; Fischer et al., 2012; Shulman, 1986). Using technology in classrooms adds another layer to teacher training, because every type of technology requires specific knowledge and training to be applied appropriately. In science education, the range spans, e.g., digital classroom demonstrations (Shirkhanzadeh, 2009), setting up digital laboratory environments for students (Chen et al., 2024; Ter Horst et al., 2024), or, most recently, integrating large language models (Dahlkemper et al., 2023; Guo & Lee, 2023; Lee & Zhai, 2024; Tassoti, 2024), all of which challenge teacher educators to balance their choice of content.
Regardless of the specific content, acquiring subject-matter knowledge is merely the first step in teacher training: school teachers’ knowledge about chemistry in and of itself does not ensure instructional quality and, in turn, cannot ensure school students’ learning of chemistry (Baier et al., 2021; Heinitz & Nehring, 2023). Rather, it is the very interplay of knowledge, classroom experiences, and reflection with experts and peers that provides teachers with specific competencies (Stender et al., 2021) and constitutes a future-proof (Cramer et al., 2023) teacher education. Consequently, tracking and understanding the complexity of pre-service teachers’ reflective practice is an important task for teacher educators.
At the University of Vienna, we examined a chemistry teacher education course redesigned according to the service-learning approach (SLA; Resch & Schrittesser, 2021). The SLA facilitates students’ engagement in service activities within schools (e.g., implementing digital teaching materials), thereby bridging the gap between theoretical knowledge and practical application while fostering professional knowledge and skills development (Furco, 2009; Resch et al., 2022). We evaluate how pre-service teachers contextualize their professional knowledge acquisition and readiness for digitalized teaching in Austrian secondary schools within the SLA framework, using an established professional knowledge test for summative assessment and semi-automatic approaches from natural language processing (NLP) for formative assessment.
Theoretical Background
Professional Knowledge
The TPACK (Technological Pedagogical Content Knowledge) model (Koehler & Mishra, 2009; Mishra & Koehler, 2006) was employed as a theoretical foundation to enhance our understanding of teachers’ professional knowledge, building upon the seminal work of Shulman (1986). It complements the body of literature by adding a specific technological dimension to more generic frameworks (Baumert & Kunter, 2013; Fischer et al., 2012).
TPACK conceptualizes teacher knowledge as comprising three primary dimensions integrated to address the demands of technology-enhanced teaching. Despite critiques regarding the empirical validity of TPACK’s structural specifics (Scherer et al., 2017), its general explanatory utility has been acknowledged for illuminating pre-service teachers’ professional knowledge growth (Stinken-Rösner et al., 2023). The core dimensions of TPACK and their intersections are depicted in Fig. 1.
1.
Content knowledge (CK) refers to the subject-specific foundation of a school lesson: “In the case of science, for example, this would include knowledge of scientific facts and theories, the scientific method, and evidence-based reasoning” (Koehler & Mishra, 2009, p. 63). That is, teachers need to know different theories and models of chemical reactions and how they relate to each other. For example, knowing the development and utility of both the Arrhenius and the Brønsted–Lowry models of acid–base reactions supports a flexible understanding of molecular mechanisms.
2.
Pedagogical knowledge (PK) refers to how teaching and learning happen: “This generic form of knowledge applies to understanding how students learn, general classroom management skills, lesson planning, and student assessment” (Koehler & Mishra, 2009, p. 64). That is, teachers need overarching knowledge of psychological mechanisms and the dynamics of student groups, irrespective of the subject. For example, knowing how theories of student motivation differ helps teachers tailor a lesson accordingly, e.g., with respect to the expectancy and value of learning or students’ perceived autonomy when working on tasks.
3.
Technology knowledge (TK) refers to how “[…] persons understand information technology broadly enough to apply it productively at work and in their everyday lives, to recognize when information technology can assist or impede the achievement of a goal, and to continually adapt to changes in information technology” (Koehler & Mishra, 2009, p. 64). That is, teachers need to know how a technology works conceptually to decide whether its use makes sense either as an organizational vehicle or a digital device that supports chemical lab work. For example, knowing that large language models predict words stochastically is necessary to understand their limitations.
Fig. 1
Intersecting dimensions of professional knowledge delineated by Koehler and Mishra (2009)
We highlight the critical role of pedagogical content knowledge (PCK), situated at the intersection of content knowledge (CK) and pedagogical knowledge (PK). PCK, regarded as transformative knowledge, merges the specifics of a subject, in this case chemistry, with universal strategies for instructional practice. It is transformative because teachers shape content and pedagogy to fit specific situations, so PCK is not a static body of knowledge (Fischer et al., 2012). In the sense of the refined consensus model of PCK (RCM; Carlson et al., 2019), we differentiate between personal PCK (pPCK) and enacted PCK (ePCK). The former refers to the theoretical knowledge acquired in pre-service teacher training, and the latter to the implementation of what a teacher knows in a specific teaching situation at a specific time and place. In short, having high PCK means successfully connecting subject matter, devising various representations, and customizing instructional materials to align with students’ misconceptions and pre-existing knowledge.
However, understanding the nuances of pre-service teachers’ subject-specific lesson planning cannot be solely attributed to the knowledge dimensions’ interplay. Simply enhancing pre-service teachers’ knowledge base is not sufficient for equipping them for their forthcoming roles. Consequently, teacher education programs must design and assess learning environments that promote collaboration among lecturers, in-service teachers, and pre-service teachers to meld knowledge with reflective practice.
Moreover, using technology effectively in classrooms and schools is not abstract knowledge but the materialized effort of actual people doing actual things, in the sense of ePCK. Enabling students to use an app to capture measurements during an experiment is very different from providing learning materials via an online platform. We address this by choosing a broader analytical framework comprising five design spaces that are essential for merging professional knowledge with the comprehensive demands of classroom activities (Mishra & Warr, 2021; Warr et al., 2020):
1.
Artifacts like apps or devices are “[s]table objects that can be perceived through the senses.”
2.
Processes, such as schedules or accessing learning materials, are “[…] procedure[s] or directions that can be used to achieve a goal outside of the context within it was created.”
3.
Experiences, such as class meetings or field trips, are “[…] piece[s] of time with associated sights, sounds, feelings, and thoughts.”
4.
Systems, such as standards for instruction or IT infrastructure, are “[…] organized and purposeful structure[s] of interrelated and interdependent elements.”
5.
Cultures, such as perceptions of technologies or personal beliefs, are “[…] pattern[s] of shared basic assumptions that [allow] groups to perceive and interpret the world in similar ways, develop and communicate meaning, and transmit values to new group members.”
Choosing this framework underscores that effectively preparing future-oriented chemistry teachers means addressing a complexity of teaching and learning that extends beyond mere knowledge transmission.
Service-Learning Approach
The SLA in teacher education seeks to connect theory and school practice by giving pre-service teachers the opportunity both to participate in an organized service activity outside the university (here: implementing technology-focused chemistry lessons) and to reflect on the experience within their university coursework in order to gain a deeper understanding of the course content (here: combining professional content, technology, and pedagogical content knowledge in chemistry; Bringle et al., 2006). Carrying out an organized service activity in schools, meaning a direct service for pupils, parents, teachers, or other professionals in schools, leads to an enhanced sense of civic engagement (Chambers & Lavery, 2012). A technology-focused, direct service was chosen in the [blinded course], on which this study is based. While the SLA has been widely practiced, studied, and accepted in Anglo-American, Canadian, and Australian educational contexts, it has yet to establish itself fully in the German-speaking world (Reinders, 2016). In Austria, the higher education sector is said to be autonomous, free to develop without financial, political, or labor market restrictions (Resch & Fellner, 2022). Hence, knowledge production is often initiated without influences from practice fields (here: schools). For this reason, applying the SLA requires engaged scholarship and additional effort by educators. According to a recent national study among educators in Austria, those who promote service-learning predominantly belong to the academic mid-level (postdoc phase without habilitation; Fahrenwald et al., 2023). This is also true for the authors of this study. Their engaged scholarship means building trust relationships with schools, maintaining these relationships, and regularly cooperating in different forms of campus-community partnerships (Johnson Butterfield & Soska, 2004).
The service-learning approach supports students in gaining practical experience in their discipline (here: chemistry) while developing academic, technological, and civic competencies by engaging in practice during their studies.
Research Question (RQ)
The SLA-driven technology implementation course aims to foster the professional knowledge of pre-service chemistry teachers. In particular, we were interested in understanding the interplay of a university course, classroom practice, and perceived knowledge acquisition of CK and PCK. CK and PCK were particularly interesting because we were able to contrast the self-reported measures with an established instrument for summative assessment (Tepner et al., 2012). Our research question was:
How do pre-service chemistry teachers contextualize their perceived knowledge gains during a service-learning course on planning and conducting a technology-integrated chemistry lesson in secondary school?
Setting of Data Collection
Course Characteristics
The course evaluated in our study is part of the elective Bachelor curriculum, offering 3 ECTS credits to pre-service chemistry teachers at the University of Vienna. Conducted by the Austrian Educational Competence Centre Chemistry (AECC Chemistry), this course is available every semester, running weekly for two hours. It was deliberately redesigned to align with the SLA, aiming to merge professional knowledge acquisition with practical preparation for secondary school chemistry teaching in Austria. To introduce the course participants to the TPACK model, they received a theoretical introduction and then categorized their initial ideas of technology implementations within the dimensions of the model. In-service teachers, collaborating through the University of Vienna’s partnerships with approximately 58 schools under the school cooperation program of the Centre for Teacher Education, contributed to the course design by providing authentic classroom experiences to the participants.
Data Sources
To evaluate and compare the content knowledge (CK) and pedagogical content knowledge (PCK) of students (N = 6), we administered knowledge tests (Tepner et al., 2012) at the beginning and end of the course. Students were also asked to provide sociodemographic information, including their chosen subjects; details on participant demographics are available in the results section. Additionally, an adapted TPACK questionnaire (Schmidt et al., 2009) was used to assess students’ self-reported knowledge gains across the dimensions outlined by Mishra and Koehler (2006).
Students were further instructed to document their professional knowledge development through reflective diaries, incorporating insights from theory-focused lectures and their experiences in cooperating schools. These reflections aimed to elucidate students’ personal interpretations of how their knowledge gains could impact their future technology-integrated chemistry teaching. Each student was expected to write five diary entries of 300 to 400 words each. The final diary entry was part of a larger graded assignment, which required students to demonstrate their lesson planning skills and provide a practical-theoretical analysis of their teaching experiences. The students knew that the diary entry itself was not graded.
Prompts for Students’ Diaries
The (translated) prompt for their first diary entry was the following:
Using an example from chemistry lessons, explain how you think content knowledge, pedagogical content knowledge, and technology knowledge are related. In your opinion, what significance do these three dimensions of knowledge have for your future teaching activities?
The follow-up prompts for the remaining four entries were:
In the first learning diary entry, you were supposed to explain the relationship between content knowledge, pedagogical content knowledge, and technology knowledge from your point of view. Answer the following questions, taking into account the example you have chosen. Give reasons for your answers in each case.
1.
Did your perspective on the relation of these three dimensions change?
2.
Did your perspective on how these three dimensions have an impact on your future teaching activities change?
3.
Regarding the use of digital technology in chemistry lessons: Which aspects of your future teaching activities are important to you but are not reflected in the three dimensions?
We employed a triangulation approach to assess students’ knowledge development, utilizing diverse data sources. This included standardized questionnaires to evaluate both actual and perceived knowledge gains, serving as descriptive tools for participant characterization. Additionally, learning diaries provided detailed insights into students’ reflections on their development and classroom experiences.
Given the nature of the textual responses from diary entries, we adopted qualitative methodologies for analysis, acknowledging the usual tradeoffs in handling and interpreting written texts. On the one hand, free writing allows a more nuanced understanding of students’ thoughts than standardized instruments such as questionnaires. On the other hand, analyzing and reporting text responses is challenging because people differ widely in their writing skills, styles, and understanding of words. To tackle this challenge systematically, we leveraged computational text analysis techniques guided by the statistical semantics hypothesis, i.e., we assume that “statistical patterns of human word usage can be used to figure out what people mean” (Turney & Pantel, 2010, p. 146).
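The statistical semantics hypothesis can be made concrete with a toy example. The minimal Python sketch below is our illustration only; the study itself worked in R. It counts which words co-occur within a small window and compares the resulting count vectors by cosine similarity. The sentences and the window size of 2 are hypothetical. Words that appear in similar contexts, here "tablet" and "sensor", end up with similar vectors, whereas a word from a different context does not. Contextual embeddings such as BERT refine this idea considerably but rest on the same distributional assumption.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words appearing within +/- window positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[word] for word, count in u.items())  # missing keys count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus, not data from the study
SENTENCES = [
    "the tablet records the temperature",
    "the sensor records the temperature",
    "students discuss the lesson plan",
]
vectors = cooccurrence_vectors(SENTENCES)
print(round(cosine(vectors["tablet"], vectors["sensor"]), 3))  # 1.0
print(round(cosine(vectors["tablet"], vectors["lesson"]), 3))  # 0.516
```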
While the theoretical basis for these methods dates back to (at least) Harris (1954), only recently has software evolved to a point where it is applicable in science education, facilitating preliminary theme identification through common analysis software. This advancement allows for a nuanced blend of qualitative and quantitative analysis, transcending the traditional divide between qualitative document analysis and quantitative metrics. The exploration of (semi-)automated systems for such analyses is emerging (Gombert et al., 2023; Kubsch et al., 2022; Tschisgale et al., 2023; Wulff et al., 2022, 2023; Zhai & Lu, 2023) with specific applications to chemistry education notably gaining traction (Dood et al., 2024; Martin & Graulich, 2023, 2024; Martin et al., 2023; Watts et al., 2022; Winograd et al., 2021; Yik et al., 2021). For example, Martin et al. (2024) validated the integration of unsupervised machine learning for pattern detection in university students’ texts with traditional qualitative content analysis, demonstrating how automatic systems are inherently bound to research-informed prior information in the sense of the Computational Grounded Theory (CGT).
The Computational Grounded Theory
Grounded Theory (Glaser & Strauss, 2017) is our starting point for reading and interpreting our students’ texts. This approach facilitates the identification of observational patterns, potentially leading to hypotheses or generalizations. For example, consistently high performance in summative assessments by students taught by a specific teacher may suggest a causal relationship with the teacher’s methods. However, human analysis of observational or textual data often lacks reliability, both between and within coders over time (Williams et al., 2013), which limits our confidence in such inferences unless different teachers are compared.
Grounded Theory addresses these limitations through methodologies such as applying diverse theoretical perspectives, iterative coding, and interpretation, along with inter-subjective discussions. These practices may enable a deep understanding of pre-service chemistry teachers’ learning growth when rigorously applied. Nonetheless, the thorough application of Grounded Theory is notably time-intensive, posing challenges in terms of resource allocation.
We employed Computational Grounded Theory (CGT; Nelson, 2020; Nelson et al., 2021) to enhance our analysis of participants’ reflections, utilizing various data sources. CGT provides a structured framework to augment traditional hermeneutic methods with computational efficiency, allowing for consistent and reproducible data analysis.
CGT involves three main steps: The first, pattern detection, leverages computational tools to identify patterns within natural language texts, setting the stage for a more focused manual review. In the refinement phase, researchers engage in deep hermeneutic reading and discussions, or qualitative content analysis (QCA), refining insights based on the patterns identified earlier. This phase may also include the integration of additional data sources to enrich the analysis. The final step, confirmation, connects the insights back to the original patterns and the overarching research question, ensuring alignment and relevance. In chemistry education,
Analysis Workflow and Software
Textual Data Analysis
We implemented CGT via techniques from NLP, reflecting a growing toolbox in educational research. Following the methodology outlined by Martin et al. (2023) and Tschisgale et al. (2023), we structured our analysis according to the three-step framework proposed by Nelson (2020). This paper outlines our analytic procedures, with Table 2 providing a summary of the R-packages employed (R Core Team, 2023). Detailed findings, including visualizations, are presented in the results section.
We initiated our analysis by preprocessing the raw text data, which involved removing punctuation and numbers and converting all words to lowercase. This preprocessing step is standard in NLP tasks to minimize the influence of semantically low-value text tokens (Denny & Spirling, 2018). Tokens are the result of analysts’ decisions on the smallest sensemaking unit in a text (Jurafsky & Martin, 2023). Here, tokens can be understood as interchangeable with natural words.
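The preprocessing step can be sketched as follows. This is a minimal Python illustration, not the study's R pipeline. It lowercases the text, replaces punctuation with spaces, drops standalone numbers, and splits on whitespace, so each token corresponds to a natural word. Restricting number removal to standalone digits (`\b\d+\b`) is our own design assumption. It keeps chemical formulas such as "h2o" intact, which a blanket digit filter would split into "h" and "o".

```python
import re


def preprocess(text):
    """Lowercase, strip punctuation and standalone numbers, split into tokens."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation -> space (Unicode-aware, keeps umlauts)
    text = re.sub(r"\b\d+\b", " ", text)  # standalone numbers -> space; "h2o" survives
    return text.split()


print(preprocess("Students measured pH 7.0 with 2 sensors!"))
# ['students', 'measured', 'ph', 'with', 'sensors']
```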
Pattern Detection
We vectorized the texts with bidirectional encoder representations from transformers (BERT; Devlin et al., 2019), specifically the BERT german uncased model (Bayerische Staatsbibliothek, 2025), via a software implementation by Kjell et al. (2023). While preceding research suggests using cased BERT variants for German texts (Martin & Graulich, 2024), our embeddings with cased variants produced more noise than their uncased counterparts. We attribute this to our comparatively sparse data, where differentiating between cased and uncased terms increases uncertainty. Given the dimensionality of the resulting space (\({N}_{\text{dim}}=768\), software default), direct human interpretation of the word embeddings was not possible. We therefore established a corridor of possible term clusterings from our text data via uniform manifold approximation and projection (UMAP; Konopka, 2023; McInnes et al., 2018). To maximize the utility of a data-driven approach in this regard, we let the computer iterate over a parameter grid for UMAP, with cosine as the similarity metric, combined with a hierarchical cluster analysis (Table 3).
Table 3
Parameter grid values to triangulate feasible cluster numbers and dimensions
Parameter | Values
Neighbors | 5, 12, 15
Minimal distance | 0.001, 0.005, 0.01, 0.05
Number of clusters | 5, 7, 10, 15, 20
Number of dimensions | 5, 10, 15, 20
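The grid search over Table 3 can be sketched as a full Cartesian product of the listed values. This minimal Python illustration expands the grid into 3 × 4 × 5 × 4 = 240 combinations and picks the one with the lowest score. The dictionary keys are our own hypothetical labels, not parameter names taken from the software the study used. The `score` function is likewise a placeholder: in practice it would run UMAP and the clustering for each combination and return the cross-validated error, which is not shown here.

```python
from itertools import product

# Values as listed in Table 3; the key names are hypothetical labels
GRID = {
    "neighbors": [5, 12, 15],
    "min_dist": [0.001, 0.005, 0.01, 0.05],
    "n_clusters": [5, 7, 10, 15, 20],
    "n_dims": [5, 10, 15, 20],
}


def parameter_combinations(grid):
    """Expand a dict of value lists into one dict per combination."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]


def select_best(combinations, score):
    """Return the combination with the lowest score (e.g., an error ratio)."""
    return min(combinations, key=score)


combos = parameter_combinations(GRID)
print(len(combos))  # 240
```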
The solutions obtained were evaluated by fivefold cross-validation with respect to minimizing the sum of squared distances between the observations in a cluster and their respective centroid, which identified fifteen to twenty clusters as the optimal range (Fig. 2).
Fig. 2
Cluster tuning results (fivefold cross-validation) for the ratio of the mean within-cluster sum of squared errors to the total sum of squared errors. Columns show the number of evaluated neighbors, and rows show the minimum distance between them
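The tuning criterion in Fig. 2 relates within-cluster scatter to total scatter. The minimal Python sketch below computes that ratio for one labeled point set: the within-cluster sum of squared distances to each cluster's centroid, divided by the total sum of squared distances to the overall centroid. Lower values mean tighter clusters. We assume that averaging this ratio over the five folds yields a quantity like the one plotted. How the study's software constructed and scored the folds is not reproduced here.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def wss_over_tss(points, labels):
    """Within-cluster sum of squares divided by total sum of squares."""
    overall = centroid(points)
    tss = sum(squared_distance(p, overall) for p in points)
    wss = 0.0
    for label in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == label]
        center = centroid(members)
        wss += sum(squared_distance(p, center) for p in members)
    return wss / tss


# Two well-separated toy clusters give a small ratio (4 / 104)
print(wss_over_tss([(0, 0), (0, 2), (10, 0), (10, 2)], [0, 0, 1, 1]))
```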
This numerical corridor indicated that the error was minimized with five neighbors, which informed a subsequent cluster analysis with density-based spatial clustering of applications with noise (DBSCAN; Campello et al., 2015; Ester et al., 1996; Hahsler et al., 2019). A useful value for the neighborhood radius was estimated at around \({\text{radius}}_{\text{dist}}= .32\) via a k-nearest neighbor distance plot, using the minimum number of neighbors from the tuning results (Fig. 3).
Fig. 3
k-Nearest neighbor distance plot with 5 as a minimum number of neighbors. The dashed line indicates the so-called elbow, leading to a distance value of around .32 as the parameter of choice for a subsequent DBSCAN application
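The two steps behind Fig. 3 and the subsequent clustering can be sketched in plain Python: estimating the radius from sorted k-nearest neighbor distances, then running DBSCAN with that radius. The study relied on an R implementation (Hahsler et al., 2019), and the elbow was read visually from the plot. The chord-distance `elbow_index` heuristic below is our own assumption, standing in for that visual judgment. The sketch uses Euclidean distance on toy 2D points. As in common implementations, a point's own neighborhood includes itself, so `min_pts` counts it. Implementations differ on whether k in the distance plot should equal `min_pts` or `min_pts` minus one.

```python
import math

NOISE = -1


def knn_distances(points, k):
    """Sorted distances from every point to its k-th nearest other point."""
    result = []
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return sorted(result)


def elbow_index(values):
    """Heuristic elbow: index farthest from the chord joining first and last value."""
    dx, dy = len(values) - 1, values[-1] - values[0]
    norm = math.hypot(dx, dy)
    if norm == 0:
        return 0
    gaps = [abs(dy * i - dx * (y - values[0])) / norm for i, y in enumerate(values)]
    return max(range(len(values)), key=gaps.__getitem__)


def region_query(points, i, eps):
    # Includes point i itself (distance 0), so min_pts counts it too
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                seeds.extend(j_neighbors)
        cluster += 1
    return labels


# Toy data: four close points and one outlier
square = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
dists = knn_distances(square, k=1)
eps = dists[elbow_index(dists)]  # 1.0 for this toy data
print(dbscan(square, eps, min_pts=3))  # [0, 0, 0, 0, -1]
```

Raising `min_pts` while keeping `eps` fixed, as described in the text, makes the core-point condition stricter. This changes how the dense region splits into clusters and how many tokens fall into the noise label.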
The pattern refinement was a multi-step procedure. First, we grouped the individual words into clusters via DBSCAN and investigated the obtained solutions manually. Second, a sentence-wise qualitative content analysis was applied. Third, a variant of the so-called topic models (TM; Blei, 2012; Blei et al., 2003) contrasted and complemented the qualitative content analysis.
To refine the patterns via DBSCAN, we tried combinations of DBSCAN parameters that yielded solutions with 15 to 20 clusters, as motivated by the parameter tuning described above. This approach led to our final decision of seventeen clusters, balancing differentiation, noise minimization, and human interpretability. Had we followed the optimal values from the cluster tuning, we would have been forced to choose a neighborhood radius of \({\text{radius}}_{\text{dist}}= .32\) and a minimum of five cluster members. This would have resulted in seven clusters: the first would have contained around 1500 tokens and the others 6–18 tokens each, plus a noise cluster with 7 tokens. Unable to interpret that solution, we kept the radius constant while increasing the minimum number of cluster members up to 20, at which point DBSCAN identified 17 clusters, including a noise cluster representing statistical uncertainty.
We undertook a dual-step process for interpreting these clusters. In the first step, we labeled clusters with general categories, such as chemistry or grammar, to organize them into broader thematic families. Notably, some clusters aligned with the five design spaces for technology implementation in education (Mishra & Warr, 2021), aiding our exploratory interpretation. Ambiguous tokens were further examined using a keyword-in-context function (KWIC; Benoit et al., 2018) to understand their precise context within diary entries. The KWIC function displays tokens in their respective sentences and documents, allowing quick look-ups and bridging the cluster interpretation and the qualitative content analysis.
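A keyword-in-context look-up is simple to sketch. The minimal Python version below is loosely modeled on the output of quanteda's `kwic()` (Benoit et al., 2018) but is not that function: positions here are 0-based, and matching is exact on preprocessed tokens. For every occurrence of a keyword, it returns the document id, the token position, and the words before and after it within a window. The diary ids and texts are hypothetical.

```python
def kwic(documents, keyword, window=3):
    """Return (doc_id, position, left context, keyword, right context) for each hit."""
    hits = []
    for doc_id, tokens in documents.items():
        for i, token in enumerate(tokens):
            if token == keyword:
                pre = " ".join(tokens[max(0, i - window):i])
                post = " ".join(tokens[i + 1:i + 1 + window])
                hits.append((doc_id, i, pre, token, post))
    return hits


# Hypothetical, already preprocessed diary snippets
diaries = {
    "diary1": "the app helps students plan the lab".split(),
    "diary2": "i used the app twice".split(),
}
for hit in kwic(diaries, "app", window=2):
    print(hit)
# ('diary1', 1, 'the', 'app', 'helps students')
# ('diary2', 3, 'used the', 'app', 'twice')
```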
In the second step, we conducted a summative qualitative content analysis using the software QCAmap (Hsieh & Shannon, 2005; Mayring, 2022; Mayring et al., 2022). Our analysis categories were derived from the five design spaces, facilitating the integration of individual texts into a broader framework for technology use in education. We present a summary of category frequencies and reliability assessments (\(\alpha_{\text{Krippendorff}}\); Hughes, 2022; Krippendorff, 2004) in the “Results” section.
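For nominal categories such as the five design spaces, Krippendorff's alpha can be computed from a coincidence matrix. The study used Hughes's (2022) R implementation. The minimal Python sketch below covers only the nominal case, and the function name and example are our own. Each unit (e.g., a coded sentence) holds the categories the coders assigned to it, with missing codes simply omitted. Each ordered pair of values within a unit adds \(1/(m_u - 1)\) to the coincidence matrix. The function then returns \(\alpha = 1 - (n-1)\sum_{c \ne k} o_{ck} / \sum_{c \ne k} n_c n_k\).

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal codes; units = list of per-unit code lists."""
    o = Counter()  # coincidence matrix o[(c, k)]
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a unit needs at least two codes to be pairable
        for a, b in permutations(values, 2):  # ordered pairs of coder positions
            o[(a, b)] += 1 / (m - 1)
    n_c = Counter()
    for (a, _), weight in o.items():
        n_c[a] += weight
    n = sum(n_c.values())
    observed = sum(w for (a, b), w in o.items() if a != b)
    expected = sum(n_c[a] * n_c[b] for a in n_c for b in n_c if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category occurs")
    return 1 - (n - 1) * observed / expected


# Two hypothetical coders, four sentences, categories A and B
print(krippendorff_alpha_nominal([["A", "A"], ["A", "B"], ["B", "B"], ["B", "B"]]))
```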
Building on our exploration and categorization, we utilized seeded sequential Latent Dirichlet Allocation (SSLDA; Watanabe & Baturo, 2023; Watanabe & Xuan-Hieu, 2023), a semi-supervised advancement within the topic modeling (TM) algorithm family (Blei, 2012; Blei et al., 2003). The basic assumption of TM is that documents (here: sentences) comprise mixes of topics, which in turn consist of word distributions. For instance, a topic on organic chemistry might include terms like “structural formula,” “carbon,” and “electrophilic substitution.” In contrast, if a sentence contains the terms “ion lattice” and “intermetallic,” it is unlikely to belong to organic chemistry. Technically, TM operates on a document-word matrix, probabilistically deriving topics from the document collection by assigning words to topics based on likelihood. A key outcome is the set of high-probability words for each topic, while the topics themselves remain unnamed and the meaningfulness of these automated assignments is open to interpretation:
Indeed calling these models ‘topic models’ is retrospective - the topics that emerge from the inference algorithm are interpretable for almost any collection that is analyzed. The fact that these look like topics has to do with the statistical structure of observed language and how it interacts with the specific probabilistic assumptions of [Latent Dirichlet Allocation]. (Blei, 2012)
SSLDA enhances traditional topic modeling by incorporating theoretically derived clusters of predefined seed words into the otherwise unsupervised topic assignment process. This approach is particularly beneficial for analyses involving a small number of documents. Our sentence-wise approach serves two purposes. First, it aligns with the coding procedure of our qualitative content analysis and allows for a finer-grained analysis than a diary-wise lens. Second, it reflects the rather eclectic nature of our students’ writings: they write about many different things that come to mind when asked about their professional development. We guided them to reflect on their professional knowledge gains and digital classroom experiences, effectively predetermining the five design spaces as a set of topics for analysis. During the SSLDA process, we removed less informative words, including grammatical terms and stopwords. Additionally, we applied Laplace smoothing by adding a minimal constant to all entries in the document-word matrix (Manning et al., 2008), assigning a non-zero probability to each word. This technique enhances the robustness of NLP estimation methods by acknowledging the potential presence of words in documents where they do not empirically appear, reflecting the probabilistic nature of text production and analysis. In other words, if natural language is not deterministic, then its analysis should account for that and assign a low probability to all terms that could appear. The outcomes of the SSLDA analysis are detailed in the results section.
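The effect of additive (Laplace) smoothing on a document-word matrix can be shown in a few lines. This minimal Python sketch builds a count matrix from tokenized documents, adds a constant to every cell, and normalizes each row into word probabilities, so that words absent from a document still receive a small non-zero probability. A constant of 1 is classic Laplace smoothing. Smaller values, closer to the "minimal constant" described in the text, work the same way. The toy documents are hypothetical, and the sketch does not reproduce SSLDA itself.

```python
from collections import Counter


def document_word_matrix(docs):
    """Return (sorted vocabulary, one count row per document)."""
    vocab = sorted({token for doc in docs for token in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[w] for w in vocab])
    return vocab, rows


def smooth_rows(rows, constant=1.0):
    """Add a constant to every cell, then normalize each row to probabilities."""
    smoothed = []
    for row in rows:
        padded = [c + constant for c in row]
        total = sum(padded)
        smoothed.append([c / total for c in padded])
    return smoothed


docs = [["tablet", "sensor", "tablet"], ["lesson", "plan"]]
vocab, rows = document_word_matrix(docs)
print(vocab)  # ['lesson', 'plan', 'sensor', 'tablet']
print(rows)   # [[0, 0, 1, 2], [1, 1, 0, 0]]
probs = smooth_rows(rows)  # "lesson" now has probability 1/7 in the first document
```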
Pattern Confirmation
For the computer-based pattern confirmation in the sense of Nelson (2020), we assigned the most likely topic to each sentence. We split the data (\({N}_{\text{sentences}}=407\)) into a training set (\(N=270\); ~66%) and a test set (\(N=137\); ~34%) while preserving the proportion of topics assigned in the qualitative content analysis, i.e., stratified by topic. We trained a multinomial naïve Bayes classifier with fivefold cross-validation on the training set, based on a simple document-feature matrix (Jurafsky & Martin, 2023). We then evaluated the classifier’s predictive accuracy on the test set; its performance measures are reported in the results section. In summary, we approached pattern confirmation by assessing the accuracy of a computerized classification based on mere word counts (Jurafsky & Martin, 2023), independent of the word embeddings.
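The validation step can be sketched in plain Python as a split stratified by topic label, a multinomial naïve Bayes classifier on bag-of-words counts, and k-fold cross-validation on the training set. This is a from-scratch illustration under assumptions (add-one smoothing, unstratified folds, whitespace-tokenized sentences, hypothetical function and class names), not the software or settings used in the study.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_share=1 / 3, seed=0):
    """Return (train_idx, test_idx), keeping each label's share similar in both sets."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    train, test = [], []
    for idx in by_label.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_share)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


class MultinomialNB:
    """Multinomial naive Bayes on bag-of-words counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.vocab = {w for doc in docs for w in doc}
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)

        def score(c):
            # log P(c) + sum of log P(w | c) over known tokens; unseen words are skipped.
            return self.log_prior[c] + sum(
                math.log((self.counts[c][w] + self.alpha) / (self.totals[c] + self.alpha * v))
                for w in doc
                if w in self.vocab
            )

        return max(self.classes, key=score)


def cross_val_accuracy(docs, labels, k=5, seed=0):
    """Mean accuracy over k (unstratified) folds of the given training data."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    folds = [fold for fold in (idx[f::k] for f in range(k)) if fold]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = MultinomialNB().fit([docs[i] for i in train], [labels[i] for i in train])
        hits = sum(model.predict(docs[i]) == labels[i] for i in fold)
        accuracies.append(hits / len(fold))
    return sum(accuracies) / len(accuracies)
```

Because the features are plain counts, the classifier ignores word order and context entirely, which is what makes it independent of the word embeddings.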
Results
Participants’ Characteristics
All participants in the course (\({N}_{\text{female}}=3,{N}_{\text{male}}=3\)) were pre-service secondary science teachers specializing in chemistry alongside a second subject within their Bachelor’s program at the University of Vienna. Four students had chosen mathematics as their second subject, one biology, and one Spanish. These students were in the advanced stages of their programs, ranging from the 5th to the 8th semester. They identified as white and middle class, in line with the common demographic profile of pre-service teachers in Austria. Figure 4 shows the pre-post comparison of their knowledge gains, assessed through summative knowledge tests by Tepner et al. (2012). Because one student was absent from the post-test session, their data were omitted from the pre-post analysis.
Fig. 4
Knowledge estimates from summative knowledge tests including changes from pre to post for pedagogical content knowledge (A) and content knowledge (B)
Content knowledge varied strongly among students, whereas pedagogical content knowledge did not. There was no substantial increase from pre to post, except in one student’s CK.
In terms of self-reported knowledge, there was an average increase in perceived CK and overall TPACK scores. However, responses related to PK and TK measures displayed diverse outcomes. The variations in individual progress are illustrated in Fig. 5, highlighting the within-person changes.
Fig. 5
Students’ self-reported gains of all knowledge dimensions from the TPACK model by Mishra and Koehler (2006) assessed via Schmidt et al. (2009)
Additionally, we examined the relationship between our students’ self-reported PCK and CK and their scores on our summative questionnaires, which revealed no substantial correlation (Fig. 6).
Fig. 6
Relationship between self-reported knowledge and scorings from our summative questionnaires at pre- and post-measurement
Our analysis includes a descriptive overview of the diary entries, summarized in Fig. 7. Generally, students adhered to the suggested length of around 300 words per entry. Three notable trends emerged: First, one student (pink data points) far exceeded the word limit in their second entry, only to reduce their word count below the initial level in the final entry. Second, another student (red data points) steadily increased their word count from the third entry onward. Third, most students (five out of six) wrote a substantially longer final entry.
Figure 8 visualizes the word clusters resulting from our word embeddings. As an example, we highlight the magnified Cluster 7, the chemistry cluster, which contains words that students used to describe experimental setups and the respective chemicals.
Fig. 8
Clusters obtained via DBSCAN clustering after dimension reduction. For visualization purposes, we manually translated the German words into English and reduced the number of words to avoid cluttering in the subplot
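For readers unfamiliar with DBSCAN, the following plain-Python sketch shows the core idea on already dimension-reduced (two-dimensional) points: dense regions become clusters, and isolated points are labeled as noise. The function name, the toy coordinates, and the values of `eps` and `min_samples` are assumptions for illustration; the study’s embedding model, reduction method, and parameters are not reproduced here. Note also that this sketch labels noise as `-1`, whereas the study reports noise as Cluster 0.

```python
import math


def dbscan(points, eps, min_samples):
    """Plain DBSCAN for 2-D points; returns one label per point, -1 for noise.

    A point is a core point if at least `min_samples` points (itself included)
    lie within distance `eps`. Clusters grow from core points; non-core points
    reachable from a core point become border points of that cluster.
    """
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue  # already assigned (border or visited point): do not expand
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


if __name__ == "__main__":
    # Hypothetical 2-D coordinates of words after dimension reduction.
    points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, -10)]
    print(dbscan(points, eps=0.5, min_samples=2))  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, DBSCAN does not require the number of clusters in advance, which suits an exploratory step whose clusters are interpreted manually afterwards.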
We manually interpreted each cluster derived from the DBSCAN analysis, as outlined above, and mapped them, where possible, to the five design spaces for technology implementation in education (Mishra & Warr, 2021). Cluster 0 was identified as noise. Clusters 2 and 6 lacked cohesive themes. Cluster 5 encompassed general views on technology in education, particularly focusing on its opportunities and challenges for young users. Chemistry-related terms were grouped in Cluster 7. Cluster 8 captured discussions on the application of professional knowledge, emphasizing which knowledge areas were deemed essential for effective teaching.
Clusters 4 and 9 were associated with individuals and locations within educational contexts, with Cluster 4 addressing broader educational stakeholders and Cluster 9 focusing on specific individuals and settings encountered during the seminar. Students’ reflections on their teaching experiences were categorized into Clusters 10 and 13, highlighting target-performance evaluations and deficit-focused reflections, respectively. Clusters 1, 11, 12, and 17, relating to lesson planning aspects like procedural tool use, spatial/temporal considerations, specific tool references, and lesson orchestration, were grouped under a unified theme of lesson planning. Lastly, specific grammatical structures were identified in Clusters 3, 14, 15, and 16. The cluster interpretations and their linkages to the design spaces are detailed in Table 4.
Table 4
Interpretation of clusters obtained by DBSCAN

| Cluster no. | Interpretation | Exemplary tokens | Assignment to design spaces |
|---|---|---|---|
| 0 | Noise points | – | – |
| 1 | Lesson planning | Application of tools (verbs): using, used, applied | Processes |
| 2 | Fuzzy, not interpretable | – | – |
| 3 | Grammar | Demonstrative pronouns: those, these, they | – |
| 4 | Persons and places | Stakeholders in general: students, teachers, universities, schools | Experiences |
| 5 | Technology in education | Discussion of tech in general: internet, youth, opportunities, risks | Culture |
| 6 | Fuzzy, not interpretable | – | – |
| 7 | Chemistry | Macroscopic and submicroscopic: oxygen, burning, water, molecules | |
| 8 | Application of knowledge | Interplay of TPACK and tech use: competence, in-depth knowledge, resources, [interpretation of] results | |
To identify additional patterns, the first author initially coded the material sentence by sentence according to the five design spaces outlined by Mishra and Warr (2021). A week later, the coding was repeated to calculate Krippendorff’s alpha as a measure of intra-coder reliability (\({\alpha }_{\text{Krippendorff,intra}}=.92\)). For inter-coder reliability, an external colleague coded a random 20% sample of sentences using the established coding scheme after a brief introduction to the material. The initial inter-coder reliability (\({\alpha }_{\text{Krippendorff,inter,manual}}=.21\)) was deemed unsatisfactory, prompting a review of potential issues. The category culture emerged as a major source of discrepancy between coders: the first coder had classified statements on the integration of professional knowledge within culture as normative views, whereas the second coder perceived them as personal comments and marked them as not attributable. After reconciling these interpretations and recoding the respective statements from not attributable to culture, an acceptable level of inter-coder reliability was achieved (\({\alpha }_{\text{Krippendorff,inter,discussed}}=.68\)). Figure 9 shows the aggregated counts of all categories derived from the qualitative content analysis.
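For the nominal codes used here, Krippendorff’s alpha can be computed from a coincidence matrix as \(\alpha = 1 - D_o/D_e\), the ratio of observed to expected disagreement. The sketch below is a from-scratch illustration; the software used in the study is not stated here, and the function name is hypothetical. Each unit holds the codes one sentence received, either from two coders (inter-coder) or from two coding rounds of the same coder (intra-coder).

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal categories.

    units: one sequence of codes per sentence, one code per coder (or coding
    round). Units with fewer than two codes are not pairable and are skipped.
    """
    # Coincidence matrix: each ordered pair of codes within a unit adds 1 / (m - 1).
    coincidences = Counter()
    for codes in units:
        m = len(codes)
        if m < 2:
            continue
        for a, b in permutations(range(m), 2):
            coincidences[(codes[a], codes[b])] += 1 / (m - 1)

    marginals = Counter()
    for (c, _), value in coincidences.items():
        marginals[c] += value
    n = sum(marginals.values())

    observed = sum(v for (c, k), v in coincidences.items() if c != k)
    expected = sum(marginals[c] * marginals[k] for c in marginals for k in marginals if c != k)
    if expected == 0:
        raise ValueError("alpha is undefined when only one category occurs")
    # alpha = 1 - D_o / D_e with D_o = observed / n and D_e = expected / (n (n - 1)).
    return 1 - (n - 1) * observed / expected
```

For example, four sentences coded (A, A), (A, B), (B, B), (B, B) yield \(\alpha = 1 - 7 \cdot 2/30 \approx .53\): one disagreement out of four units is penalized heavily because category B dominates and agreement on it is partly expected by chance.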
Fig. 9
Aggregated category counts over all diaries from our qualitative content analysis
We investigated emerging patterns through SSLDA. We guided the SSLDA algorithm by selecting seed words from the clusters corresponding to the five design spaces, with Clusters 7 (chemistry) and 8 (professional knowledge) serving as additional seeding categories to reflect the chemistry focus of the seminar and the diaries’ emphasis on professional knowledge development. In contrast to the qualitative analysis, which predominantly found references to Mishra and Warr’s (2021) “Culture” design space, SSLDA shows a more diverse distribution of patterns, which in turn supports a more nuanced view of how the students perceive their learning opportunities. Figure 10 shows the topic distribution at the diary level per student. Figures 11, 12, 13, 14, 15, and 16 show individual students’ topic distributions at the sentence level. We provide exemplary interpretations of the patterns in the figure captions.
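One common way to seed a topic model is to give seed words extra prior weight in their designated topic, so that the sampler is nudged, but not forced, toward the intended topics. The sketch below builds such a topic-word prior; it is an assumption-laden illustration and not necessarily how the SSLDA implementation cited above incorporates seeds. The seed lists are hypothetical and, for five of the seven topics, borrow English exemplary tokens from Table 4; `beta` and `seed_weight` are illustrative values.

```python
def seeded_topic_word_prior(vocab, seeds, beta=0.01, seed_weight=100.0):
    """Build a topic-word prior in which seed words get extra pseudo-counts.

    vocab: list of corpus words; seeds: dict mapping topic -> list of seed words.
    Returns (topics, prior), where prior[k][v] is the prior weight of word
    vocab[v] in topic topics[k]. Seed words missing from vocab are ignored.
    """
    topics = list(seeds)
    index = {w: v for v, w in enumerate(vocab)}
    prior = [[beta] * len(vocab) for _ in topics]
    for k, topic in enumerate(topics):
        for word in seeds[topic]:
            if word in index:
                prior[k][index[word]] += seed_weight
    return topics, prior


# Hypothetical seeds, borrowing English exemplary tokens from Table 4.
SEEDS = {
    "processes": ["using", "used", "applied"],
    "experiences": ["students", "teachers", "schools"],
    "culture": ["internet", "youth", "risks"],
    "chemistry": ["oxygen", "burning", "water", "molecules"],
    "knowledge": ["competence", "resources"],
}

if __name__ == "__main__":
    topics, prior = seeded_topic_word_prior(["app", "oxygen", "students", "water"], SEEDS)
    print(topics)
    print(prior[topics.index("chemistry")])  # 'oxygen' and 'water' carry the extra weight
```

In a collapsed Gibbs sampler for LDA, `prior[k][v]` would replace the symmetric `beta` in the topic-word term, and the denominator would use the row sum of `prior[k]` instead of `V * beta`.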
Fig. 10
Topic distributions over all diaries and students. On average, students are consistently concerned with the concrete artifacts related to digital teaching and learning. Over the course of the diaries, the share of words devoted to chemistry and knowledge decreases considerably, whereas the share devoted to systems and processes increases
Fig. 11
Topic distribution of student 1. The person clearly focuses on artifacts when reflecting on teaching and learning with digital tools: classroom activities heavily depend on the teacher’s choice of apps, experiments, or media
Fig. 12
Topic distribution of student 2. In the beginning, knowledge-related terms dominate the writing. In later diaries, the student switches to reflecting on experiences in the classroom and the university seminar. In the end, broader questions regarding the relationship between lesson planning and equipment availability come up
Fig. 13
Topic distribution of student 3. The pre-service teacher starts with an experimental description and, in the last two diaries, focuses on what they consider the respective next step for classroom teaching, i.e., connecting goals and available resources
Fig. 14
Topic distribution of student 4. Two diary entries explicitly focus on chemistry in the sense that the person describes an experimental setup. Notably, the entries in diaries 3 and 4 have a very low probability for any of the seeded topics. There, the person keeps the entries very general, obscuring the probabilistic matching to the topics
Fig. 15
Topic distribution of student 5. A considerable mixture of culture and knowledge can be seen. In the last diary entry, the person discusses the availability of resources and the subsequent steps in planning and conducting classroom activities
Fig. 16
Topic distribution of student 6. Questions of how knowledge and culture are intertwined dominate the patterns in this person’s diaries. The student is especially concerned with the general impact of digital tools on school students’ learning and well-being
We achieved a multinomial kappa of \(\kappa =.47\) for the automated category assignment. Although this value alone would not indicate acceptable agreement, the category-level metrics give a more differentiated picture: precision (i.e., the proportion of true positives among all predicted positives, TP/(TP + FP)) ranges between \({P}_{\text{min}}=.36\) (category: culture) and \({P}_{\text{max}}=.81\) (category: artifacts). A similarly broad range is found for recall (i.e., the proportion of true positives among all actual positives, TP/(TP + FN)), with \({R}_{\text{min}}=.42\) (category: systems) and \({R}_{\text{max}}=.84\) (category: artifacts), reflecting the sparse mention of some categories, e.g., specifics about the experiences. The F1 score is the harmonic mean of precision and recall and thus weights both equally (Jurafsky & Martin, 2023). Table 5 gives an overview of all estimates.
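The metrics reported above can be computed from true and predicted labels as sketched below in plain Python. The sketch assumes that the reported kappa corresponds to Cohen’s kappa generalized to several categories; the text does not name the exact coefficient, and the function name is hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Per-category precision, recall, and F1, plus Cohen's kappa over all categories."""
    categories = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n_true = Counter(y_true)  # TP + FN per category
    n_pred = Counter(y_pred)  # TP + FP per category
    per_category = {}
    for c in categories:
        precision = true_pos[c] / n_pred[c] if n_pred[c] else 0.0
        recall = true_pos[c] / n_true[c] if n_true[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_category[c] = {"precision": precision, "recall": recall, "f1": f1}

    n = len(y_true)
    p_observed = sum(true_pos.values()) / n
    # Chance agreement from the marginal label distributions (Cohen's kappa).
    p_chance = sum(n_true[c] * n_pred[c] for c in categories) / n**2
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance < 1 else float("nan")
    return per_category, kappa
```

As a check on Table 5, the artifacts row’s precision of .81 and recall of .84 give F1 = 2 · .81 · .84 / (.81 + .84) ≈ .82.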
Table 5
Summary of all metrics for estimating the accuracy of the naïve Bayes classifier regarding the test set

| Category | N | Precision | Recall | F1 |
|---|---|---|---|---|
| Artifacts | 8 | 0.81 | 0.84 | 0.82 |
| Chemistry | 5 | 0.61 | 0.67 | 0.59 |
| Culture | 2 | 0.36 | 0.54 | 0.42 |
| Experiences | 6 | 0.37 | 0.53 | 0.44 |
| Knowledge | 7 | 0.79 | 0.56 | 0.66 |
| Processes | 4 | 0.62 | 0.53 | 0.56 |
| Systems | 5 | 0.52 | 0.42 | 0.46 |
Overall, the classifier performed acceptably well, given the low sample size and uncertainties arising from the challenge of assigning seven categories, including agreement by chance (Brennan & Prediger, 1981).
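To illustrate how chance agreement depends on the number of categories, assume the free-marginal correction of Brennan and Prediger (1981), which treats all \(k\) categories as equally likely a priori; the text does not state which chance model underlies the reported kappa, so this is an illustration only. With \(p_o\) denoting the observed proportion of agreement between classifier and human coding, and \(k = 7\), chance agreement is \(1/7 \approx .14\):

```latex
\kappa_{\text{BP}} = \frac{p_o - \frac{1}{k}}{1 - \frac{1}{k}}
\qquad\Longrightarrow\qquad
k = 7:\quad \kappa_{\text{BP}} = \frac{p_o - \frac{1}{7}}{\frac{6}{7}} = \frac{7\,p_o - 1}{6}
```

Thus \(\kappa_{\text{BP}} = 0\) when observed agreement equals chance (\(p_o = 1/7\)) and \(\kappa_{\text{BP}} = 1\) under perfect agreement.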
Discussion
Questionnaires
Our objective was to understand how students contextualized their perceived knowledge gains in relation to their experiences in an SLA-based course on using technology in chemistry education, focusing on CK and PCK. CK and PCK measures showed no systematic improvement across the learning group, and perceived knowledge gains lacked a clear correlation with assessed capabilities. This finding aligns with existing research in pre-service teacher education (Holtz & Gnambs, 2017; Krauskopf & Forssell, 2018); the unsystematic PCK patterns underscore the absence of a definitive link between the course and the summative knowledge assessments. The self-reported measures increased descriptively from pre to post, but we found no meaningful relationship with the summative scores, which limits reliable insights across a broader range of the TPACK dimensions. We interpret the lack of a relationship as indicating that our students contextualize their knowledge gains in very diverse ways, an interpretation that our text analyses support.
Text Analyses
We observed distinct patterns in students’ diary entries, with one student deviating markedly in text length and content perspective.
Across the diaries, students’ reflections lacked a clear link between classroom activities, planning, and professional knowledge. This disconnect is consistent with a documented inverse relationship between professional knowledge gains and in-classroom experience time (Kulgemeyer et al., 2020, 2021), supporting the hypothesis that the complex task of integrating digital technologies in a single classroom session overwhelms students and subsequently limits their ability to apply their resources effectively. In our case, this resulted in reliance on general explanatory frameworks that did not reflect the specifics of chemistry education, although students may have had the necessary knowledge.
The predominance of the “Culture” design space across diaries in the qualitative content analysis, with one student particularly emphasizing the role of digital technologies in education and the relevance of chemistry in daily life, indicates a holistic view rather than concrete professional development strategies. We observed a shift from cultural to more specific discussions of technology implementation only as students elaborated their thoughts, pointing to a nuanced understanding of digital media’s role in education that nonetheless lacked explicit connections to professional knowledge. In contrast, the SSLDA yielded a more nuanced perspective on our students’ writing. We see clear differences between the texts at the sentence level, reflecting the many ways in which students contextualize what they report having learned.
Limitations
Here, we present and discuss our study’s technical and conceptual limitations. Although we focus on the computational text analyses, we point out that the mismatch between the human categorization and the topic model could also indicate that the deductive human coding insufficiently captures the rich contextualization of our students’ self-reported knowledge gains. From a technical perspective, we are strongly limited by a small sample size that is only partially compensated by multiple data collections per student, i.e., five diary entries over time. Even though we extracted interpretable patterns, statistical noise inevitably depends on sample size; therefore, conclusions generalized from our sparse data must be handled with due care. A related limitation is the potential loss of important information from using only two dimensions in DBSCAN. However, when we tested our workflow with more dimensions, we were left with even more noise, making the link between human interpretation and (semi-)automatic pattern detection fuzzier.
Within NLP and ML applications, data leakage can become problematic (Tschisgale et al., 2025). Data leakage refers to information from the data on which a model is evaluated, e.g., supposedly unknown test data, inadvertently informing the model during training. This inflates performance estimates, can conceal overfitting, and, among other biases, undermines the generalizability of research results. A full-sized data-split strategy to prevent data leakage in this study, including training, validation, and testing as well as separately embedding, analyzing, and evaluating stratified texts, led to unstable and uninterpretable clusters due to the small amount of data. We addressed this with our cross-validation approach; however, because all texts were embedded jointly, the analysis still risks leakage from the test data into the training data. The same applies to the naïve Bayes classifier, where we tried to trade off accuracy and train-test separation by stratifying by topics rather than by participants.
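The trade-off between stratifying by topic and separating by participant can be illustrated with a simple check: if the same participant’s sentences end up in both the training and the test set, participant-specific wording can leak into the evaluation. The sketch below is hypothetical (function names, participant IDs, and the one-third test share are assumptions); it contrasts a participant-wise (grouped) split with an interleaved sentence-level split.

```python
import random


def group_split(groups, test_share=1 / 3, seed=0):
    """Assign whole groups (e.g., participants) to either the training or the test set."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    n_test = max(1, round(len(unique) * test_share))
    test_groups = set(unique[:n_test])
    train = [i for i, g in enumerate(groups) if g not in test_groups]
    test = [i for i, g in enumerate(groups) if g in test_groups]
    return train, test


def shared_groups(groups, train, test):
    """Groups present on both sides of a split, a warning sign for leakage."""
    return {groups[i] for i in train} & {groups[i] for i in test}


if __name__ == "__main__":
    # Hypothetical data: six participants with five sentences each.
    groups = [f"student{s}" for s in range(1, 7) for _ in range(5)]
    train, test = group_split(groups)
    print(len(train), len(test), shared_groups(groups, train, test))  # 20 10 set()
    # A sentence-level split (every third sentence to test) mixes participants.
    naive_test = list(range(0, len(groups), 3))
    naive_train = [i for i in range(len(groups)) if i not in naive_test]
    print(len(shared_groups(groups, naive_train, naive_test)))  # 6
```

With only six participants, holding out two of them already removes a third of all participants from training and leaves the test set dominated by two writing styles, which illustrates why a participant-wise split is costly for such small samples.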
Also, choosing an uncased BERT variant for the word embeddings may have limited the semantic differentiation of the German texts, since German capitalizes all nouns and lowercasing merges, e.g., the noun “Essen” (food) with the verb “essen” (to eat); it could also have amplified biases already discussed in research driven by machine learning (Nwafor, 2021; Traag & Waltman, 2022). For example, Fuster-Baggetto and Fresno (2022) discuss the anisotropy of BERT models in this regard. Their demonstration that the choice of a priori boundary conditions for text models and the interpretation of the respective semantics are intertwined motivates us to point out a major conceptual limitation of our study: Establishing a causal relationship between word embeddings, semantics, and human interpretation of texts and text models is notoriously difficult (Feder et al., 2022). Addressing this limitation extensively is well beyond the scope of this study. Yet, we emphasize that asking questions about causality and its relationship to theory, data, and data interpretation should be our next step in exploring natural language processing and machine learning applications in science education research. This challenge is not specific to technology implementation in science education research (Weidlich et al., 2023); it applies to all scientific domains (Cinelli et al., 2022; McElreath, 2020; McGowan et al., 2023; Pearl, 2009) and is, consequently, a mandatory part of future NLP applications in the social sciences.
Synthesis
Despite explicit seminar parts on TPACK and explicit instructions to concentrate on professional knowledge gains in their diaries, students’ written reflections were eclectic and often lacked an explicit focus on professional development. This approach to reflection seemed simplistic but, when analyzed through NLP, revealed deeper insights into their learning processes, extending beyond the mere declarative knowledge gains envisaged by the SLA. This adds to our understanding of how students contextualize their professional knowledge and further enriches the toolbox for processing formative assessments. Students discussed their knowledge gains in the context of the broader implications of becoming a chemistry teacher, emphasizing a multitude of contexts for technology integration in chemistry classrooms. The low resolution of our self-report questionnaires can be seen as a consequence of this multitude: a multiple-choice self-report questionnaire cannot capture students’ diverse views on professional knowledge growth through practical in-class technology implementation. Such a perspective aligns with recent scholarly discussions on incorporating sociocultural considerations (Thyssen et al., 2023) into technology implementation in science education and on the importance of non-cognitive dimensions in chemistry education (Flaherty, 2020). If science education research aims to understand these dimensions, it needs to adopt more sophisticated approaches to students’ professional development. We suggest that CGT and NLP can help to reveal and quantify the finer-grained entanglements of, e.g., learners’ cognition, affect, and attitudes. The results reflect a chemistry-specific interconnectedness of the knowledge dimensions of the RCM (Carlson et al., 2019): according to the questionnaires, our students may have shown no systematic increase in pPCK, but their reflective diaries allow a better understanding of their ePCK, i.e., what they actually did with what they know. In particular, our study captured the empirical co-occurrence of the five design spaces (Mishra & Warr, 2021) and the perceived knowledge gains of individual students over multiple points in time. In this way, we combined the technical application of CGT, research on pre-service chemistry teachers’ professional knowledge, and the development of a seminar with respect to SLA.
Regarding future pre-service teacher training at the University of Vienna, our findings underscore the SLA’s effectiveness in encouraging students to contemplate the broader questions of their future profession. This suggests that, similar to addressing students’ preconceptions in learning, SLA seminars could start from broader educational questions. Here, Mishra and Warr’s (2021) design spaces can serve as starting points for framing seminars that address pre-service science teachers’ professional development regarding digital teaching tools. Our application of CGT within the SLA-driven seminar effectively analyzed pre-service teachers’ diaries, linking their school experiences with digital technology use in chemistry classes. However, the challenge of systematically integrating professional knowledge persists: our seminar had no measurable impact on summative knowledge measures. Nonetheless, professional knowledge is crucial for both reflective practice and effective teaching. We suggest that incorporating direct instruction phases on the broader educational context into higher education teaching could mitigate these shortcomings without diminishing opportunities for reflection. If students show interest in a broader range of questions regarding technologies in teaching and learning, such themes should be conceived of as entry points. In this way, teacher educators could embrace and focus on the multifaceted affordances of digitalization and digitality. This would call for a stronger individualization of teacher education within the SLA framework or, in other words, for appreciating and leveraging the messiness of education as a fruitful and diverse resource.
Acknowledgements
The authors are grateful for the challenging and constructive remarks of three anonymous reviewers. Alexandra Teplá supported us during the qualitative content analysis. An early discussion with Natasha G. Holmes jump-started the manuscript. The participants' work and the collaborating teachers' efforts were invaluable.
Declarations
Competing interests
The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.