Advances in Quantitative Ethnography
7th International Conference, ICQE 2025, Mexico City, Mexico, October 11–16, 2025, Proceedings
- 2026
- Book
- Editors
- Guadalupe Carmona
- Cynthia Lima
- María Josefa Santos
- Héctor Benítez
- Luis Montero-Moguel
- Beatriz Galarza-Tohen
- Publisher
- Springer Nature Switzerland
About this book
This volume constitutes the refereed proceedings of the 7th International Conference on Quantitative Ethnography, ICQE 2025, held in Mexico City, Mexico, during October 11–16, 2025.
The 44 full papers included in this book were carefully reviewed and selected from 82 submissions. They are organized in the following topical sections: Theory, Methods, Coding, and Fairness; Gaming and Augmented Reality; Education and Self Learning; and Global Collaborations, Politics, and Social Consciousness.
Table of Contents
Theory, Methods, Coding, and Fairness
Frontmatter
Models All the Way Down: Discourse and Other Models in Quantitative Ethnography
David Williamson Shaffer, A. R. Ruis
Abstract: This paper re-examines the concept of segmentation in Quantitative Ethnography (QE), a topic that has historically confused new practitioners. The original formulation of segmentation, drawing analogies from poetry and employing an Initial Letter Convention (e.g., Line vs. line), created unnecessary conceptual hurdles. This work argues for retiring these confusing conventions in favor of a more intuitive and theoretically grounded framework: specifically, replacing the terms stanza and conversation with the concepts of window and horizon, respectively, to better align with contemporary theories of discourse and common ground. A window represents the immediate temporal context a person attends to, while a horizon defines the total perceptual field from which that window is drawn. This reframing is particularly vital for handling complex, multimodal data where different data streams contribute unequally to a participant’s perceived context. As a result, this paper advocates for reconceptualizing the construction of a QE analysis into four distinct modeling stages (structural, content, discourse, and semantic), as this framework offers a more transparent and defensible approach to building and justifying QE models.
Educators’ AI Journey: Developing AI Competencies in a Professional Development Program
Jiayu Cheng, Yu Gao, Xiner Liu, Amanda Barany, Bodong Chen
Abstract: As artificial intelligence (AI) becomes increasingly pervasive in education, professional development (PD) for educators to develop AI competencies has become a necessity. Despite growing offerings in AI-focused PD programs, few studies have systematically examined how educators’ AI-related understanding evolves during these programs. In this study, we investigated how educators engaged with and progressed through AI competencies across a ten-week online PD program. Drawing on the UNESCO AI Competency Framework for Teachers (AI CFT), we applied Ordered Network Analysis (ONA) to model the sequential structure of educators’ discussion posts and open-ended reflections across three program phases. Results showed that educators’ discourse primarily centered on foundational competencies, such as basic AI techniques and AI-assisted teaching. Additionally, comparison between teachers and administrators revealed distinct developmental trajectories early in the program, but these differences converged over time through shared learning experiences. This study demonstrates the utility of ONA for tracing conceptual development in PD contexts and highlights the need for intentional program design to guide educators toward developing advanced AI competencies.
Research Leadership in the Context of Quantitative Ethnographic Work
Brendan Eagan, Adaurennaya C. Onyewuenyi, David Williamson Shaffer, A. R. Ruis
Abstract: This paper explores the role of research leadership skills in the context of methodological training and capacity building in quantitative ethnography (QE). To do this, we describe recent efforts to design, implement, and refine the QE Fellows Institute, a U.S. National Science Foundation-funded institute for advanced methodological training. We argue that research leadership, a set of socio-cultural skills needed to function effectively in a research environment, is central to methodological training in general and also to the ongoing development of the QE community of practice. This paper describes a particular approach to incorporating research leadership into methodological training, provides preliminary evidence about the efficacy of this approach, and examines how this approach can improve QE research praxis.
Critical Quantitative Ethnography (CritQE): A Pathway Toward Convergence Research and Equity
Nichole Margarita Garcia, Adaurennaya C. Onyewuenyi, David Williamson Shaffer
Abstract: This paper introduces Critical Quantitative Ethnography (CritQE), a methodological and epistemological framework that integrates Quantitative Ethnography (QE) with Critical Social Theory (CST) to center equity, reflexivity, and epistemic justice in social science research. While QE offers tools for modeling meaning-making by combining qualitative and quantitative approaches, it often relies on epistemological assumptions that privilege consensus. Through collaborative inquiry, we critically examine the emphasis on interrater agreement in QE, arguing that treating coder disagreement as error reinforces dominant research paradigms and marginalizes diverse interpretive perspectives. In response, we introduce multivocal coding, a method that foregrounds divergence as a site of theoretical insight, acknowledging how social position and epistemological commitments inform what is seen and valued in the data. Drawing on data from a study involving Black American and Black immigrant college students, we demonstrate how multivocal coding enables researchers to engage with complexity, reveal positional dynamics, and produce ethical, inclusive analyses. This paper contributes to a growing body of scholarship that seeks to reimagine research design through the lens of critical inquiry, offering CritQE and multivocal coding as a model for transforming how meaning is constructed, validated, and applied in social science research.
On Becausality: Revisiting Key Terms of Art in Quantitative Ethnography
David Williamson Shaffer
Abstract: This paper argues that we should re-examine the nomenclature of QE to: (1) clarify the way in which etic codes need to be grounded in emic symbols in the process of thick description; and (2) emphasize that the relationships between emic symbols are of interest to QE researchers because the goal of a QE analysis is to explain why people acted in particular ways rather than that they did so. Specifically, the paper advocates a terminology based on how emic symbols and becausal relationships are based on signifiers and co-occurrences in the data and, in turn, how etic codes and connections, as well as theoretical constructs and claims, are only meaningful if they are based on that emic substrate.
Subgroup Fairness in Multilingual Text Classification
A. R. Ruis, Zhiqiang Cai, David Williamson Shaffer
Abstract: Most research on subgroup fairness has been done with models that predict outcomes, where there typically should be parity between subgroups and main groups. In contexts where meaningful differences between different populations are expected, differences in classification could be due to actual differences or to biases in the classification process, confounding studies of subgroup fairness in such cases. This produces a fundamental challenge: how do we study the causes of and potential solutions to unfairness in classification when differences between subgroups are to be expected? To address this challenge, we aligned text from policy documents officially published in two different languages to test the fairness of classifiers designed to identify the same constructs in multiple languages by testing the extent to which the classifiers made the same coding decisions on items equivalent in content but expressed in different languages. This study presents a systematic analysis of the frequency and types of errors that occur in the classifier training process and lead to biased coding decisions, and it shows how a novel technique, negative reversion, can significantly reduce such errors.
The Iterative Relationship Between Automated and Hand Coding Within a Quantitative Ethnography (QE) Framework: Methodological Integration and Practical Insights
Adaurennaya C. Onyewuenyi, Brendan Eagan, Danielle P. Espino, Michelle Bandiera, Alexander Tan
Abstract: This paper examines the iterative integration of manual and automated coding within the methodological framework of Quantitative Ethnography (QE). By combining the interpretive depth of hand coding with the scalability and consistency of automated coding, we demonstrate how iterative coding enhances analytic rigor, transparency, and validity. Drawing on an ongoing QE study exploring Black intra-racial dynamics and identity negotiation, we illustrate how iterative coding cycles, anchored in both inductive and deductive processes, refine thematic accuracy and reduce bias. Each phase, from exploratory automatic coding to manual validation and refinement, contributed to a robust coding schema that supported both contextual sensitivity and systematic reproducibility. The findings highlight key methodological insights, including the importance of clearly defined protocols, ongoing recalibration, and interdisciplinary expertise. We argue that this iterative approach aligns with QE’s core goals of integrating qualitative nuance with quantitative and computational precision, offering a scalable and reflexive model for analyzing complex, context-rich qualitative data at scale.
Of Humans and Machines: Evaluating the Efficacy of GPT-4 in Coding Discourse Data
Omer Zahid, Jen Hsiang-Pan, Golnaz Arastoopour Irgens, Atefeh Behboudi, Alicia C. Lane
Abstract: With the increasing use of Large Language Models (LLMs) for coding in qualitative research, this paper examines the ways that GPT-4o can code discourse data from a Critical Machine Learning (CML) curriculum for Black middle school girls. We question whether large data representing non-mainstream discourse can be coded successfully by the model, specifically by comparing human-coded data with outputs from three GPT-4o models: unsegmented (base), activity-wise segmentation (stanza), and sub-activity segmentation (sub-stanza). The principles of our study are based in Quantitative Ethnography, and we evaluate the LLM’s performance using Cohen’s Kappa, alongside qualitative justifications generated for each coded line. Even though GPT-4o received detailed definitions of constructs and background of the original curriculum, it showed significantly low agreement across all codes with human coders. Our findings shed light on the lack of transparency by OpenAI on training data, and the danger of semantic flattening on particular, culture-specific discourse in qualitative research. We argue for reflexivity from researchers during and after coding, care with prompt engineering, and the need for more culturally responsive AI tools in qualitative research.
Embracing Mess: Reflection on How We Engage with QE
Danielle P. Espino, Adaurennaya C. Onyewuenyi
Abstract: This paper reflects on the diverse disciplinary, epistemological, and methodological traditions that researchers bring to their engagement with Quantitative Ethnography (QE), and how these experiences shape their understanding and application of the approach. These varied pathways into QE widen its reach, strengthen its analytical rigor, and push the boundaries of how it is conceptualized and practiced. Drawing on reflective analysis and illustrative snapshots, we explore how reflexivity operates across the QE research process, from study conceptualization and data collection to coding, model development, and interpretation. Geared toward colleagues earlier in their QE journey, this paper embraces the uncertainties inherent in research and underscores the importance of intentional, reflective, and community-engaged practice for producing meaningful and methodologically sound QE scholarship.
Expanding the Quantitative Ethnography Toolkit with Transition Network Analysis: Exploring Methodological Synergies and Boundaries
Kamila Misiejuk, Rogers Kaliisa, Sonsoles López-Pernas, Mohammed Saqr
Abstract: Integrating new methods into Quantitative Ethnography (QE) presents an ongoing challenge, as well-established techniques, such as Epistemic Network Analysis, have primarily shaped the field. This paper introduces Transition Network Analysis (TNA), a quantitative network modeling technique, and explores how it can be implemented using QE’s unified mixed-methods approach. We outline the core features of TNA, present a structured workflow for its implementation using the QE approach, and demonstrate its application through an illustrative example using a sample dataset. We provide a concrete model for incorporating emerging modeling techniques, thereby supporting methodological expansion and inspiring further innovation within the QE research community.
On the Importance of Numerical and Visual Alignment: Comparing Transition Probability Matrices Visually Using Ordered Semantic Co-registration Layout and Modified Dot Layout
Yuanru Tan, Yizhou Fan, David Williamson Shaffer
Abstract: In this study, we examine two algorithms for visualizing and comparing transition probability matrices: ordered semantic co-registration layout (OSC) and modified dot layout (MDL). We examine how each algorithm utilizes key visual channels (position, shape, size, and color) in network layouts and encodings. Our findings show that for layouts, OSC provides consistent node positioning that reflects key group characteristics, which facilitates easier comparisons across networks. In contrast, MDL’s inconsistent node placement across groups makes it harder to compare networks visually. For encodings, OSC uses high-discriminability shapes and colors to encode node and edge weights, while MDL’s reliance on black lines and label-length-based node sizes limits its ability to visually distinguish weights. We conclude that OSC is more suited for comparing multiple transition probability matrices with a small number of nodes, while MDL may perform better in larger networks.
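For readers unfamiliar with the object being visualized in the two abstracts above, the following is a minimal sketch of how a first-order transition probability matrix could be estimated from coded sequences. It is an illustration only, not the implementation used in either paper; the function name `transition_matrix`, the state labels, and the example sequence are hypothetical.

```python
from collections import Counter


def transition_matrix(sequences, states):
    """Estimate first-order transition probabilities from coded sequences.

    sequences: list of lists of state labels (hypothetical coded data).
    states:    list of all state labels to include as rows and columns.
    Returns a nested dict: matrix[a][b] = estimated P(next = b | current = a).
    """
    counts = Counter()
    for seq in sequences:
        # Count each adjacent pair (current state, next state).
        for current, nxt in zip(seq, seq[1:]):
            counts[(current, nxt)] += 1

    matrix = {}
    for a in states:
        row_total = sum(counts[(a, b)] for b in states)
        # Rows with no observed outgoing transitions are left as all zeros.
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0)
            for b in states
        }
    return matrix


if __name__ == "__main__":
    example = [["A", "B", "A", "B", "B"]]
    for state, row in transition_matrix(example, ["A", "B"]).items():
        print(state, row)
```

Each row of the result sums to 1 whenever that state has at least one observed outgoing transition, which is the property that layouts such as those compared above encode as edge weights.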
ChatGPT-Assisted Codebook Design for Learning Analytics Datasets in Multiple Languages: A Case Study
Ayaz Karimov, Mirka Saarela, Xiner Liu, Zhanlan Wei, Andres Felipe Zambrano, Amanda Barany, Ryan S. Baker, Jaclyn Ocumpaugh, Sabina Mammadova, Tommi Kärkkäinen
Abstract: This study investigates the use of ChatGPT for the development of inductive (data-driven) codebooks from qualitative datasets in underrepresented languages (Azerbaijani and Finnish). Although prior work has leveraged GPT-4o as a “co-researcher” that can support more efficient and comprehensive inductive codebook development, further work is needed to understand how consistent results are across languages and for translated text. The study found GPT-4o to be useful for identifying relevant codes, but also found limitations, particularly in terms of the quality of example sentences generated for less-resourced languages. Social moderation by humans and construct evaluations were applied to refine the generated codebooks to ensure clarity and reduce redundancy. The results demonstrated that, while GPT-4o significantly aids in multilingual qualitative analysis, human intervention remains essential to validate and enhance the accuracy of the outputs. This research is particularly significant for the learning analytics field as it demonstrates scalable methods for multilingual qualitative analysis, a critical step in expanding the inclusivity and applicability of learning analytics across educational contexts.
Exploring Differences Between Hybrid GPT-Human and Human-Created Qualitative Codebooks in an Educational Game
Xiner Liu, Zhanlan Wei, Amanda Barany, Jaclyn Ocumpaugh, Ryan S. Baker, Andres Felipe Zambrano, Yiqiu Zhou, Camille Giordano
Abstract: This study explores the ability of GPT-4 working together with humans to generate a codebook to analyze scientific observations from middle school learners in the What-if Hypothetical Implementations in Minecraft (WHIMC) project. It compares this Hybrid codebook to one fully developed by Humans using a variety of techniques to evaluate how the codes developed by each approach relate to one another and to external measures of student interest. Results show that the Hybrid GPT-Human codes consist of broader categories that align more consistently with the external interest metrics, whereas the Human codes offer finer-grained insights into specific student behaviors. However, the complementary insights offered by each suggest that combining both approaches could improve our understanding of student engagement and inform more effective strategies in educational game design and intervention.
Reliable Confidence Intervals for Cohen’s Kappa in AI-Assisted Coding of Rare Behaviors
Zhiqiang Cai, David Williamson Shaffer
Abstract: The increasing use of large language models (LLMs) in qualitative research means that more researchers are using automated coding. Developing better tools and methods to assess the reliability of automated codes is thus a critical concern. In Quantitative Ethnography (QE), Cohen’s kappa (κ) is the standard measure of agreement. In what follows, we propose two new methods for estimating confidence intervals for κ: a Finite Exact test and Finite Bayesian estimation. These new methods take advantage of two pieces of information that are available to researchers in a QE context: the size of the dataset and the base rate of the automated classifier. We compare these new approaches to two existing methods: asymptotic standard error and bootstrapping. Results from 720 simulations show that the two existing methods produce inflated Type I error rates under QE conditions: a combination of low classifier base rate (<10%), high κ threshold (>0.7), and a small number of available human-coded samples (<1000). Under these conditions, the proposed methods have acceptable Type I error rates, making it possible to more reliably validate automated coding approaches in QE, particularly for rare but meaningful behaviors.
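As background for the abstract above, the sketch below computes Cohen's kappa for two binary coders and a percentile bootstrap confidence interval, one of the two existing methods the abstract names. It does not implement the proposed Finite Exact test or Finite Bayesian estimation, whose details are not given here. The function names, the default of 2000 resamples, and the choice to discard resamples where kappa is undefined are assumptions made for illustration.

```python
import math
import random


def cohen_kappa(human, machine):
    """Cohen's kappa for two equal-length lists of binary codes (0 or 1)."""
    n = len(human)
    p_o = sum(h == m for h, m in zip(human, machine)) / n  # observed agreement
    p_h = sum(human) / n     # base rate of the human coder
    p_m = sum(machine) / n   # base rate of the automated classifier
    p_e = p_h * p_m + (1 - p_h) * (1 - p_m)  # agreement expected by chance
    if p_e == 1.0:
        # Both coders used a single category throughout: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def bootstrap_kappa_ci(human, machine, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for kappa (illustrative assumptions).

    Resamples where kappa is undefined are discarded, which is one of several
    possible conventions and can matter when the coded behavior is rare.
    """
    rng = random.Random(seed)
    n = len(human)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        k = cohen_kappa([human[i] for i in idx], [machine[i] for i in idx])
        if not math.isnan(k):
            stats.append(k)
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical data: 200 items, 10% base rate for the human coder.
    human = [1] * 20 + [0] * 180
    machine = [1] * 17 + [0] * 3 + [1] * 2 + [0] * 178
    print(cohen_kappa(human, machine))
    print(bootstrap_kappa_ci(human, machine, n_boot=500))
```

With a low base rate and few human-coded items, many resamples contain very few positive cases, which is one intuition for why the abstract reports that bootstrap intervals behave poorly in this setting.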
Exploring Role-Based Knowledge Co-construction in Social Annotation with Epistemic Network Analysis
Yuwei Liang, Zhanlan Wei, Xiner Liu, Xinran Zhu, Yu Gao, Bodong Chen
Abstract: Social annotation has emerged as a promising approach to fostering social reading and collaborative learning. However, the implementations of social annotation vary in pedagogical depth, with some lacking structured support for deep knowledge co-construction. To address this issue, role assignment, defined as the intentional allocation of predefined roles among students, has been widely adopted to guide learner participation and foster purposeful engagement. Prior research, however, has largely relied on traditional content analysis to quantify isolated knowledge co-construction behaviors without capturing how these behaviors interrelate or unfold over time in social annotation activities. To close the gap, this study employs Epistemic Network Analysis (ENA) to reveal how three assigned roles (facilitator, synthesizer, and summarizer) contributed to students’ knowledge co-construction in social annotation activities in a university-level class. Course-wide ENA revealed that both summarizers and students without assigned roles consistently linked three core practices (Externalization, Quick Consensus Building, and Integration-Oriented Consensus Building), whereas facilitators and synthesizers consistently engaged with these three practices along with Elicitation. Building on course-wide ENA, stage-specific ENA across early, middle, and late stages further illuminated how each role’s co-construction patterns evolved over time. The results underscore the need for structured guidance and intentional instructional support to foster deeper collaborative engagement in social annotation.
More Than Words: Evidencing Qualitative Findings Through Multimodal Narratives
Yixin Cheng, Zachari Swiecki
Abstract: While texts and still images are foundationally used to evidence qualitative findings, they often require significant interpretation from the reader to extract meaning. In this study, we explored alternative ways of evidencing qualitative findings, with a particular focus on how video can support the warranting of interpretive claims. We used an investigation of generative artificial intelligence (GenAI) affordances in argumentative writing as a context to examine how video-based representations could reveal insights that might be obscured in texts or static images. We conducted case studies in which we observed and interviewed participants as they used GenAI tools in their writing. Drawing on theoretical lenses of mediated action and quantitative ethnography, we identified affordances through an ethnographic content analysis of screen recordings and transcripts. We proposed three dimensions (temporal, spatial, and contextual) to guide the use of video in capturing and analysing human-tool interactions, with particular attention to how mediated actions co-occur within meaningful stanzas. Using a multimodal narrative that combined texts, still images, and looped video clips, we showed how these dimensions reveal tool-use patterns often obscured in traditional qualitative representations. Our contribution is both theoretical and methodological: we explain why video matters in warranting qualitative findings and demonstrate when and how it can uncover mediated actions emerging through time, space, and context and, critically, how these actions relate to one another in stanzas.
Nothing Left Untouched: Design Case Extending Code-Wise ENA to Model Effect of Memo-Medium on Coding
Mariah A. Knowles, Amanda Barany
Abstract: This paper presents a design case demonstrating the extension of Code-wise Epistemic Network Analysis (ENA) to model the influence of memo-medium on one’s qualitative analysis. The approach taken adapts quantitative ethnographic methods, namely Code-wise ENA with Multi-Class Means Rotation (MCMR) and hierarchical clustering. By comparing models with and without the variance of the media effect removed, this paper shows the nature of this effect in detail, not just a measure that such an effect occurred. The findings reveal that no conceptual category was left untouched by the media effect, though twelve stable “sub-themes” were identified. Overall, this design case underscores the importance of reflexivity in qualitative research and how quantitative tools, appropriately mixed, can support quantitative ethnographers in attending to that evaluative criterion.
An ENA-Informed Approach to Integrating Diverse Expert Knowledge in Cognitive Work Analysis
Celeste Francis Esteves, Bowen Hui
Abstract: Cognitive Work Analysis (CWA) integrates expert knowledge and decision making into a comprehensive framework for the design and analysis of complex systems. Subjective data collected on psychological constructs such as Situational Awareness and Mental Workload from expert operators are important inputs to CWA. Traditional forms of data collection and reporting are developed from rigid questionnaires and close-ended questions. In this study, Epistemic Network Analysis (ENA) was used to extend the interpretation of one such input from a frequency-based summary table of expert commentary to a rich model of expert positionality that could be easily narrated and presented to member-subjects for validation or reinterpretation. Through this example, we demonstrate how an ENA-informed approach to CWA makes the practice more accessible and enhances reliability.
Modeling Multimodal Interactions Using Epistemic Network Analysis: Key Considerations
Hanall Sung
Abstract: In this paper, I examine the use of Epistemic Network Analysis (ENA) for modeling multimodal interactions in learning contexts and outline three key considerations: (1) operationalization and segmentation of multimodal events, (2) modeling approaches, and (3) interpretations of shared ENA spaces that integrate codes across modalities. These considerations are illustrated through two empirical case studies involving speech, gestures, and digital traces. By offering methodological guidance for applying ENA to multimodal data, this work advances both Quantitative Ethnography and Multimodal Learning Analytics, refining analytic techniques and providing theoretical insights into learning as a temporally entangled multimodal process. Ultimately, it contributes to more accurate modeling and understanding of learning processes and improved instructional support in educational practices.
Computer-Assisted Code Generation Using Combination of Generative Artificial Intelligence, Stepwise Coding, and Topic Modeling
Ayano Ohsaki, Daisuke Kaneko
Abstract: In the current big data era, qualitative coding faces the challenge of balancing scalability and contextual depth. This study proposes a novel computer-assisted code generation framework that integrates generative artificial intelligence (GAI), stepwise coding, and topic modeling to enhance transparency and traceability in inductive analysis. Unlike prior work assuming full human coding, our approach compresses data using GAI to extract representative utterances, which are then analyzed via the Steps for Coding and Theorization (SCAT), a stepwise coding method. We compared three topic modeling techniques, namely latent Dirichlet allocation (LDA), biterm topic model (BTM), and BERTopic, using raw and SCAT-processed data. The results show that BTM applied to stepwise-coded data yields the most interpretable and thematically relevant topics. Coding tables constructed from BTM topics enabled epistemic network analysis (ENA) that visualized meaningful pedagogical perspective shifts before and after a technology trial. The findings suggest that the proposed hybrid approach can maintain analytical depth while supporting scalable qualitative analysis. This framework advances code generation practices in quantitative ethnography by preserving the cultural and interpretive context of human-centered inquiry.
- Title
- Advances in Quantitative Ethnography
- Editors
-
Guadalupe Carmona
Cynthia Lima
María Josefa Santos
Héctor Benítez
Luis Montero-Moguel
Beatriz Galarza-Tohen
- Copyright Year
- 2026
- Publisher
- Springer Nature Switzerland
- Electronic ISBN
- 978-3-032-12229-2
- Print ISBN
- 978-3-032-12228-5
- DOI
- https://doi.org/10.1007/978-3-032-12229-2
PDF files of this book have been created in accordance with the PDF/UA-1 standard to enhance accessibility, including screen reader support, described non-text content (images, graphs), bookmarks for easy navigation, keyboard-friendly links and forms and searchable, selectable text. We recognize the importance of accessibility, and we welcome queries about accessibility for any of our products. If you have a question or an access need, please get in touch with us at accessibilitysupport@springernature.com.