2017 | Book

Artificial Intelligence in Education

18th International Conference, AIED 2017, Wuhan, China, June 28 – July 1, 2017, Proceedings

Editors: Prof. Dr. Elisabeth André, Ryan Baker, Prof. Xiangen Hu, Ma. Mercedes T. Rodrigo, Benedict du Boulay

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science

About this book

This book constitutes the refereed proceedings of the 18th International Conference on Artificial Intelligence in Education, AIED 2017, held in Wuhan, China, in June/July 2017. The 36 revised full papers presented together with 4 keynotes, 37 poster presentations, 4 doctoral consortium papers, 5 industry papers, 4 workshop abstracts, and 2 tutorial abstracts were carefully reviewed and selected from 159 submissions. The conference provides opportunities for the cross-fertilization of approaches, techniques and ideas from the many fields that comprise AIED, including computer science, cognitive and learning sciences, education, game design, psychology, sociology, linguistics as well as many domain-specific areas.

Table of Contents

Frontmatter
Erratum to: Dusting Off the Messy Middle: Assessing Students’ Inquiry Skills Through Doing and Writing
Haiying Li, Janice Gobert, Rachel Dickler

Full Papers

Frontmatter
An Adaptive Coach for Invention Activities

A focus in recent AIED research is to create adaptive support for learners in inquiry learning environments. However, only a few examples of such support have been demonstrated. Our work focuses on Invention activities, inquiry activities in which students generate representations that explain data presented as contrasting cases. To help teachers implement these activities in their classrooms, we have created and pilot-tested a dedicated adaptive computer coach (the Invention Coach) and are currently evaluating it in a classroom study. The Coach’s pedagogical strategy balances structuring and problematizing, unlike many ITSs, which favor structuring. The Coach is implemented in CTAT as a model-tracing tutor, with a rule-based model that captures its pedagogical coaching strategy, designed in part based on data from human tutors. We describe the Invention Coach and its pedagogical model. We present evidence from our pilot tests that illustrates the tutor’s versatility and provides preliminary evidence of its effectiveness. The contributions of the work are: identifying an adaptive coaching strategy for Invention tasks that balances structuring and problematizing, and an automated coach for a successful instructional method (Invention) for which few tutors have been built.

Vincent Aleven, Helena Connolly, Octav Popescu, Jenna Marks, Marianna Lamnina, Catherine Chase
Evaluating the Effect of Uncertainty Visualisation in Open Learner Models on Students’ Metacognitive Skills

Self-assessment is widely used in open learner models (OLMs) as a metacognitive process to enhance students’ self-regulated learning. Yet little research has investigated the impact of the visualisation when the OLM shows the conflict (i.e., uncertainty) between the system’s beliefs about student knowledge and students’ confidence in the correctness of their answers. We deployed such an OLM and studied its use. The impact of the uncertainty visualisation on student learning, confidence gains and actions was determined by comparing these measures across two treatment conditions and a control condition. Those who accessed the OLM performed significantly better on the post-test, and those in the treatment group who could see both sets of beliefs separately showed greater confidence gains and used the system more.

Lamiya Al-Shanfari, Carrie Demmans Epp, Chris Baber
Collaboration Improves Student Interest in Online Tutoring

Prior research indicates that students often experience negative emotions while using online learning environments, and that most of these negative emotions can have a detrimental impact on their behavior and learning outcomes. We investigate the impact of a particular intervention, namely face-to-face collaboration with a neighboring student, on student boredom and frustration. The data comes from a study with 106 middle school students interacting with a mathematics tutor that provided varying levels of collaboration. Students were randomly assigned to a collaboration or no-collaboration condition. Collaboration was associated with reduced boredom: Students who collaborated more frequently reported increased interest.

Ivon Arroyo, Naomi Wixon, Danielle Allessio, Beverly Woolf, Kasia Muldner, Winslow Burleson
Improving Sensor-Free Affect Detection Using Deep Learning

Affect detection has become a prominent area in student modeling in the last decade and considerable progress has been made in developing effective models. Many of the most successful models have leveraged physical and physiological sensors to accomplish this. While successful, such systems are difficult to deploy at scale due to economic and political constraints, limiting the utility of their application. Examples of “sensor-free” affect detectors that assess students based solely on data from the interaction between students and computer-based learning platforms exist, but these detectors generally have not reached high enough levels of quality to justify their use in real-time interventions. However, the classification algorithms used in these previous sensor-free detectors have not taken full advantage of the newest methods emerging in the field. Deep learning algorithms, such as recurrent neural networks (RNNs), have been applied with success to a range of other domains, including pattern recognition and natural language processing, but have only recently been attempted in educational contexts. In this work, we construct new “deep” sensor-free affect detectors and report significant improvements over previously reported models.

Anthony F. Botelho, Ryan S. Baker, Neil T. Heffernan
ReaderBench Learns Dutch: Building a Comprehensive Automated Essay Scoring System for Dutch Language

Automated Essay Scoring has gained wider applicability and usage with the integration of advanced Natural Language Processing techniques, which have enabled in-depth analyses of discourse in order to capture the specificities of written texts. In this paper, we introduce a novel Automated Essay Scoring method for the Dutch language, built within the ReaderBench framework, which encompasses a wide range of textual complexity indices, as well as an automated segmentation approach. Our method was evaluated on a corpus of 173 technical reports automatically split into sections and subsections, thus forming a hierarchical structure on which textual complexity indices were subsequently applied. The stepwise regression model explained 30.5% of the variance in students’ scores, while a Discriminant Function Analysis predicted with substantial accuracy (75.1%) whether they are high or low performance students.

Mihai Dascalu, Wim Westera, Stefan Ruseti, Stefan Trausan-Matu, Hub Kurvers
Keeping the Teacher in the Loop: Technologies for Monitoring Group Learning in Real-Time

Learning in groups allows students to develop academic and social competencies but requires the presence of a human teacher who is actively guiding the group. In this paper we combine data-mining and visualization tools to support teachers’ understanding of learners’ activities in an inquiry-based learning environment. We use supervised learning to recognize salient states of activity in the group’s work, such as reaching a solution to a problem, exhibiting idleness, or experiencing technical challenges. These “critical” moments are visualized to teachers in real time, allowing them to monitor several groups in parallel and to intervene when necessary to guide the group. We embedded this technology in a new system, called SAGLET, which augments existing collaborative educational software and was evaluated empirically in real classrooms. We show that the recognition capabilities of SAGLET are compatible with those of a human domain expert. Teachers were able to use the system successfully to make intervention decisions in groups when deemed necessary, without being overwhelmed with information. Our results demonstrate how AI can be used to augment existing educational environments to support the “teacher in the group”, and to scale up the benefits of group learning to the actual classroom.

Avi Segal, Shaked Hindi, Naomi Prusak, Osama Swidan, Adva Livni, Alik Palatnic, Baruch Schwarz, Ya’akov (Kobi) Gal
An Extensible Domain-Specific Language for Describing Problem-Solving Procedures

An intelligent tutoring system (ITS) is often described as having an inner loop for supporting solving tasks step by step, and an outer loop for selecting tasks. Many task domains have problem-solving procedures that express how tasks can be solved by applying steps or rules in a controlled way. In this paper we collect established ITS design principles, and use the principles to compare and evaluate existing ITS paradigms with respect to the way problem-solving procedures are specified. We argue that problem-solving procedures need an explicit representation, which is missing in most ITSs. We present an extensible domain-specific language (DSL) that provides a rich vocabulary for accurately describing procedures. We give three examples of tutors from different task domains that illustrate our DSL approach and highlight important qualities such as modularity, extensibility, and reusability.

Bastiaan Heeren, Johan Jeuring
Effects of Error-Based Simulation as a Counterexample for Correcting MIF Misconception

The MIF (Motion Implies a Force) misconception is commonly observed in elementary mechanics learning, where students think some force is applied to moving objects. This paper reports a practical use of Error-based Simulation (EBS) for correcting students’ MIF misconceptions in a junior high school and a technical college. EBS is a method that generates a phenomenon from a student’s erroneous idea (e.g., if a student thinks a forward force is applied to a skater traveling straight on ice at a constant velocity, EBS shows the skater accelerating). Such a phenomenon is supposed to work as a counterexample to the student’s misconception. In the practice, students first worked on a pre-test of five problems (called the ‘learning task’), in each of which they drew all the forces applied to objects in a mechanical situation. They then worked on the same problems on a system where EBSs were shown based on their answers. Finally, they worked on a post-test consisting of the previous problems plus four new problems (called the ‘transfer task’). As a result, in both schools, the number of MIF-answers (erroneous answers presumed to be due to the MIF misconception) in the learning task decreased significantly between pre-test and post-test. Effect sizes of the decrease in MIF-answers were larger than those of other erroneous answers. Additionally, the percentages of MIF-answers among all erroneous answers in the transfer task were much lower than those in the learning task. These results suggest that learning with EBS not only helps resolve the MIF misconception but also promotes the correction of errors at the conceptual level.

Tsukasa Hirashima, Tomoya Shinohara, Atsushi Yamada, Yusuke Hayashi, Tomoya Horiguchi
Algorithm for Uniform Test Assembly Using a Maximum Clique Problem and Integer Programming

Educational assessments occasionally require “uniform test forms” for which each test form consists of a different set of items, but the forms meet equivalent test specifications (i.e., qualities indicated by test information functions based on item response theory). For uniform test assembly, one of the most important issues is to increase the number of assembled tests. This study proposes a new algorithm, RIPMCP, to increase the number of assembled tests. RIPMCP applies a maximum clique algorithm and integer programming for assembling uniform tests. RIPMCP requires less computational space; thus, it can assemble a greater number of tests than previous methods in the same computational environment. Finally, we demonstrate the advantage of the proposal using simulated and actual data.

Takatoshi Ishii, Maomi Ueno
Personalized Tag-Based Knowledge Diagnosis to Predict the Quality of Answers in a Community of Learners

Professionals in a discipline often interact with other professionals to help them keep up to date in their field, to overcome impasses, to answer questions, in short to meet their knowledge needs. Such professionals are essentially engaged in lifelong learning, and the platform that helps them interact with each other essentially supports a community of professional learners. In our research we have been studying one such community, the community of programmers supported by Stack Overflow (SO), with the ultimate goal of diagnosing the knowledge needs of the SO users in such an open-ended and evolving learning environment. In this paper, we report on a study that is a step in the direction of achieving this goal. In particular we diagnosed the knowledge of users in SO to see if their performance level in answering questions could be predicted from their previous behavior. We used a tag-based knowledge model and a Naive Bayes model in making predictions. We measured the success of our predictions using 10-fold cross-validation, root mean square error, and mean absolute error. Over different sample sizes and different numbers of tags, we achieved prediction accuracy ranging between 84.644% and 91.709%, root mean square error ranging between 0.0517 and 0.0629, and mean absolute error ranging between 0.011 and 0.0115. This level of success suggests the potential to provide adaptive feedback about an individual’s knowledge needs even before poor answers are provided. The approach has the further advantages of being lightweight (requiring minimal knowledge engineering) and of having the potential to evolve naturally with changes in the learner’s knowledge and changes in the disciplinary knowledge.

Oluwabukola Mayowa Ishola, Gordon McCalla
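The two error measures named in the abstract above, root mean square error and mean absolute error, have standard definitions. The following is a minimal sketch of those definitions only; the function names and the sample numbers are hypothetical and are not taken from the paper or its data.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def mae(predicted, actual):
    """Mean absolute error between two equal-length sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


if __name__ == "__main__":
    # Hypothetical predicted vs. observed scores, for illustration only.
    predicted = [0.9, 0.2, 0.7]
    observed = [1.0, 0.0, 1.0]
    print(rmse(predicted, observed), mae(predicted, observed))
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same data and it penalizes occasional large errors more heavily.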
iSTART-ALL: Confronting Adult Low Literacy with Intelligent Tutoring for Reading Comprehension

There is little empirical research available on the substantial problem of adult low literacy rates, and limited educational technologies are available to address distinct instructional needs of this population. This paper reports on development and testing of a version of Interactive Strategy Training for Active Reading and Thinking (iSTART) for Adult Literacy Learners (iSTART-ALL). We describe modifications of iSTART to accommodate adult literacy learners, including new practice modules (i.e., summarization, question asking), a new library of texts, and an interactive narrative for adult literacy learners to engage in extended practice of reading comprehension strategies. We report results of a study examining reactions to iSTART-ALL and performance data while engaging with the interactive narrative. The attitudinal study, conducted with 38 adult literacy learners, demonstrated generally positive reactions to the narrative. Results also revealed that task performance was strongly related to individual difference scores on reading comprehension assessments, and more so with higher-level comprehension skills than basic word-level skills, providing concurrent validity for the interactive narrative tasks.

Amy M. Johnson, Tricia A. Guerrero, Elizabeth L. Tighe, Danielle S. McNamara
Adapting Step Granularity in Tutorial Dialogue Based on Pretest Scores

We explore the effectiveness of adaptively deciding whether to further decompose a step in a line of reasoning during tutorial dialogue based on students’ pretest scores. We compare two versions of a tutorial dialogue system in high school classrooms: one that always decomposes a step to its simplest substeps and one that adaptively decides to decompose a step based on a student’s performance on pretest items that target the knowledge required to correctly answer that step. We hypothesize that students using the two versions of the tutoring system will learn similarly but that students who use the version that adaptively decomposes a step will learn more efficiently. Our results from classroom studies suggest support for our hypothesis. While students learned similarly and with similar efficiency across conditions, high prior knowledge students in the adaptive condition learned significantly more efficiently than high prior knowledge students in the control condition and learned similar amounts.

Pamela Jordan, Patricia Albacete, Sandra Katz
The Impact of Student Individual Differences and Visual Attention to Pedagogical Agents During Learning with MetaTutor

In this paper, we investigate the relationship between students’ (N = 28) individual differences and visual attention to pedagogical agents (PAs) during learning with MetaTutor, a hypermedia-based intelligent tutoring system. We used eye tracking to capture visual attention to the PAs, and our results reveal specific visual attention-related metrics (e.g., fixation rate, longest fixations) that are significantly influenced by learning depending on student achievement goals. Specifically, performance-oriented students learned more with a long longest fixation and a high fixation rate on the PAs, whereas mastery-oriented students learned less with a high fixation rate on the PAs. Our findings contribute to understanding how to design PAs that can better adapt to student achievement goals and visual attention to the PA.

Sébastien Lallé, Michelle Taub, Nicholas V. Mudrick, Cristina Conati, Roger Azevedo
Automatic Extraction of AST Patterns for Debugging Student Programs

When implementing a programming tutor it is often difficult to manually consider all possible errors encountered by students. An alternative is to automatically learn a bug library of erroneous patterns from students’ programs. We propose abstract-syntax-tree (AST) patterns as features for learning rules to distinguish between correct and incorrect programs. We use these rules to debug student programs: rules for incorrect programs (buggy rules) indicate mistakes, whereas rules for correct programs group programs with the same solution strategy. To generate hints, we first check buggy rules and point out incorrect patterns. If no buggy rule matches, we use rules for correct programs to recognize the student’s intent and suggest missing patterns. We evaluate our approach on past student programming data for a number of Prolog problems. For 31 out of 44 problems, the induced rules correctly classify over 85% of programs based only on their structural features. For approximately 73% of incorrect submissions, we are able to generate hints that were implemented by the student in some subsequent submission.

Timotej Lazar, Martin Možina, Ivan Bratko
Dusting Off the Messy Middle: Assessing Students’ Inquiry Skills Through Doing and Writing

Researchers are trying to develop assessments for inquiry practices to elicit students’ deep science learning, but few studies have examined the relationship between students’ doing, i.e. performance assessment, and writing, i.e. open responses, during inquiry. Inquiry practices include generating hypotheses, collecting data, interpreting data, warranting claims, and communicating findings [1]. The first four practices involve “doing” science, whereas the last involves writing scientific explanations, i.e. arguing using evidence. In this study, we explored whether what students wrote in their constructed responses reflected what they did during science inquiry in the Inq-ITS system. Results showed that more than half of the students’ writing did not match what they did in the environment. Findings revealed multiple types of students in the messy middle, which has implications for both teacher instruction and intelligent tutoring systems, such as Inq-ITS, in terms of providing real-time feedback for students to address the full complement of inquiry practices [1].

Haiying Li, Janice Gobert, Rachel Dickler
Impact of Pedagogical Agents’ Conversational Formality on Learning and Engagement

This study investigated the impact of pedagogical agents’ conversational formality on learning and engagement in a trialog-based intelligent tutoring system (ITS). Participants (N = 167) were randomly assigned to one of three conditions to learn summarization strategies with the conversational agents: (1) a formal condition in which both the teacher agent and the student agent spoke with a formal language style, (2) an informal condition in which both agents spoke informally, and (3) a mixed condition in which the teacher agent spoke formally, whereas the student agent spoke informally. Results showed that the agents’ informal discourse yielded higher performance, but elicited higher reports of text difficulty and mind wandering. This discourse also caused longer response time and lower arousal. The implications are discussed.

Haiying Li, Art Graesser
iSTART Therefore I Understand: But Metacognitive Supports Did not Enhance Comprehension Gains

iSTART is an intelligent tutoring system designed to provide self-explanation instruction and practice to improve students’ comprehension of complex, challenging text. This study examined the effects of extended game-based practice within the system as well as the effects of two metacognitive supports implemented within this practice. High school students (n = 234) were either assigned to an iSTART treatment condition or a control condition. Within the iSTART condition, students were assigned to a 2 × 2 design in which students provided self-assessments of their performance or were transferred to Coached Practice if their performance did not reach a certain performance threshold. Those receiving iSTART training produced higher self-explanation and inference-based comprehension scores. However, there were no direct effects of either metacognitive support on these learning outcomes.

Kathryn S. McCarthy, Matthew E. Jacovina, Erica L. Snow, Tricia A. Guerrero, Danielle S. McNamara
Inducing Stealth Assessors from Game Interaction Data

A key untapped feature of game-based learning environments is their capacity to generate a rich stream of fine-grained learning interaction data. The learning behaviors captured in these data provide a wealth of information on student learning, which stealth assessment can utilize to unobtrusively draw inferences about student knowledge to provide tailored problem-solving support. In this paper, we present a long short-term memory network (LSTM)-based stealth assessment framework that takes as input an observed sequence of raw game-based learning environment interaction data along with external pre-learning measures to infer students’ post-competencies. The framework is evaluated using data collected from 191 middle school students interacting with a game-based learning environment for middle grade computational thinking. Results indicate that LSTM-based stealth assessors induced from student game-based learning interaction data outperform comparable models that required labor-intensive hand-engineering of input features. The findings suggest that the LSTM-based approach holds significant promise for evidence modeling in stealth assessment.

Wookhee Min, Megan H. Frankosky, Bradford W. Mott, Eric N. Wiebe, Kristy Elizabeth Boyer, James C. Lester
Supporting Constructive Video-Based Learning: Requirements Elicitation from Exploratory Studies

Although videos are a highly popular digital medium for learning, video watching can be a passive activity and result in limited learning. This calls for interactive means to support engagement and active video watching. However, there is limited insight into what engagement challenges have to be overcome and what intelligent features are needed. This paper presents an empirical way to elicit requirements for innovative functionality to support constructive video-based learning. We present two user studies with an active video watching system instantiated for soft skill learning (pitch presentations). Based on the studies, we identify whether learning is happening and what kind of interaction contributes to learning, what difficulties participants face and how these can be overcome with additional intelligent support. Our findings show that participants who engaged in constructive learning improved their conceptual understanding of presentation skills, while those who exhibited more passive ways of learning did not improve as much. Analysis of participants’ profiles and experiences led to requirements for intelligent support with active video watching. Based on this, we propose intelligent nudging in the form of signposting and prompts to further promote constructive learning.

Antonija Mitrovic, Vania Dimitrova, Lydia Lau, Amali Weerasinghe, Moffat Mathews
Affect Dynamics in Military Trainees Using vMedic: From Engaged Concentration to Boredom to Confusion

The role of affect in learning has received increasing attention from AIED researchers seeking to understand how emotion and cognition interact in learning contexts. The dynamics of affect over time have been explored in a variety of research environments, allowing researchers to determine the extent to which common patterns are captured by hypothesized models. This paper presents an analysis of affect dynamics among learners using vMedic, which teaches combat medicine protocols as part of the military training at West Point, the United States Military Academy. In doing so, we seek both to broaden the variety of learning contexts being explored in order to better understand differences in these patterns and to test the theoretical predictions on the development of affect over time.

Jaclyn Ocumpaugh, Juan Miguel Andres, Ryan Baker, Jeanine DeFalco, Luc Paquette, Jonathan Rowe, Bradford Mott, James Lester, Vasiliki Georgoulas, Keith Brawner, Robert Sottilare
Behavioral Engagement Detection of Students in the Wild

This paper aims to investigate students’ behavioral engagement (On-Task vs. Off-Task) in authentic classrooms. We propose a two-phased approach for automatic engagement detection: In Phase 1, contextual logs are utilized to assess active usage of the content platform. If there is active use, the appearance information is utilized in Phase 2 to infer behavioral engagement. Through authentic classroom pilots, we collected around 170 hours of in-the-wild data from 28 students in two different classrooms using two different content platforms (one for Math and one for English as a Second Language (ESL)). Our data collection application captured appearance data from a 3D camera and context data from uniform resource locator (URL) logs. We experimented with two test cases: (1) Cross-classroom, where trained models were tested on a different classroom’s data; (2) Cross-platform, where the data collected in different subject areas (Math or ESL) were utilized in training and testing, respectively. For the first case, the behavioral engagement was detected with an F1-score of 77%, using only appearance. Incorporating the contextual information improved the overall performance to 82%. For the second case, even though the subject areas and content platforms changed, the proposed appearance classifier still achieved 72% accuracy (compared to 77%). Our experiments showed that the accuracy of the proposed model is not adversely impacted when it is applied to a different set of students or to different subject areas.

Eda Okur, Nese Alyuz, Sinem Aslan, Utku Genc, Cagri Tanriover, Asli Arslan Esme
Improving Reading Comprehension with Automatically Generated Cloze Item Practice

This study investigated the effect of cloze item practice on reading comprehension, where cloze items were either created by humans, by machine using natural language processing techniques, or randomly. Participants from Amazon Mechanical Turk ($$N=302$$) took a pre-test, read a text, and took part in one of five conditions, Do-Nothing, Re-Read, Human Cloze, Machine Cloze, or Random Cloze, followed by a 24-hour retention interval and post-test. Participants used the MoFaCTS system [27], which in cloze conditions presented items adaptively based on individual success with each item. Analysis revealed that only Machine Cloze was significantly higher than the Do-Nothing condition on post-test, $$d=.58$$, $$CI_{95} [.21,.94]$$. Additionally, Machine Cloze was significantly higher than Human and Random Cloze conditions on post-test, $$d=.49$$, $$CI_{95} [.12,.86]$$ and $$d=.71$$, $$CI_{95} [.34,1.09]$$ respectively. These results suggest that Machine Cloze items generated using natural language processing techniques are effective for enhancing reading comprehension when delivered by an adaptive practice scheduling system.

Andrew M. Olney, Philip I. Pavlik Jr., Jaclyn K. Maass
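The effect sizes in the abstract above are reported as d values. As a reminder of what such a figure measures, the sketch below computes Cohen's d with a pooled standard deviation, which is one common formulation. It is an assumption that this matches the variant used in the paper, and the function name and sample scores are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(treatment, control):
    """Cohen's d using the pooled sample standard deviation.

    This is one common definition; the abstract does not say which
    variant the authors used, so treat it as an illustrative assumption.
    """
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # variance() is the sample variance (divides by n - 1).
    pooled_var = ((n1 - 1) * variance(treatment)
                  + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")
    return (mean(treatment) - mean(control)) / math.sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical post-test scores, for illustration only.
    print(cohens_d([2, 4, 6], [1, 3, 5]))
```

A d of 0.5 means the two group means differ by half of the pooled standard deviation.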
Variations of Gaming Behaviors Across Populations of Students and Across Learning Environments

Although gaming the system, a behavior in which students attempt to solve problems by exploiting help functionalities of digital learning environments, has been studied across multiple learning environments, little research has been done to study how (and whether) gaming manifests differently across populations of students and learning environments. In this paper, we study the differences in usage of 13 different patterns of actions associated with gaming the system by comparing their distribution across different populations of students using Cognitive Tutor Algebra and across students using one of three learning environments: Cognitive Tutor Algebra, Cognitive Tutor Middle School and ASSISTments. Results suggest that differences in gaming behavior are more strongly associated with the learning environments than with student populations and reveal different trends in how students use fast actions, similar answers and help requests in different systems.

Luc Paquette, Ryan S. Baker
Identifying Productive Inquiry in Virtual Labs Using Sequence Mining

Virtual labs are exploratory learning environments in which students learn by conducting inquiry to uncover the underlying scientific model. Although students often fail to learn efficiently in these environments, providing effective support is challenging since it is unclear what productive engagement looks like. This paper focuses on the mining and identification of student inquiry strategies during an unstructured activity with the DC Circuit Construction Kit (https://phet.colorado.edu/). We use an information theoretic sequence mining method to identify productive and unproductive strategies of a hundred students. Low domain knowledge students who successfully learned during the activity paused more after testing their circuits, particularly on simply structured circuits that target the activity’s learning goals, and mainly earlier in the activity. Moreover, our results show that a strategic use of pauses so that they become opportunities for reflection and planning is highly associated with productive learning. Implications for theory, support, and assessment are discussed.

Sarah Perez, Jonathan Massey-Allard, Deborah Butler, Joss Ives, Doug Bonn, Nikki Yee, Ido Roll
“Thanks Alisha, Keep in Touch”: Gender Effects and Engagement with Virtual Learning Companions

Virtual learning companions have shown significant potential for supporting students. However, there appear to be gender differences in their effectiveness. In order to support all students well, it is important to develop a deeper understanding of the role that student gender plays during interactions with learning companions. This paper reports on a study to explore the impact of student gender and learning companion design. In a three-condition study, we examine middle school students’ interactions in a game-based learning environment that featured one of the following: (1) a learning companion deeply integrated into the narrative of the game; (2) a learning companion whose backstory and personality were not integrated into the narrative but who provided equivalent task support; and (3) no learning companion. The results show that girls were significantly more engaged than boys, particularly with the narrative-integrated agent, while boys reported higher mental demand with that agent. Even when controlling for video game experience and prior knowledge, the gender effects held. These findings contribute to the growing understanding that learning companions must adapt to students’ gender in order to facilitate the most effective learning interactions.

Lydia G. Pezzullo, Joseph B. Wiggins, Megan H. Frankosky, Wookhee Min, Kristy Elizabeth Boyer, Bradford W. Mott, Eric N. Wiebe, James C. Lester
Hint Generation Under Uncertainty: The Effect of Hint Quality on Help-Seeking Behavior

Much research in Intelligent Tutoring Systems has explored how to provide on-demand hints, how they should be used, and what effect they have on student learning and performance. Most of this work relies on hints created by experts and assumes that all help provided by the tutor is correct and of high quality. However, hints may not all be of equal value, especially in open-ended problem solving domains, where context is important. This work argues that hint quality, especially when using data-driven hint generation techniques, is inherently uncertain. We investigate the impact of hint quality on students’ help-seeking behavior in an open-ended programming environment with on-demand hints. Our results suggest that the quality of the first few hints on an assignment is positively associated with future hint use on the same assignment. Initial hint quality also correlates with possible help abuse. These results have important implications for hint design and generation.

Thomas W. Price, Rui Zhi, Tiffany Barnes
Balancing Learning and Engagement in Game-Based Learning Environments with Multi-objective Reinforcement Learning

Game-based learning environments create rich learning experiences that are both effective and engaging. Recent years have seen growing interest in data-driven techniques for tutorial planning, which dynamically personalize learning experiences by providing hints, feedback, and problem scenarios at run-time. In game-based learning environments, tutorial planners are designed to adapt gameplay events in order to achieve multiple objectives, such as enhancing student learning or student engagement, which may be complementary or competing aims. In this paper, we introduce a multi-objective reinforcement learning framework for inducing game-based tutorial planners that balance between improving learning and engagement in game-based learning environments. We investigate a model-based, linear-scalarized multi-policy algorithm, Convex Hull Value Iteration, to induce a tutorial planner from a corpus of student interactions with a game-based learning environment for middle school science education. Results indicate that multi-objective reinforcement learning creates policies that are more effective at balancing multiple reward sources than single-objective techniques. A qualitative analysis of select policies and multi-objective preference vectors shows how a multi-objective reinforcement learning framework shapes the selection of tutorial actions during students’ game-based learning experiences to effectively achieve targeted learning and engagement outcomes.

Robert Sawyer, Jonathan Rowe, James Lester
Is More Agency Better? The Impact of Student Agency on Game-Based Learning

Student agency has long been viewed as a critical element in game-based learning. Agency refers to the degree of freedom and control that a student has to perform meaningful actions in a learning environment. While long postulated to be central to student self-regulation, there is limited evidence on the design of game-based learning environments that promote student agency and its effect on learning. This paper reports on an experiment to investigate the impact of student agency on learning and problem-solving behavior in a game-based learning environment for microbiology. Students interacted with one of three versions of the system. In the High Agency condition, students could freely navigate the game’s 3D open-world environment and perform problem-solving actions in any order they chose. In the Low Agency condition, students were required to traverse the environment and solve the mystery in a prescribed partially ordered sequence. In the No Agency condition, students watched a video of an expert playing the game by following an “ideal path” for solving the problem scenario. Results indicate that students in the Low Agency condition achieved greater learning gains than students in both the High Agency and No Agency conditions, but exhibited more unproductive behaviors, suggesting that artfully striking a balance between high and low agency best supports learning.

Robert Sawyer, Andy Smith, Jonathan Rowe, Roger Azevedo, James Lester
Can a Teachable Agent Influence How Students Respond to Competition in an Educational Game?

Learning in educational games is often associated with some form of competition. We investigated how students responded to winning or losing in an educational math game, with respect to playing with or without a Teachable Agent (TA). Students could choose between game modes in which the TA took a more passive or active role, or let the TA play a game entirely on its own. Based on the data logs from 3983 games played by 163 students (age 10–11), we analyzed data on students’ persistence, challenge-seeking and performance during gameplay. Results indicated that students showed greater persistence when playing together with the TA, by more often repeating a lost game with the TA, than a lost game after playing alone. Students’ challenge-seeking, by increasing the difficulty level, was greater following a win than following a loss, especially after the TA won on its own. Students’ gameplay performance was unaffected by their TA winning or losing but was, unexpectedly, slightly worse following a win by the student alone. We conclude that engaging a TA can make students respond more productively to both winning and losing, depending on the particular role the TA takes in the game. These results may inform more specific hypotheses as to the differential effects of competing and collaborating in novel, AI-supported social constellations, such as with TAs, on students’ motivation and ego-involvement in educational games.

Björn Sjödén, Mats Lind, Annika Silvervarg
Face Forward: Detecting Mind Wandering from Video During Narrative Film Comprehension

Attention is key to effective learning, but mind wandering, a phenomenon in which attention shifts from task-related processing to task-unrelated thoughts, is pervasive across learning tasks. Therefore, intelligent learning environments should benefit from mechanisms to detect and respond to attentional lapses, such as mind wandering. As a step in this direction, we report the development and validation of the first student-independent facial feature-based mind wandering detector. We collected training data in a lab study where participants self-reported when they caught themselves mind wandering over the course of completing a 32.5-minute narrative film comprehension task. We used computer vision techniques to extract facial features and bodily movements from videos. Using supervised learning methods, we were able to detect mind wandering with an F1 score of .390, which reflected a 31% improvement over a chance model. We discuss how our mind wandering detector can be used to adapt the learning experience, particularly for online learning contexts.

Angela Stewart, Nigel Bosch, Huili Chen, Patrick Donnelly, Sidney D’Mello
Modeling the Incubation Effect Among Students Playing an Educational Game for Physics

We attempted to model the Incubation Effect, a phenomenon in which a momentary break helps the generation of a solution to a problem, among students playing Physics Playground. We performed a logistic regression analysis to predict the outcome of the incubation using a genetic algorithm for feature selection. Out of 14 candidate features, those that significantly predicted the outcome were total badges earned prior to post-incubation, the problem’s level of difficulty, total attempts made prior to post-incubation, and time interval of post-incubation. We found evidence that incubation in the earlier part of the game is more beneficial than breaks at the later part where students may already be mentally exhausted.

May Marie P. Talandron, Ma. Mercedes T. Rodrigo, Joseph E. Beck
Predicting Learner’s Deductive Reasoning Skills Using a Bayesian Network

Logic-Muse is an Intelligent Tutoring System (ITS) that helps improve deductive reasoning skills in multiple contexts. All three of its main components (the learner, tutor, and expert models) have been developed with the help of experts and by drawing on important work in the fields of reasoning and computer science. It is now known that one cannot support a student in a learning task without being aware of his or her level of skill (what the student knows and what he or she needs to know). Thus, when setting up the learner model, it is important to consider an efficient mechanism that can both assess and predict the learner’s skills. This paper describes the Bayesian Network (BN), implemented in the learner component of Logic-Muse, which allows real-time diagnosis, prediction and modeling of the learner’s state of skills. We proved that the BN is able to predict the answers of learners on different exercises of the domain with an accuracy near 85%. Given this result, the system is able to predict the learner’s deductive reasoning skills at a given time and help the tutor model provide better assessment and coaching.

Ange Tato, Roger Nkambou, Janie Brisson, Serge Robert
Group Optimization to Maximize Peer Assessment Accuracy Using Item Response Theory

As an assessment method based on a social constructivist approach, peer assessment has become popular in recent years. When the number of learners increases as in MOOCs, peer assessment is often conducted by dividing learners into multiple groups to reduce the learner’s assessment workload. However, in this case, a difficulty remains that the assessment accuracies of learners in each group depend on the assigned rater. To solve that problem, this study proposes a group optimization method to maximize peer assessment accuracy based on item response theory using integer programming. Experimental results, however, showed that the proposed method does not necessarily present higher accuracy than a random group formation. Therefore, we further propose an external rater selection method that assigns a few outside-group raters to each learner. Simulation and actual data experiments demonstrate that the introduction of external raters using the proposed method improves the peer assessment accuracy considerably.

Masaki Uto, Nguyen Duc Thien, Maomi Ueno
What Matters in Concept Mapping? Maps Learners Create or How They Create Them

Generative strategies, where learners process the target content while connecting different concepts to build a knowledge network, have shown potential to improve student learning outcomes. While concept maps in particular have been linked to the development of generative strategies, few studies have explored structuring the concept mapping process to support generative strategies, and few studies offer intelligent support. In this work, we present a concept mapping tool that offers navigational support in the form of hyperlinks, where nodes in the concept map are linked to segments of text. We evaluate the effect of the hyperlinks on generative strategies and learning outcomes through a week-long high school study with 32 participants. Our results indicate that proper navigational and visual aid during concept mapping facilitates the development of generative strategies, with implications for learning outcomes. Based on these findings, we propose a constraint-based tutoring system to adaptively support the development of generative strategies in concept mapping.

Shang Wang, Erin Walker, Ruth Wylie
Reliability Investigation of Automatic Assessment of Learner-Build Concept Map with Kit-Build Method by Comparing with Manual Methods

This paper describes an investigation into the reliability of an automatic assessment method of the learner-build concept map by comparing it with two well-known manual methods. We have previously proposed the Kit-Build (KB) concept map framework where a learner builds a concept map by using only a provided set of components, known as the set “kit”. In this framework, instant and automatic assessment of a learner-build concept map has been realized. We call this assessment method the “Kit-Build method” (KB method). The framework and assessment method have already been practically used in classrooms in various schools. As an investigation of the reliability of this method, we have conducted an experiment to compare the assessment results of the method with the assessment results of two other manual assessment methods. In this experiment, 22 university students attended as subjects and four as raters. It was found that the scores of the KB method had a very strong correlation with the scores of the other manual methods. The results suggest that automatic assessment of the Kit-Build concept map can attain almost the same level of reliability as well-known manual assessment methods.

Warunya Wunnasri, Jaruwat Pailai, Yusuke Hayashi, Tsukasa Hirashima
Characterizing Students’ Learning Behaviors Using Unsupervised Learning Methods

In this paper, we present an unsupervised approach for characterizing students’ learning behaviors in an open-ended learning environment. We describe our method for generating metrics that describe a learner’s behaviors and performance using Coherence Analysis. Then we combine feature selection with a clustering method to group students by their learning behaviors. We characterize the primary behaviors of each group and link these behaviors to the students’ ability to build correct models as well as their learning gains derived from their pre- and post-test scores. Finally, we discuss how this behavior characterization may contribute to a framework for adaptive scaffolding of learning behaviors.

Ningyu Zhang, Gautam Biswas, Yi Dong

Poster Papers

Frontmatter
Student Preferences for Visualising Uncertainty in Open Learner Models

User preferences for indicating uncertainty using specific visual variables have been explored outside of educational reporting. Exploring students’ preferred method to indicate uncertainty in open learner models can provide hints about which approaches students will use, so further design approaches can be considered. Participants were 67 students exploring 6 visual variables applied to a learner model visualisation (skill meter). Student preferences were ordered along a scale, which showed the size, numerosity, orientation and added marks visual variables were near one another in the learner’s preference space. Results of statistical analyses revealed differences in student preferences for some variables with opacity being the most preferred and arrangement the least preferred. This result provides initial guidelines for open learner model and learning dashboard designers to represent uncertainty information using students’ preferred method of visualisation.

Lamiya Al-Shanfari, Chris Baber, Carrie Demmans Epp
Intelligent Augmented Reality Tutoring for Physical Tasks with Medical Professionals

Percutaneous radiology procedures often require the repeated use of medical radiation in the form of computed tomography (CT) scanning, to demonstrate the position of the needle in the underlying tissues. The angle of the insertion and the distance travelled by the needle inside the patient play a major role in successful procedures, and must be estimated by the practitioner and confirmed periodically by the use of the scanner. Junior radiology trainees, who are already highly trained professionals, currently learn this task “on-the-job” by performing the procedures on real patients with varying levels of guidance. Therefore, we present a novel Augmented Reality (AR)-based system that provides multiple layers of intuitive and adaptive feedback to assist junior radiologists in achieving competency in image-guided procedures.

Mohammed A. Almiyad, Luke Oakden-Rayner, Amali Weerasinghe, Mark Billinghurst
Synthesis of Problems for Shaded Area Geometry Reasoning

A shaded area problem in high school geometry consists of a figure annotated with facts such as lengths of line segments or angle measures, and asks to compute the area of a shaded portion of the figure. We describe a technique to generate fresh figures for these problems. Given a figure, we describe a technique to automatically synthesize shaded area problems. We demonstrate the efficacy of our synthesis techniques by synthesizing problems from fresh figures as well as figures from a corpus of problems from high-school geometry textbooks.

Chris Alvin, Sumit Gulwani, Rupak Majumdar, Supratik Mukhopadhyay
Communication Strategies and Affective Backchannels for Conversational Agents to Enhance Learners’ Willingness to Communicate in a Second Language

Willingness to Communicate (WTC) in a second language (L2) is believed to have a direct and sustained influence on learners’ actual usage frequency of the targeted language. To help overcome the lack of suitable environments to increase L2 learners’ WTC, our approach is to implement a conversational agent based on a WTC model. In this paper, we propose a dialogue management model based on a set of communication strategies and affective backchannels in order to foster the agent’s ability to carry on natural and WTC-friendly conversations with L2 learners. We expect that combining communication strategies with affective backchannels can empower conversational agents to effectively help L2 learners recover from eventual communication pitfalls and create a warm conversation atmosphere.

Emmanuel Ayedoun, Yuki Hayashi, Kazuhisa Seta
A Multi-layered Architecture for Analysis of Non-technical-Skills in Critical Situations

In most technical domains, non-technical skills have an influence on a worker’s performance. Studies have shown that these skills are most influential during critical situations, where usual technical procedures cannot be successfully applied. This article describes the challenges raised by the diagnosis of non-technical skills during critical situations inside a virtual environment, and presents the first steps of this diagnosis task, namely the evaluation of a learner’s perceptual and gestural performance using a neural network.

Yannick Bourrier, Francis Jambon, Catherine Garbay, Vanda Luengo
Conceptual Framework for Collaborative Educational Resources Adaptation in Virtual Learning Environments

Frequently, the existing resources in Virtual Learning Environments (VLEs), used in distance and blended education courses, are presented in the same way for all students. This may complicate the effective learning process of each student. To address this problem, the approach adopted in this paper is based on a framework called ArCARE, which adapts resources for students in VLEs and supports the construction of their knowledge, using multi-agent system technology that handles an open learner model ontology. ArCARE recommends and adapts collaborative activities, such as pedagogical architectures, so that students learn particular course content more effectively. Results obtained in a Computational Thinking course show the feasibility of the proposal.

Vitor Bremgartner, José de Magalhães Netto, Crediné Menezes
Minimal Meaningful Propositions Alignment in Student Response Comparisons

In an intelligent educational system, automatic sentence alignment has a pivotal role in determining a foundation for clustering, comparing, summarizing and classifying responses. In this paper, we go beyond sentence alignment by splitting the reference and the student responses into single clauses, which are then aligned using fine-grained semantic components (facets). This detailed analysis will enable automated educational systems to become highly scalable, domain-independent and to enrich the classroom experience. The results are very promising, showing a significant increase in terms of F1-score, compared to the best performing baseline.

Florin Bulgarov, Rodney Nielsen
Does Adaptive Provision of Learning Activities Improve Learning in SQL-Tutor?

Tutored Problem Solving (PS), Worked Examples (WE) and Erroneous Examples (ErrEx) have all been proven to be effective in supporting learning. We previously found that learning from a fixed sequence of alternating WE/PS pairs and ErrEx/PS pairs (WPEP) was beneficial for students in comparison to learning from a fixed sequence of PS and WEs [1]. In this paper, we introduce an adaptive strategy which determines which learning activity (a WE, a 1-error ErrEx, a 2-error ErrEx or a problem to be solved) to provide to the student based on the score the student obtained on the previous problem. We compared the adaptive strategy to the fixed WPEP strategy, and found that students in the adaptive condition significantly improved their post-test scores on conceptual, procedural and debugging questions.

Xingliang Chen, Antonija Mitrovic, Moffat Mathews
Constraint-Based Modelling as a Tutoring Framework for Japanese Honorifics

Japanese honorifics establish the social and working relationships between people and are indispensable in conversation. In this research, we examine the use of Constraint-Based Modelling (CBM) and its implementation for developing a tutoring system for Japanese honorifics. We focus on implementing CBM for one form of honorifics called sonkeigo and we represent its formation through constraints. We demonstrate an implementation of a reading assistant tutor using CBM for rewriting sonkeigo expressions to their regular form and vice-versa by pattern matching via constraints.

Zachary T. Chung, Takehito Utsuro, Ma. Mercedes Rodrigo
Teaching iSTART to Understand Spanish

iSTART is a web-based reading comprehension tutor. A recent translation of iSTART from English to Spanish has made the system available to a new audience. In this paper, we outline several challenges that arose during the development process, specifically focusing on the algorithms that drive the feedback. Several iSTART activities encourage students to use comprehension strategies to generate self-explanations in response to challenging texts. Unsurprisingly, analyzing responses in a new language required many changes, such as implementing Spanish natural language processing tools and rebuilding lists of regular expressions used to flag responses. We also describe our use of an algorithm inspired by genetics to optimize the Fisher Discriminant Function Analysis coefficients used to determine self-explanation scores.

Mihai Dascalu, Matthew E. Jacovina, Christian M. Soto, Laura K. Allen, Jianmin Dai, Tricia A. Guerrero, Danielle S. McNamara
Data-Driven Generation of Rubric Parameters from an Educational Programming Environment

We demonstrate that, by using a small set of hand-graded students, we can automatically generate rubric parameters with a high degree of validity, and that a predictive model incorporating these rubric parameters is more accurate than a previously reported model. We present this method as one approach to addressing the often challenging problem of grading assignments in programming environments. A classic solution is creating unit-tests that the student-generated program must pass, but the rigid, structured nature of unit-tests is suboptimal for assessing more open-ended assignments. Furthermore, the creation of unit-tests requires predicting the various ways a student might correctly solve a problem – a challenging and time-intensive process. The current study proposes an alternative, semi-automated method for generating rubric parameters using low-level data from the Alice programming environment.

Nicholas Diana, Michael Eagle, John Stamper, Shuchi Grover, Marie Bienkowski, Satabdi Basu
Exploring Learner Model Differences Between Students

Bayesian Knowledge Tracing (BKT) has been employed successfully in intelligent learning environments to individualize curriculum sequencing and help messages. Standard BKT employs four parameters, which are estimated separately for individual knowledge components, but not for individual students. Studies have shown that individualizing the parameter estimates for students based on existing data logs improves goodness of fit and leads to substantially different practice recommendations. This study investigates how well BKT parameters in a tutor lesson can be individualized ahead of time, based on learners’ prior activities, including reading text and completing prior tutor lessons. We find that directly applying best-fitting individualized parameter estimates from prior tutor lessons does not appreciably improve BKT goodness of fit for a later tutor lesson, but that individual differences in the later lesson can be effectively predicted from measures of learners’ behaviors in reading text and in completing the prior tutor lessons.

Michael Eagle, Albert Corbett, John Stamper, Bruce M. McLaren, Ryan Baker, Angela Wagner, Benjamin MacLaren, Aaron Mitchell
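
As an illustration of the four-parameter model mentioned in the abstract above, the following is a minimal sketch of the standard textbook BKT update for a single knowledge component. It is not the authors' implementation: the function names (`bkt_update`, `trace_knowledge`) and the parameter values in the usage note are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List


def bkt_update(p_known: float, correct: bool,
               p_slip: float, p_guess: float, p_learn: float) -> float:
    """One step of standard Bayesian Knowledge Tracing.

    p_known : probability the skill is known before this response
    p_slip  : probability of an incorrect answer despite knowing the skill
    p_guess : probability of a correct answer without knowing the skill
    p_learn : probability of learning the skill at this practice opportunity
    """
    if correct:
        known_and_observed = p_known * (1.0 - p_slip)
        unknown_and_observed = (1.0 - p_known) * p_guess
    else:
        known_and_observed = p_known * p_slip
        unknown_and_observed = (1.0 - p_known) * (1.0 - p_guess)

    # Bayes' rule: posterior probability the skill was already known.
    posterior = known_and_observed / (known_and_observed + unknown_and_observed)

    # Learning transition: an unknown skill may become known after practice.
    return posterior + (1.0 - posterior) * p_learn


def trace_knowledge(responses: Iterable[bool], p_init: float,
                    p_slip: float, p_guess: float, p_learn: float) -> List[float]:
    """Return the knowledge estimate after each response in sequence."""
    estimates = []
    p_known = p_init  # fourth parameter: prior probability of knowing the skill
    for correct in responses:
        p_known = bkt_update(p_known, correct, p_slip, p_guess, p_learn)
        estimates.append(p_known)
    return estimates
```

For example, `trace_knowledge([True, False, True], p_init=0.5, p_slip=0.1, p_guess=0.2, p_learn=0.1)` returns one estimate per response. In standard BKT these four values are fit per knowledge component; individualizing them per student, as the abstract discusses, would mean supplying different values for each learner.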
Investigating the Effectiveness of Menu-Based Self-explanation Prompts in a Mobile Python Tutor

PyKinetic is a mobile tutor for Python, which offers Parsons problems with incomplete lines of code (LOCs). This paper reports the results of a study in which we investigated the effect of menu-based self-explanation (SE) prompts. Students were asked to self-explain concepts related to incomplete LOCs they had solved. The goals of the study were (1) to investigate whether students are learning with PyKinetic and (2) to determine the effect of SE prompts. The scores of participants improved significantly from the pre-test to the post-test. There was also a significant difference in the post-test scores of participants from the experimental group compared to the control group. In future work, we aim to add other activities to PyKinetic, and introduce a student model and a pedagogical model for an adaptive version of PyKinetic.

Geela Venise Firmalo Fabic, Antonija Mitrovic, Kourosh Neshatian
Striking a Balance: User-Experience and Performance in Computerized Game-Based Assessment

Game-based assessment (GBA) is a new frontier in the assessment industry. However, as with serious games, it will likely be important to find an optimal balance between making the game “fun” versus focusing on achieving the educational goals. We created two minigames to assess students’ knowledge of argumentation skills. We conducted an iterative counter-balanced pre-survey-interaction-post-survey study with 124 students. We discovered that game presentation sequence and game perceptions are related to performance in two games with varying numbers of game features and alignment to educational content. Specifically, understanding how to play the games is related to performance when users start with a familiar environment and move to one with more game features, whereas enjoyment is related to performance when users start with a more gamified experience before moving to a familiar environment.

Carol M. Forsyth, Tanner Jackson, Del Hebert, Blair Lehman, Pat Inglese, Lindsay Grace
Interactive Score Reporting: An AutoTutor-Based System for Teachers

Teachers often have difficulties understanding many aspects of score reports for assessments, thus hindering their ability to help students. Computerized environments with natural language conversations may help teachers better understand these reports. Thus, we created a tutor on score reports for teachers based on the AutoTutor conversational framework, which conventionally teaches various topics to students rather than teachers. We conducted a pilot study where eight teachers completed interaction with the tutor, providing a total of 98 responses. Results revealed specific ways the framework may be altered for teachers as well as teachers’ overall favorable attitudes towards the tutor.

Carol M. Forsyth, Stephanie Peters, Diego Zapata-Rivera, Jennifer Lentini, Art Graesser, Zhiqiang Cai
Transforming Foreign Language Narratives into Interactive Reading Applications Designed for Comprehensibility and Interest

This study reports on the design and use of a second language reading application for enhanced comprehension and pleasure reading. The application combines short narratives with dialog construction tasks. We compared quantitative reading comprehension scores between reading with the application and reading regular text, and qualitatively evaluated how users perceived the application. Preliminary results indicate that the software was successful in improving reading comprehension by guiding user behavior through its design. However, not all students were optimistic about the application as a learning tool given its implicit approach. How the work stands in relation to extensive reading is also discussed.

Pedro Furtado, Tsukasa Hirashima, Yusuke Hayashi
Exploring Students’ Affective States During Learning with External Representations

We conducted a user study that explored the relationship between students’ usage of multiple external representations and their affective states during fractions learning. We use the affective states of the student as a proxy indicator for the ease of reasoning with the representation. Extending existing literature that highlights the advantages of learning with multiple external representations, our results indicate that low-performing students have difficulties in reasoning with representations that do not fully accommodate the fraction as a part-whole concept. In contrast, high-performing students were at ease with a range of representations, including the ones that vaguely involved the fraction as part-whole concept.

Beate Grawemeyer, Manolis Mavrikis, Claudia Mazziotti, Alice Hansen, Anouschka van Leeuwen, Nikol Rummel
Enhancing an Intelligent Tutoring System to Support Student Collaboration: Effects on Learning and Behavior

In this study we explore how different methods of structuring collaborative interventions affect student learning and interaction in an Intelligent Tutoring System for Computer Science. We compare two methods of structuring collaboration: one condition, unstructured, does not provide students with feedback on their collaboration; whereas the other condition, semistructured, offers a visualization of group performance over time, partner contribution comparison and feedback, and general tips on collaboration. We present a contrastive analysis of student interaction outcomes between conditions, and explore students’ reported perceptions of both systems. We found that students in both conditions have significant learning gains, equivalent coding efficiency, and limited reliance on system examples. However, unstructured users are more on-topic in their conversational dialogue, whereas semistructured users exhibit better planning skills as problem difficulty increases.

Rachel Harsley, Barbara Di Eugenio, Nick Green, Davide Fossati
Assessing Question Quality Using NLP

An NLP algorithm was developed to assess question quality to inform feedback on questions generated by students within iSTART (an intelligent tutoring system that teaches reading strategies). A corpus of 4575 questions was coded using a four-level taxonomy. NLP indices were calculated for each question and machine learning was used to predict question quality. NLP indices related to lexical sophistication modestly predicted question type. Accuracies improved when predicting two levels (shallow versus deep).

Kristopher J. Kopp, Amy M. Johnson, Scott A. Crossley, Danielle S. McNamara
The Effect of Providing Motivational Support in Parsons Puzzle Tutors

In response to student feedback on a tutor on Parsons puzzles on the programming concept of sequence, we incorporated three features meant to improve the motivation of the student solving the puzzles. We compared the performance of students before and after introducing these features. We found that introduction of motivational supports did not affect pre-post improvement, and therefore, the amount of learning. Students who were provided motivational supports spent more time per puzzle than those who were not.

Amruth N. Kumar
Assessing Student Answers to Balanced Tree Problems

Problems in the domain of balanced binary tree operations usually involve the students constructing a sequence of transformations to insert or delete a value. An Intelligent Tutoring System (ITS) in this area must be able to perform.

Chun W. Liew, Huy Nguyen, Darren J. Norton
A Comparisons of BKT, RNN and LSTM for Learning Gain Prediction

The objective of this study is to develop effective computational models that can predict student learning gains, preferably as early as possible. We compared a series of Bayesian Knowledge Tracing (BKT) models against vanilla RNNs and Long Short-Term Memory (LSTM) based models. Our results showed that the LSTM-based model achieved the highest accuracy and the RNN-based model had the highest F1-measure. Interestingly, we found that the RNN can achieve a reasonably accurate prediction of students’ final learning gains using only the first 40% of the entire training sequence; using the first 70% of the sequence would produce a result comparable to using the entire sequence.

Chen Lin, Min Chi
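
For readers unfamiliar with the BKT baseline named in the abstract above, the following is a minimal sketch of the standard Bayesian Knowledge Tracing update for a single skill. It is not the authors' implementation; the function names and the parameter values in the usage note are illustrative assumptions.

```python
from typing import Iterable, List


def bkt_update(p_know: float, correct: bool,
               p_slip: float, p_guess: float, p_learn: float) -> float:
    """One step of standard BKT: Bayesian posterior, then learning transition.

    All probabilities are assumed to lie strictly between 0 and 1.
    """
    if correct:
        # P(known | correct answer)
        numerator = p_know * (1.0 - p_slip)
        denominator = numerator + (1.0 - p_know) * p_guess
    else:
        # P(known | incorrect answer)
        numerator = p_know * p_slip
        denominator = numerator + (1.0 - p_know) * (1.0 - p_guess)
    posterior = numerator / denominator
    # The student may also learn the skill at this practice opportunity.
    return posterior + (1.0 - posterior) * p_learn


def bkt_trace(responses: Iterable[bool], p_init: float,
              p_slip: float, p_guess: float, p_learn: float) -> List[float]:
    """Return the mastery estimate after each observed response."""
    estimates = []
    p_know = p_init
    for correct in responses:
        p_know = bkt_update(p_know, correct, p_slip, p_guess, p_learn)
        estimates.append(p_know)
    return estimates
```

For example, `bkt_trace([True, False, True], p_init=0.5, p_slip=0.1, p_guess=0.2, p_learn=0.1)` yields one mastery estimate per response; a correct answer raises the estimate and an incorrect one lowers it.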
Uncovering Gender and Problem Difficulty Effects in Learning with an Educational Game

A prior study showed that middle school students who used the educational game Decimal Point achieved significantly higher gain scores on immediate and delayed posttests of decimal understanding than students who learned with a more conventional computer-based learning tool. This paper reports on new analyses of the data from that study, providing new insights into the benefits of the game. First, females benefited more than males from the game. Second, students in the game condition performed better on the more difficult intervention problems. This paper presents these new analyses and discusses why the educational game might have led to these results.

Bruce McLaren, Rosta Farzan, Deanne Adams, Richard Mayer, Jodi Forlizzi
Analyzing Learner Affect in a Scenario-Based Intelligent Tutoring System

Scenario-based tutoring systems influence affective states due to two distinct mechanisms during learning: (1) reactions to performance feedback and (2) responses to the scenario context or events. To explore the role of affect and engagement, a scenario-based ITS was instrumented to support unobtrusive facial affect detection. Results from a sample of university students showed relatively few traditional academic affective states such as confusion or frustration, even at decision points and after poor performance (e.g., incorrect responses). This may show evidence of “over-flow,” with a high level of engagement and interest but insufficient confusion/disequilibrium for optimal learning.

Benjamin Nye, Shamya Karumbaiah, S. Tugba Tokel, Mark G. Core, Giota Stratou, Daniel Auerbach, Kallirroi Georgila
Proficiency and Preference Using Local Language with a Teachable Agent

With a teachable agent system and a set of linguistically diverse comparison prototypes, we explore questions of proficiency with and preference for local language agents in two sites in the Philippines. We found that students in a higher-performing school produce more English-language math explanations at a faster rate than students in a lower-performing school, who were more proficient in their local language. However, these students preferred the English-language agent, while students in the higher-performing school had equal preference for agents that communicate in the local language. These findings demonstrate the complex interactions between language and engagement in AIED systems.

Amy Ogan, Evelyn Yarzebinski, Roberto De Roock, Cristina Dumdumaya, Michelle Banawan, Ma. Mercedes Rodrigo
LiftUpp: Support to Develop Learner Performance

The last two decades have seen enormous progress in both theories and technology to support learner progress. However, many of the Artificial Intelligence in Education (AIED) techniques are difficult to apply in workplace-based educational settings, such as dentistry. Such settings put high demands on e-infrastructure, because they require intelligent systems that can be used in the workplace every day, and can also fuse many different forms of assessment data together. In addition, such systems should be able to enhance student development through personalised real time feedback (in dentistry education, for example, from both staff and patients) to drive learner self-reflection. Moreover, the information these systems provide must be reliable to facilitate defensible decisions over individual student progress to protect the public [2].

Frans A. Oliehoek, Rahul Savani, Elliot Adderton, Xia Cui, David Jackson, Phil Jimmieson, John Christopher Jones, Keith Kennedy, Ben Mason, Adam Plumbley, Luke Dawson
StairStepper: An Adaptive Remedial iSTART Module

This paper introduces StairStepper, a new addition to Interactive Strategy Training for Active Reading and Thinking (iSTART), an intelligent tutoring system (ITS) that provides adaptive self-explanation training and practice. Whereas iSTART focuses on improving comprehension at levels geared toward answering challenging questions associated with complex texts, StairStepper focuses on improving learners’ performance when reading grade-level expository texts. StairStepper is designed as a scaffolded practice activity wherein text difficulty level and task are adapted according to learners’ performance. This offers a unique module that provides reading comprehension tutoring through a combination of self-explanation practice and answering of multiple-choice questions representative of those found in standardized tests.

Cecile A. Perret, Amy M. Johnson, Kathryn S. McCarthy, Tricia A. Guerrero, Jianmin Dai, Danielle S. McNamara
AttentiveLearner2: A Multimodal Approach for Improving MOOC Learning on Mobile Devices

We propose AttentiveLearner2, a multimodal mobile learning system for MOOCs running on unmodified smartphones. AttentiveLearner2 uses both the front and back cameras of a smartphone as two complementary and fine-grained feedback channels in real time: the back camera monitors learners’ photoplethysmography (PPG) signals and the front camera tracks their facial expressions during MOOC learning. AttentiveLearner2 implicitly infers learners’ affective and cognitive states during learning by analyzing learners’ PPG signals and facial expressions. In a 26-participant user study, we found that it is feasible to detect 6 types of emotion during learning via collected PPG signals and facial expressions, and that these modalities complement each other.

Phuong Pham, Jingtao Wang
Automated Analysis of Lecture Video Engagement Using Student Posts

This work explores the feasibility of a learning analytic that would provide high level engagement data to instructors based on students’ text artifacts in online learning systems. Student posts from an online lecture video system were collected and manually coded by engagement using the ICAP framework. Analyses show what features are most indicative of engagement and the performance of using a neural network to classify posts by engagement.

Nicholas R. Stepanek, Brian Dorn
A Study of Learners’ Behaviors in Hands-On Learning Situations and Their Correlation with Academic Performance

This study analyzes students’ behavior in our remote laboratory environment and aims at identifying behavioural patterns during a practical session that lead to better learning outcomes, in order to predict learners’ performance and to automatically guide students who might need more support. Based on data collected from an experiment conducted in an authentic learning context, we discover recurrent sequential patterns of actions that lead us to the definition of learning strategies as indicators of a higher level of abstraction. Results show that some of the strategies are correlated with the learners’ performance at the final assessment test. For instance, construction of a complex action step by step, or reflection before submitting an action, are two strategies applied more often by learners of a higher level of performance than by others. These findings led us to build new guiding and tutoring tools for both students and instructors in our remote lab environment.

Rémi Venant, Kshitij Sharma, Pierre Dillenbourg, Philippe Vidal, Julien Broisin
Assessing the Collaboration Quality in the Pair Program Tracing and Debugging Eye-Tracking Experiment

We assessed the extent of collaboration of pairs of novice programmers as they traced and debugged fragments of code using cross-recurrence quantification analysis (CRQA). Specifically, we compared which among the pairs collaborated the most given a particular task. This was also a preliminary study that looked for patterns in how the pairs, categorized according to expertise, collaborated. We performed a CRQA to build cross-recurrence plots using the eye tracking data and computed CRQA metrics, such as recurrence rate (RR), determinism (DET), entropy (ENTR), and laminarity (LAM), using the CRP toolbox for MATLAB. Findings showed that Pair 3, which consisted of two high performers, collaborated the most because of its highest RR and DET. However, its highest ENTR and LAM implied that Pair 3 struggled the most in program comprehension. We also found that the RRs of all the pairs started with low values, peaked in the middle, declined, and increased again when the task was about to end, regardless of how well partners knew each other prior to the task. This could mean that at the start the pairs were still independently assessing how to approach the task, then they started to collaborate once comfortable but then worked independently again in an attempt to finish.

Maureen Villamor, Yancy Vance Paredes, Japheth Duane Samaco, Joanna Feliz Cortez, Joshua Martinez, Ma. Mercedes Rodrigo
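
To make the RR and DET metrics mentioned in the abstract above concrete, here is a small sketch of how they can be computed from a cross-recurrence matrix. It is only an illustration and not the CRP toolbox used in the study: it assumes each partner's gaze has already been reduced to a sequence of categorical area-of-interest labels (a hypothetical preprocessing step), and all function names are invented for this example.

```python
from typing import List, Sequence


def cross_recurrence(a: Sequence[str], b: Sequence[str]) -> List[List[int]]:
    """R[i][j] is 1 when partner A at time i looks at the same area as partner B at time j."""
    return [[1 if x == y else 0 for y in b] for x in a]


def recurrence_rate(R: List[List[int]]) -> float:
    """Fraction of all cells in the matrix that are recurrent."""
    total = sum(len(row) for row in R)
    return sum(map(sum, R)) / total if total else 0.0


def determinism(R: List[List[int]], min_len: int = 2) -> float:
    """Fraction of recurrent points lying on diagonal lines of at least min_len points."""
    n = len(R)
    m = len(R[0]) if R else 0
    recurrent = sum(map(sum, R))
    if recurrent == 0:
        return 0.0
    in_lines = 0
    # Walk every diagonal (constant j - i) and count runs of consecutive ones.
    for offset in range(-(n - 1), m):
        i, j = max(0, -offset), max(0, offset)
        run = 0
        while i < n and j < m:
            if R[i][j]:
                run += 1
            else:
                if run >= min_len:
                    in_lines += run
                run = 0
            i += 1
            j += 1
        if run >= min_len:
            in_lines += run
    return in_lines / recurrent
```

With `a = ["A", "B", "C", "A"]` and `b = ["A", "B", "C", "C"]`, five of the sixteen cells are recurrent and three of those lie on the main diagonal, so RR is 0.3125 and DET is 0.6.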
EMBRACE: Applying Cognitive Tutor Principles to Reading Comprehension

Reading comprehension is a critical skill, and one where dual language learners can fall behind compared to native English speakers. We developed EMBRACE, an intelligent tutoring system to improve reading comprehension of dual language learners. Based on theories of embodied cognition, EMBRACE tutors children on how to create cognitive simulations of text content. We describe the implementation of EMBRACE and show how it is closely aligned to principles posed by Anderson and colleagues in 1995 for the design of cognitive tutors, a type of intelligent tutoring system.

Erin Walker, Audrey Wong, Sarah Fialko, M. Adelaida Restrepo, Arthur M. Glenberg
Effects of a Dashboard for an Intelligent Tutoring System on Teacher Knowledge, Lesson Plans and Class Sessions

Even though Intelligent Tutoring Systems (ITS) have been shown to help students learn, little research has investigated how a dashboard could help teachers help their students. In this paper, we explore how a dashboard prototype designed for an ITS affects teachers’ knowledge about their students, their classroom lesson plans and class sessions. We conducted a quasi-experimental classroom study with 5 middle school teachers and 8 classes. We found that the dashboard influences what teachers know about their students, which in turn influences the lesson plans they prepare, which then guides what teachers cover in a class session. We believe this is the first study that explores how a dashboard for an ITS affects teachers’ knowledge, decision-making and actions in the classroom.

Françeska Xhakaj, Vincent Aleven, Bruce M. McLaren
Dynamics of Affective States During MOOC Learning

We investigate the temporal dynamics of learners’ affective states (e.g., engagement, boredom, confusion, frustration) during video-based learning sessions in Massive Open Online Courses (MOOCs) in a 22-participant user study. We also show the feasibility of predicting learners’ moment-to-moment affective states via implicit photoplethysmography (PPG) sensing on unmodified smartphones.

Xiang Xiao, Phuong Pham, Jingtao Wang
Learning from Errors: Identifying Strategies in a Math Tutoring System

This study attempts to investigate how students gain knowledge by utilizing help and practice after making errors. We define three types of strategies used by students after errors: help-seeking (requesting two worked examples in the next attempts after an error), practice (solving the problems in the next two attempts after an error), and mixed (first requesting a worked example or first solving a problem in the next two attempts after an error). Our results indicate that the most frequently used strategies are help and mixed strategies. However, the practice strategy and mixed strategies facilitate immediate performance improvement. Additionally, the help strategy was found to interfere with delayed performance.

Jun Xie, Keith Shubeck, Scotty D. Craig, Xiangen Hu
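
The three post-error strategies defined in the abstract above can be sketched as a simple labelling rule over the two attempts that follow an error. The action labels `"example"` and `"solve"` and the function name are assumptions made for this illustration; the abstract does not describe the system's actual log format.

```python
from typing import Optional, Sequence

VALID_ACTIONS = {"example", "solve"}  # hypothetical log labels


def classify_post_error_strategy(next_two: Sequence[str]) -> Optional[str]:
    """Label the strategy used in the two attempts that follow an error.

    Returns None when fewer or more than two valid actions are supplied.
    """
    actions = list(next_two)
    if len(actions) != 2 or any(a not in VALID_ACTIONS for a in actions):
        return None
    if actions == ["example", "example"]:
        return "help-seeking"  # two worked examples requested
    if actions == ["solve", "solve"]:
        return "practice"      # two problems attempted
    return "mixed"             # one of each, in either order
```

For instance, `classify_post_error_strategy(["example", "solve"])` and `classify_post_error_strategy(["solve", "example"])` both fall under the mixed strategy.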
Can Short Answers to Open Response Questions Be Auto-Graded Without a Grading Rubric?

Auto-grading short answers seems to be sufficiently resolved. However, most auto-graders require comprehensive scoring rubrics, which are not always available. This paper used modern machine learning techniques to build auto-graders without expressly defining the rubrics. The results show that the best auto-grading model is able to achieve a good inter-rater agreement (kappa = 0.625) with expert grading. The agreement can be further improved (kappa = 0.726) if the auto-grading model gives up scoring some of the answers.

Xi Yang, Lishan Zhang, Shengquan Yu
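
As a rough illustration of the agreement figures quoted in the abstract above, the sketch below computes an unweighted Cohen's kappa between expert and model scores, and then recomputes it after the model abstains on low-confidence answers. The abstract does not say which kappa variant was used or how the model decided to give up on an answer; the confidence threshold and all names here are assumptions.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa between two equally long label sequences."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("label sequences must be non-empty and of equal length")
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance from each rater's label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def kappa_with_abstention(expert: Sequence[Hashable], predicted: Sequence[Hashable],
                          confidence: Sequence[float], threshold: float) -> Tuple[float, float]:
    """Return (kappa, coverage) when predictions below the threshold are skipped."""
    kept: List[Tuple[Hashable, Hashable]] = [
        (e, p) for e, p, c in zip(expert, predicted, confidence) if c >= threshold
    ]
    if not kept:
        raise ValueError("threshold leaves no scored answers")
    coverage = len(kept) / len(expert)
    kept_expert, kept_predicted = zip(*kept)
    return cohen_kappa(kept_expert, kept_predicted), coverage
```

Reporting coverage alongside kappa matters in a setup like this, because agreement on the retained answers can rise simply because fewer answers are scored.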
Regional Cultural Differences in How Students Customize Their Avatars in Technology-Enhanced Learning

As AIED systems with agents and avatars are used by students in different world regions, we expect students to prefer ones that look like them according to the Similarity Attraction Hypothesis. We investigate this effect via a system with a customizable avatar deployed in 2 US regions and 2 Philippines regions. We find that US students do customize as expected, while students in the Philippines tend to select names and hairstyles from outside their culture. These results show the need for more nuanced system design to tailor options for regional-level preferences.

Evelyn Yarzebinski, Cristina Dumdumaya, Ma. Mercedes T. Rodrigo, Noboru Matsuda, Amy Ogan

Doctoral Consortium Papers

Frontmatter
Teaching Informal Logical Fallacy Identification with a Cognitive Tutor

In this age of fake news and alternative facts, the need for a citizenry capable of critical thinking has never been greater. While teaching critical thinking skills in the classroom remains an enduring challenge, research on an ill-defined domain like critical thinking in the educational technology space is even more scarce. We propose a difficulty factors assessment (DFA) to explore two factors that may make learning to identify fallacies more difficult: type of instruction and belief bias. This study will allow us to make two key contributions. First, we will better understand the relationship between sense-making and induction when learning to identify informal fallacies. Second, we will contribute to the limited work examining the impact of belief bias on informal (rather than formal) reasoning. The results of this DFA will also be used to improve the next iteration of our fallacy tutor, which may ultimately contribute to a computational model of informal fallacies.

Nicholas Diana, Michael Eagle, John Stamper, Kenneth R. Koedinger
Digital Learning Projection: Learning Performance Estimation from Multimodal Learning Experiences

Multiple modalities of the learning process can now be captured in real time through wearable and contextual sensors. By annotating these multimodal data (the input space) with expert assessments or self-reports (the output space), machine learning models can be trained to predict learning performance. This can lead to continuous formative assessment and feedback generation, which can be used to personalise and contextualise content, improve awareness and support informed decisions about learning.

Daniele Di Mitri
Learning with Engaging Activities via a Mobile Python Tutor

This paper presents work on a new mobile Python tutor – PyKinetic. The tutor is designed to be used by novices, as a complement to traditional labs and lectures. PyKinetic currently contains one type of activity – Parsons problems, which require learners to re-order lines of code to produce a desired output. We present results of studies conducted to evaluate the usability and effectiveness of PyKinetic for learning. The enthusiasm from the participants was encouraging. We have also evaluated menu-based self-explanation prompts in PyKinetic. Results revealed that participants significantly improved their scores from pre- to post-test. Furthermore, participants who self-explained learned more than those who did not. We aim to develop more activities for PyKinetic to support code reading and code writing skills. We also plan to improve the tutor by providing engaging features to maximise learning, and to provide adaptive pedagogical support. Evaluation studies will also be conducted for future versions of PyKinetic.

Geela Venise Firmalo Fabic, Antonija Mitrovic, Kourosh Neshatian
Math Reading Comprehension: Comparing Effectiveness of Various Conversation Frameworks in an ITS

Conversation-based intelligent tutoring systems (ITSs) are highly effective at promoting learning across a wide range of domains. This is in part because these systems allow for the implementation of pedagogical strategies used by expert human tutors (e.g., self-reflection and deep-level reasoning questions). However, the various conversation frameworks used by these ITSs affect high domain knowledge students and low domain knowledge students differently. The experiment proposed in this paper will explore and test the added effectiveness of interactive dialogues and trialogues in learning Algebra I, utilized in a conversation-based ITS. The experiment will compare learning across five conditions: (1) a static reading control condition, (2) a vicarious control dialogue condition with animated agents, (3) an interactive dialogue condition (i.e., human learner and tutor agent), (4) an interactive trialogue condition (i.e., human learner, tutor agent, and tutee agent) and (5) a vicarious monologue condition. This research will seek to answer questions concerning the effectiveness of dialogue and trialogue conversation environments in an Algebra I domain compared to vicarious learning, and whether trialogues provide an added benefit over dialogues within this domain.

Keith T. Shubeck, Ying Fang, Xiangen Hu

Industry Papers

Frontmatter
4C: Continuous Cognitive Career Companions

We explore the evolution of digital career advising companions for the rapidly growing knowledge economies to enable continuous evaluation and re-skilling of workforce in a wide range of domains. These companions deal with a variety of unstructured data sources to glean actionable insights. We present our experiences from building one such companion, and describe interesting natural language processing and machine learning challenges and open problems.

Bhavna Agrawal, Rong Liu, Ravi Kokku, Yi-Min Chee, Ashish Jagmohan, Satya Nitta, Michael Tan, Sherry Sin
Wizard’s Apprentice: Cognitive Suggestion Support for Wizard-of-Oz Question Answering

Recent advances in artificial intelligence and natural language processing greatly enhance the capabilities of intelligent tutoring systems. However, gathering a subject-appropriate corpus of training data remains challenging. In order to address this issue, we present a system based on a hybrid Wizard-of-Oz technique, which enables cognitive systems to work in tandem with a human operator (the “wizard”), to enhance collection of dialog variants.

Jae-wook Ahn, Patrick Watson, Maria Chang, Sharad Sundararajan, Tengfei Ma, Nirmal Mukhi, Srijith Prabhu
Interaction Analysis in Online Maths Human Tutoring: The Case of Third Space Learning

This ‘industry’ paper reports on the combined effort of researchers and industrial designers and developers to ground the automatic quality assurance of online maths human-to-human tutoring on best practices. We focus on the first step towards this goal. Our aim is to understand the largely under-researched field of online tutoring, to identify success factors in this context and to model best practice in online teaching. We report our research into best practice in online maths teaching and describe and discuss our design and evaluation iterations towards annotation software that can mark up human-to-human online teaching interactions with successful teaching interaction signifiers.

Mutlu Cukurova, Manolis Mavrikis, Rose Luckin, James Clark, Candida Crawford
Using a Model for Learning and Memory to Simulate Learner Response in Spaced Practice

McGraw-Hill Education’s new adaptive flashcard application, StudyWise, implements spaced practice to help learners memorize collections of basic facts. For classroom use, subject matter experts needed a scheduling algorithm that could provide effective practice schedules to learn a pre-set number of facts over a specific interval of days. To test the pedagogical effectiveness of such schedules, we used the ACT-R model of memorization to simulate learner responses. Each schedule has one 30-minute study session per day, with overall study intervals that ranged from one day for sets of fewer than 30 items to three weeks for sets of two hundred or more items. In each case, we succeeded in tuning our algorithm to give a high probability that the simulated learner answered each item correctly by the end of the schedule. This use of artificial intelligence allowed us to optimize the algorithm before engaging large numbers of real users. As real user data becomes available for this application, the simulated user model can be further tested and refined.

Mark A. Riedesel, Neil Zimmerman, Ryan Baker, Tom Titchener, James Cooper
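
The abstract above does not say which ACT-R equations or parameter values were used to simulate learners. As a hedged illustration only, the sketch below uses the textbook ACT-R base-level learning equation and the logistic recall function; the default decay, threshold and noise values are placeholder assumptions, not StudyWise settings.

```python
import math
from typing import Sequence


def base_level_activation(ages: Sequence[float], decay: float = 0.5) -> float:
    """ACT-R base-level activation: ln(sum of t ** -decay over past practices).

    `ages` holds the time elapsed since each earlier practice of the item,
    in any consistent unit, and every value must be positive.
    """
    if not ages or any(t <= 0 for t in ages):
        raise ValueError("ages must be a non-empty sequence of positive times")
    return math.log(sum(t ** -decay for t in ages))


def recall_probability(activation: float, threshold: float = 0.0,
                       noise: float = 0.4) -> float:
    """Logistic probability that an item with this activation is retrieved."""
    return 1.0 / (1.0 + math.exp((threshold - activation) / noise))
```

A scheduler could call `recall_probability(base_level_activation(ages))` for each item at the end of a candidate schedule and check that every simulated probability clears a target value, which is the kind of check the abstract describes.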
Bridging the Gap Between High and Low Performing Pupils Through Performance Learning Online Analysis and Curricula

Metacognition is a neglected area of investment in formal education and in teachers’ professional development. This paper presents an approach and tools, created by a London-based company called Performance Learning Education (PL), for supporting front-line teachers and learners in developing metacognitive competencies. An iterative process adopted by PL in developing and validating its approach is presented, demonstrating its value to real educational practices, its research potential in the area of metacognition, and its AI readiness, especially in relation to modelling learners’ non-cognitive competencies.

Tej Samani, Kaśka Porayska-Pomsta, Rose Luckin
Backmatter
Metadata
Title
Artificial Intelligence in Education
Editors
Prof. Dr. Elisabeth André
Ryan Baker
Prof. Xiangen Hu
Ma. Mercedes T. Rodrigo
Benedict du Boulay
Copyright Year
2017
Electronic ISBN
978-3-319-61425-0
Print ISBN
978-3-319-61424-3
DOI
https://doi.org/10.1007/978-3-319-61425-0
