
2018 | Book

Artificial Intelligence in Education

19th International Conference, AIED 2018, London, UK, June 27–30, 2018, Proceedings, Part I

Editors: Carolyn Penstein Rosé, Roberto Martínez-Maldonado, H. Ulrich Hoppe, Rose Luckin, Manolis Mavrikis, Dr. Kaska Porayska-Pomsta, Bruce McLaren, Benedict du Boulay

Publisher: Springer International Publishing

Book Series: Lecture Notes in Computer Science


About this book

This two-volume set, LNAI 10947 and LNAI 10948, constitutes the proceedings of the 19th International Conference on Artificial Intelligence in Education, AIED 2018, held in London, UK, in June 2018. The 45 full papers presented in this book together with 76 poster papers, 11 young researchers tracks, 14 industry papers and 10 workshop papers were carefully reviewed and selected from 192 submissions. The conference provides opportunities for the cross-fertilization of approaches, techniques and ideas from the many fields that comprise AIED, including computer science, cognitive and learning sciences, education, game design, psychology, sociology, and linguistics, as well as many domain-specific areas.

Table of Contents


Full Papers

Investigating the Impact of a Meaningful Gamification-Based Intervention on Novice Programmers’ Achievement

Gamification is becoming a popular classroom intervention used in computer science instruction, including CS1, the first course computer science students take. It is being used as a medium to encourage certain student behaviors in anticipation of positive effects on learning experience and achievement. However, existing studies have mostly implemented reward-based game elements, which have resulted in contrasting behaviors among students. Meaningful gamification, defined as the use of game design elements to encourage users to build internal motivation to behave in a certain way, is contended to be a more effective approach. This concept is founded on the ‘Self-Determination Theory’, which states that there are three components associated with intrinsic motivation: mastery, autonomy, and relatedness. This study describes the analysis of data collected from an experiment where students of an introductory programming class used a system embedded with elements that map to the components of the Self-Determination Theory: feedback cycles, freedom to fail, and progress to support mastery; control to enable autonomy; and collaboration for relatedness. It looks into whether the experimental group performed significantly better than the control group. It also explores how different user types respond to the different game design elements.

Jenilyn L. Agapito, Ma. Mercedes T. Rodrigo
Automatic Item Generation Unleashed: An Evaluation of a Large-Scale Deployment of Item Models

Automatic item generation represents a potential solution to the increased item development demands in this era of continuous testing. However, the use of test items that are automatically generated on-the-fly poses significant psychometric challenges for item calibration. The solution that has been suggested by a small but growing number of authors is to replace item calibration with item model (or family) calibration and to adopt a multilevel approach where items are nested within item models. Past research on the feasibility of this approach was limited to simulations or small-scale illustrations of its potential. The purpose of this study was to evaluate the results of a large-scale deployment of automatic item generation in a low-stakes adaptive testing context, with a large number of item models, and a very large number of randomly generated item instances.

Yigal Attali
Quantifying Classroom Instructor Dynamics with Computer Vision

Classroom teachers utilize many nonverbal activities, such as gesturing and walking, to maintain student attention. Quantifying instructor behaviors in a live classroom environment has traditionally been done through manual coding, a prohibitively time-consuming process which precludes providing timely, fine-grained feedback to instructors. Here we propose an automated method for assessing teachers’ non-verbal behaviors using video-based motion estimation tailored for classroom applications. Motion was estimated by subtracting background pixels that varied little from their mean values, and then noise was reduced using filters designed specifically with the movements and speeds of teachers in mind. Camera pan and zoom events were also detected, using a method based on tracking the correlations between moving points in the video. Results indicated the motion estimation method was effective for predicting instructors’ non-verbal behaviors, including gestures (kappa = .298), walking (kappa = .338), and camera pan (an indicator of instructor movement; kappa = .468), all of which are plausibly related to student attention. We also found evidence of predictive validity, as these automated predictions of instructor behaviors were correlated with students’ mean self-reported level of attention (e.g., r = .346 for walking), indicating that the proposed method captures the association between instructors’ non-verbal behaviors and student attention. We discuss the potential for providing timely, fine-grained, automated feedback to teachers, as well as opportunities for future classroom studies using this method.
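
The agreement figures above are Cohen's kappa values. As a minimal sketch of how such a value is computed from two label sequences (for example, human-coded and automatically predicted behaviors), assuming simple categorical labels; the function name and the example data are illustrative and do not come from the paper:

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each rater's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical data: 1 = "walking" observed in a video segment, 0 = not observed.
human_codes = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
auto_codes = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
kappa = cohens_kappa(human_codes, auto_codes)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it can be modest even when most segments are labeled identically.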

Nigel Bosch, Caitlin Mills, Jeffrey D. Wammes, Daniel Smilek
Learning Cognitive Models Using Neural Networks

A cognitive model of human learning provides information about skills a learner must acquire to perform accurately in a task domain. Cognitive models of learning are not only of scientific interest, but are also valuable in adaptive online tutoring systems. A more accurate model yields more effective tutoring through better instructional decisions. Prior methods of automated cognitive model discovery have typically focused on well-structured domains, relied on student performance data or involved substantial human knowledge engineering. In this paper, we propose Cognitive Representation Learner (CogRL), a novel framework to learn accurate cognitive models in ill-structured domains with no data and little to no human knowledge engineering. Our contribution is two-fold: firstly, we show that representations learnt using CogRL can be used for accurate automatic cognitive model discovery without using any student performance data in several ill-structured domains: Rumble Blocks, Chinese Character, and Article Selection. This is especially effective and useful in domains where an accurate human-authored cognitive model is unavailable or authoring a cognitive model is difficult. Secondly, for domains where a cognitive model is available, we show that representations learned through CogRL can be used to get accurate estimates of skill difficulty and learning rate parameters without using any student performance data. These estimates are shown to highly correlate with estimates using student performance data on an Article Selection dataset.

Devendra Singh Chaplot, Christopher MacLellan, Ruslan Salakhutdinov, Kenneth Koedinger
Measuring the Quality of Assessment Using Questions Generated from the Semantic Web

This article describes a new feature of the adaptive assessment system SIETTE that allows for the static and dynamic generation of questions from tables of data for knowledge assessment. Almost the same approach can be used to generate questions from data collected in a spreadsheet, a database query, or a semantic web query using SPARQL. The main problem faced with question generation is ensuring that the questions are valid for assessment. For this reason, most of the existing systems propose to use this mechanism only for low-stakes assessments. In this paper, we propose a methodology to control question generation quality and measure the impact of potential invalid instances on the final score as well as recommend some strategies to overcome these problems.

Ricardo Conejo, Beatriz Barros, Manuel F. Bertoa
Balancing Human Efforts and Performance of Student Response Analyzer in Dialog-Based Tutors

Accurately interpreting student responses is a critical requirement of dialog-based intelligent tutoring systems. The accuracy of supervised learning methods, used for interpreting or analyzing student responses, is strongly dependent on the availability of annotated training data. Collecting and grading student responses is tedious, time-consuming, and expensive. This work proposes an iterative data collection and grading approach. We show that data collection efforts can be significantly reduced by predicting question difficulty and by collecting answers from a focused set of students. Further, grading efforts can be reduced by filtering student answers that may not be helpful in training a Student Response Analyzer (SRA). To ensure the quality of grades, we analyze the grader characteristics, and show improvement when a biased grader is removed. An experimental evaluation on a large-scale dataset shows a reduction of up to 28% in the data collection cost, and up to 10% in grading cost, while improving the response analysis macro-average F1.
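
For readers unfamiliar with the metric named above, macro-average F1 is the unweighted mean of the per-class F1 scores. The following is a minimal sketch; the class labels ("correct", "partial", "incorrect") and the example data are assumptions made for illustration, not the label set or results of the paper:

```python
def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over all labels seen in either sequence."""
    if len(gold) != len(predicted) or len(gold) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is non-zero because
        # every label occurs at least once in gold or predicted.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Hypothetical grades for four student answers.
gold = ["correct", "correct", "partial", "incorrect"]
predicted = ["correct", "partial", "partial", "incorrect"]
score = macro_f1(gold, predicted)
```

Because each class contributes equally, the macro average is sensitive to performance on rare response categories.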

Tejas I. Dhamecha, Smit Marvaniya, Swarnadeep Saha, Renuka Sindhgatta, Bikram Sengupta
An Instructional Factors Analysis of an Online Logical Fallacy Tutoring System

The proliferation of fake news has underscored the importance of critical thinking in the civic education curriculum. Despite this recognized importance, systems designed to foster these kinds of critical thinking skills are largely absent from the educational technology space. In this work, we utilize an instructional factors analysis in conjunction with an online tutoring system to determine if logical fallacies are best learned through deduction, induction, or some combination of both. We found that while participants were able to learn the informal fallacies using inductive practice alone, deductive explanations were more beneficial for learning.

Nicholas Diana, John Stamper, Ken Koedinger
Using Physiological Synchrony as an Indicator of Collaboration Quality, Task Performance and Learning

Over the last decade, there has been a renewed interest in capturing 21st century skills using new data collection tools. In this paper, we leverage an existing dataset where multimodal sensors (mobile eye-trackers, motion sensors, galvanic skin response wristbands) were used to identify markers of productive collaborations. The data came from 42 pairs (N = 84) of participants who had no coding experience. They were asked to program a robot to solve a variety of mazes. We explored four different measures of physiological synchrony: Signal Matching (SM), Instantaneous Derivative Matching (IDM), Directional Agreement (DA) and Pearson’s Correlation (PC). Overall, we found PC to be positively associated with learning gains and DA with collaboration quality. We compare those results with prior studies and discuss implications for measuring collaborative process through physiological sensors.
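
Two of the four synchrony measures can be sketched briefly for a pair of equally sampled signals. The definition of Directional Agreement used here (the fraction of consecutive-sample steps in which both signals move in the same direction) is an assumption for illustration; the abstract does not give the authors' exact formulation, and the function names and sample values are hypothetical:

```python
import math


def pearson_correlation(x, y):
    """Pearson's correlation coefficient between two equal-length signals."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length signals with at least 2 samples")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_x * var_y)


def directional_agreement(x, y):
    """Fraction of steps where both signals rise together or fall together.

    Assumed definition: a step counts as agreement when the two
    first differences have the same non-zero sign.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length signals with at least 2 samples")
    steps = [(x[i + 1] - x[i], y[i + 1] - y[i]) for i in range(len(x) - 1)]
    return sum(1 for dx, dy in steps if dx * dy > 0) / len(steps)


# Hypothetical skin-conductance samples for two partners.
partner_a = [1.0, 2.0, 3.0, 2.0, 4.0]
partner_b = [2.0, 4.0, 6.0, 4.0, 8.0]
pc = pearson_correlation(partner_a, partner_b)
da = directional_agreement(partner_a, partner_b)
```

PC is sensitive to the magnitude of co-variation, whereas this form of DA only looks at the direction of change, which may be one reason the two measures relate to different outcomes.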

Yong Dich, Joseph Reilly, Bertrand Schneider
Towards Combined Network and Text Analytics of Student Discourse in Online Discussions

This paper presents a novel method for the evaluation of students’ use of asynchronous discussions in online learning environments. In particular, the paper shows how students’ cognitive development across different course topics can be examined using the combination of natural language processing and graph-based analysis techniques. Drawing on the theoretical foundation of the community of inquiry model, we show how topic modeling and epistemic network analysis can provide qualitatively new insight into students’ development of critical and deep thinking skills. We also show how the same method can be used to investigate the effectiveness of instructional interventions and their effect on student learning. The results of this study and its practical implications are further discussed.

Rafael Ferreira, Vitomir Kovanović, Dragan Gašević, Vitor Rolim
How Should Knowledge Composed of Schemas be Represented in Order to Optimize Student Model Accuracy?

Most approaches to student modeling assume that students’ knowledge can be represented by a large set of knowledge components that are learned independently. Knowledge components typically represent fairly small pieces of knowledge. This seems to conflict with the literature on problem solving, which suggests that expert knowledge is composed of large schemas. This study compared several domain models for knowledge that is arguably composed of schemas. The knowledge is used by students to construct system dynamics models with the Dragoon intelligent tutoring system. An evaluation with 52 students showed that a relatively simple domain model that assigned one KC to each schema and schema combination sufficed and was more parsimonious than other domain models with similarly accurate predictions.

Sachin Grover, Jon Wetzel, Kurt VanLehn
Active Learning for Improving Machine Learning of Student Explanatory Essays

There is an increasing emphasis, especially in STEM areas, on students’ abilities to create explanatory descriptions. Holistic, overall evaluations of explanations can be performed relatively easily with shallow language processing by humans or computers. However, this provides little information about an essential element of explanation quality: the structure of the explanation, i.e., how it connects causes to effects. The difficulty of providing feedback on explanation structure can lead teachers to either avoid giving this type of assignment or to provide only shallow feedback on them. Using machine learning techniques, we have developed successful computational models for analyzing explanatory essays. A major cost of developing such models is the time and effort required for human annotation of the essays. As part of a large project studying students’ reading processes, we have collected a large number of explanatory essays and thoroughly annotated them. Then we used the annotated essays to train our machine learning models. In this paper, we focus on how to get the best payoff from the expensive annotation process within such an educational context and we evaluate a method called Active Learning.

Peter Hastings, Simon Hughes, M. Anne Britt
Student Learning Benefits of a Mixed-Reality Teacher Awareness Tool in AI-Enhanced Classrooms

When used in K-12 classrooms, intelligent tutoring systems (ITSs) can be highly effective in helping students learn. However, they might be even more effective if designed to work together with human teachers, to amplify their abilities and leverage their complementary strengths. In the present work, we designed a wearable, real-time teacher awareness tool: mixed-reality smart glasses that tune teachers in to the rich analytics generated by ITSs, alerting them to situations the ITS may be ill-suited to handle. A 3-condition experiment with 286 middle school students, across 18 classrooms and 8 teachers, found that presenting teachers with real-time analytics about student learning, metacognition, and behavior had a positive impact on student learning, compared with both business-as-usual and classroom monitoring support without advanced analytics. Our findings suggest that real-time teacher analytics can help to narrow the gap in learning outcomes across students of varying prior ability. This is the first experimental study showing that real-time teacher analytics can enhance student learning. This research illustrates the promise of AIED systems that integrate human and machine intelligence to support student learning.

Kenneth Holstein, Bruce M. McLaren, Vincent Aleven
Opening Up an Intelligent Tutoring System Development Environment for Extensible Student Modeling

ITS authoring tools make creating intelligent tutoring systems more cost effective, but few authoring tools make it easy to flexibly incorporate an open-ended range of student modeling methods and learning analytics tools. To support a cumulative science of student modeling and enhance the impact of real-world tutoring systems, it is critical to extend ITS authoring tools so they easily accommodate novel student modeling methods. We report on extensions to the CTAT/Tutorshop architecture to support a plug-in approach to extensible student modeling, which gives an author full control over the content of the student model. The extensions enhance the range of adaptive tutoring behaviors that can be authored and support building external, student- or teacher-facing real-time analytics tools. The contributions of this work are: (1) an open architecture to support the plugging in, sharing, re-mixing, and use of advanced student modeling techniques, ITSs, and dashboards; and (2) case studies illustrating diverse ways authors have used the architecture.

Kenneth Holstein, Zac Yu, Jonathan Sewall, Octav Popescu, Bruce M. McLaren, Vincent Aleven
Better Late Than Never but Never Late Is Better: Towards Reducing the Answer Response Time to Questions in an Online Learning Community

Professionals increasingly turn to online learning communities (OLCs) such as Stack Overflow (SO) to get help with their questions. It is important that the help is appropriate to the learning needs of the professional and is received in a timely fashion. However, we observed in SO a rise in the proportion of questions either answered late or not answered at all, from 5% in 2009 to 23% in 2016. There is clearly a need to be able to quickly find appropriate answerers for the questions asked by users. Our research goal is thus to find techniques that allow us to predict from SO data (using only information available at the time the question was asked) the actual answerers who provided the best answers and the most timely answers to users’ questions. Such techniques could then be deployed proactively at the time a question is asked to recommend an appropriate answerer. We used a variety of tag-based, response-based, and hybrid approaches in making these predictions. Comparing the approaches, we achieved success rates that varied from a low of 0.88% to a high of 89.64%, with the hybrid approaches being the best. We also explored the effect of excluding from the pool of possible answerers those users who had already answered a question “recently”, with “recent” varying from 15 min up to 12 h, so as to have well-rested helpers. We still achieved reasonable success rates, at least for smaller exclusion periods of up to an hour, although naturally the rates declined as the exclusion period grew longer. We believe our work shows promise for allowing us to predict prospective answerers for questions who are not overworked, hence reducing the number of questions that would otherwise be answered late or not answered at all.
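
The "recent answerer" exclusion described above can be illustrated with a small filter. This is only a sketch under stated assumptions: the data structures, user names, and function name are hypothetical, and the abstract does not describe how the authors implemented the exclusion or how candidates were ranked afterwards:

```python
from datetime import datetime, timedelta


def eligible_answerers(candidates, last_answer_time, asked_at, exclusion):
    """Drop candidates who answered any question within the exclusion window.

    candidates: iterable of user ids, assumed to be already ranked.
    last_answer_time: dict mapping user id -> datetime of most recent answer.
    asked_at: datetime at which the new question was posted.
    exclusion: timedelta, e.g. 15 minutes up to 12 hours.
    """
    return [
        user
        for user in candidates
        if user not in last_answer_time
        or asked_at - last_answer_time[user] >= exclusion
    ]


# Hypothetical example: three ranked candidates for a question asked at noon.
asked_at = datetime(2016, 1, 1, 12, 0)
last_answer_time = {
    "ann": datetime(2016, 1, 1, 11, 50),  # answered 10 minutes ago
    "bob": datetime(2016, 1, 1, 10, 0),   # answered 2 hours ago
}
ranked = ["ann", "bob", "cy"]
short_window = eligible_answerers(ranked, last_answer_time, asked_at, timedelta(minutes=15))
long_window = eligible_answerers(ranked, last_answer_time, asked_at, timedelta(hours=12))
```

A longer window removes more of the ranked candidates, which is consistent with the reported drop in success rates as the exclusion period grows.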

Oluwabukola Mayowa (Ishola) Idowu, Gordon McCalla
Expert Feature-Engineering vs. Deep Neural Networks: Which Is Better for Sensor-Free Affect Detection?

The past few years have seen a surge of interest in deep neural networks. The wide application of deep learning in other domains such as image classification has driven considerable recent interest and efforts in applying these methods in educational domains. However, there is still limited research comparing the predictive power of the deep learning approach with the traditional feature engineering approach for common student modeling problems such as sensor-free affect detection. This paper aims to address this gap by presenting a thorough comparison of several deep neural network approaches with a traditional feature engineering approach in the context of affect and behavior modeling. We built detectors of student affective states and behaviors as middle school students learned science in an open-ended learning environment called Betty’s Brain, using both approaches. Overall, we observed a tradeoff where the feature engineering models were better when considering a single optimized threshold (for intervention), whereas the deep learning models were better when taking model confidence fully into account (for discovery with models analyses).

Yang Jiang, Nigel Bosch, Ryan S. Baker, Luc Paquette, Jaclyn Ocumpaugh, Juliana Ma. Alexandra L. Andres, Allison L. Moore, Gautam Biswas
A Comparison of Tutoring Strategies for Recovering from a Failed Attempt During Faded Support

The support the tutor provides for a student is expected to fade over time as the student makes progress towards mastery of the learning objectives. One way in which the tutor can fade support is to prompt or elicit a next step that requires the student to fill in some intermediate actions or reasoning on her own. But what should the tutor do if the student is unable to complete such a step? In human-human tutoring interactions, a tutor may remediate by explicitly covering the missing intermediate steps with the student, and in some contexts this behavior correlates with learning. But if there are multiple intermediate steps that need to be made explicit, the tutor could focus the student’s attention on the last successful step and then move forward through the intermediate steps (forward reasoning), or the tutor could focus the student’s attention on the intermediate step just before the failed step and move backward through the intermediate steps (backward reasoning). In this paper we explore when the forward strategy or backward strategy may be beneficial for remediation. We also compare the two faded support+remediation strategies to a control in which support is never faded. We find that faded support was not detrimental to student learning outcomes when the two remediation strategies were available, and that it took significantly less time on task to achieve similar learning gains when the tutor-student interaction started with faded support.

Pamela Jordan, Patricia Albacete, Sandra Katz
Validating Revised Bloom’s Taxonomy Using Deep Knowledge Tracing

Revised Bloom’s Taxonomy is used for classifying educational objectives. The taxonomy describes a hierarchical ordering of cognitive skills from simple to complex. The Revised Taxonomy relaxed the strict cumulative hierarchical assumptions of the Original Taxonomy, allowing overlaps. We use a knowledge tracing model, Deep Knowledge Tracing (DKT), to investigate the hierarchical nature of the Revised Taxonomy and also study the overlapping behavior of the Taxonomy. The DKT model is trained on about 42 million problems attempted by students on funtoot, an adaptive learning platform where students learn by answering problems. We propose a novel way to interpret the model’s output to measure the effects of each learning objective on every other learning objective. The results confirm the relaxed hierarchy of the skills from simple to complex. Moreover, the results also suggest overlaps even among the non-adjacent skills.

Amar Lalwani, Sweety Agrawal
Communication at Scale in a MOOC Using Predictive Engagement Analytics

When teaching at scale in the physical classroom or online classroom of a MOOC, the scarce resource of personal instructor communication becomes a differentiating factor between the quality of learning experience available in smaller classrooms. In this paper, through real-time predictive modeling of engagement analytics, we augment a MOOC platform with personalized communication affordances, allowing the instructional staff to direct communication to learners based on individual predictions of three engagement analytics. The three model analytics are the current probability of earning a certificate, of submitting enough materials to pass the class, and of leaving the class and not returning. We engineer an interactive analytics interface in edX which is populated with real-time predictive analytics from a backend API service. The instructor can target messages to, for example, all learners who are predicted to complete all materials but not pass the class. Our approach utilizes the state-of-the-art in recurrent neural network classification, evaluated on a MOOC dataset of 20 courses and deployed in one. We provide evaluation of these courses, comparing a manual feature engineering approach to an automatic feature learning approach using neural networks. Our provided code for the front-end and back-end allows any instructional team to add this personalized communication dashboard to their edX course granted they have access to the historical clickstream data from a previous offering of the course, their course’s daily provided log data, and an external machine to run the model service API.

Christopher V. Le, Zachary A. Pardos, Samuel D. Meyer, Rachel Thorp
How to Use Simulation in the Design and Evaluation of Learning Environments with Self-directed Longer-Term Learners

Designing, developing, and evaluating interactive and adaptive learning environments requires significant investment of financial and human resources. This is especially the case when evaluating the impact of learning environments aimed at supporting self-directed longer-term learners, environments of increasing interest to AIED as the field moves into lifelong learning where mentorship is key. In this paper we propose the use of simulation to help in both the design and evaluation of such environments. As a case study, we have built a simulated university doctoral program (SimDoc) with simulated doctoral students and supervisors (mentors). To make sure SimDoc replicates observed data from a real-world environment as closely as possible, we informed and calibrated the simulation model with data from an actual doctoral program, as well as drawing on various empirical studies of graduate students and supervisors. Next, we used the calibrated simulation model to explore the effect of varying research group sizes of learners and supervisors’ mentoring workload on students’ completion rates and time-to-completion. Our main goal is to provide insight into how to build simulations of environments that support self-directed longer-term learners through a case study of one such simulation, thus further demonstrating the importance of simulation in AIED.

David Edgar Kiprop Lelei, Gordon McCalla
Students’ Academic Language Use When Constructing Scientific Explanations in an Intelligent Tutoring System

In the present study, we examined the use of academic language in students’ scientific explanations in the form of written claim, written evidence, and written reasoning (CER) statements during science inquiry within an intelligent tutoring system. Results showed that students tended to use more academic language when constructing their evidence and reasoning statements. Further analyses showed that both the number of words and pronouns used by students were significant predictors for the quality of students’ written CER statements. The quality of claim statements was significantly reduced by the lexical density (type-token ratio), but the quality of reasoning significantly increased with lexical density. The quality of evidence statements increased significantly with the inclusion of causal and temporal relationships, verb overlap, and descriptive writing. These findings indicate that students used language differently when constructing their CER statements. Implications are discussed in terms of how to increase students’ knowledge of and use of academic language.
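
The type-token ratio mentioned above is the number of distinct words (types) divided by the total number of words (tokens). A minimal sketch follows; the lowercase, letters-and-apostrophes tokenization is an assumption made for illustration and is not necessarily how the authors' text-analysis tooling tokenizes student writing:

```python
import re


def type_token_ratio(text):
    """Distinct word forms divided by total word count (0.0 for empty text)."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical student statement: 8 tokens, 6 distinct types.
statement = "The plant grew because the plant had light"
ratio = type_token_ratio(statement)
```

The ratio tends to fall as texts get longer because common words repeat, so comparisons are most meaningful between statements of similar length.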

Haiying Li, Janice Gobert, Rachel Dickler, Natali Morad
Automated Pitch Convergence Improves Learning in a Social, Teachable Robot for Middle School Mathematics

Pedagogical agents have the potential to provide not only cognitive support to learners but also socio-emotional support through social behavior. Socio-emotional support can be a critical element to a learner’s success, influencing their self-efficacy and motivation. Several social behaviors have been explored with pedagogical agents including facial expressions, movement, and social dialogue; social dialogue has especially been shown to positively influence interactions. In this work, we explore the role of paraverbal social behavior, that is, social behavior in the form of paraverbal cues such as tone of voice and intensity. To do this, we focus on the phenomenon of entrainment, where individuals adapt their paraverbal features of speech to one another. Paraverbal entrainment in human-human studies has been found to be correlated with rapport and learning. In a study with 72 middle school students, we evaluate the effects of entrainment with a teachable robot, a pedagogical agent that learners teach how to solve ratio problems. We explore how a teachable robot which entrains and introduces social dialogue influences rapport and learning; we compare with two baseline conditions: a social condition, in which the robot speaks socially, and a non-social condition, in which the robot neither entrains nor speaks socially. We find that a robot that does entrain and speaks socially results in significantly more learning.

Nichola Lubold, Erin Walker, Heather Pon-Barry, Amy Ogan
The Influence of Gender, Personality, Cognitive and Affective Student Engagement on Academic Engagement in Educational Virtual Worlds

Educational virtual worlds (EVWs) are emerging as immersive learning environments that allow students to engage in experiential learning. However, understanding whether individual student differences influence learning behaviours and adapting the EVW accordingly has not been extensively investigated. This paper reports an experimental study with 115 undergraduate students to explore the link between their gender, personality, cognitive and affective engagement in relation to their academic engagement, measured by a quiz after using the world. We also explore whether providing hints can improve their academic engagement in the EVW. Only personal support (a factor of affective engagement) and intrinsic motivation (a factor of cognitive engagement) were found to be related to the quiz score. Also the personality dimension of Openness indicated student propensity to accept support.

Anupam Makhija, Deborah Richards, Jesse de Haan, Frank Dignum, Michael J. Jacobson
Metacognitive Scaffolding Amplifies the Effect of Learning by Teaching a Teachable Agent

Learning by teaching has been compared with learning by being tutored (also known as cognitive tutoring) for learning to solve linear equations in 7th to 8th grade algebra. Two randomized controlled trials with 46 and 141 6th through 8th grade students were conducted in 3 public schools in two different years. Students in the learning by teaching (LBT) condition used an online learning environment (APLUS), where they interactively taught a teachable agent (SimStudent) how to solve equations with a goal to have the teachable agent pass the quiz. Students in the learning by being tutored condition used a version of cognitive tutor that uses the same user interface as APLUS, but no teachable agent. Instead, a teacher agent tutored students on how to solve equations. The goal for students in this condition was to pass the quiz by themselves. Students themselves selected and entered the problems on which they were tutored. This condition is hence called Goal-Oriented Practice (GOP). For both conditions, students received metacognitive scaffolding on how to teach the teachable agent (LBT) and how to regulate their learning (GOP). The results from the classroom studies show that (1) students in both conditions learned equally well, as measured by pre- and post-test scores, (2) prior competency does not influence the effect of either LBT or GOP (i.e., no aptitude-treatment interaction was observed), and (3) GOP students primarily focused on submitting the quiz rather than practicing on problems. These results suggest that with the metacognitive scaffolding, learning by teaching is as effective as cognitive tutoring regardless of prior competency.

Noboru Matsuda, Vishnu Priya Chandra Sekar, Natalie Wall
A Data-Driven Method for Helping Teachers Improve Feedback in Computer Programming Automated Tutors

The increasing prevalence and sophistication of automated tutoring systems necessitates the development of new methods for their evaluation and improvement. In particular, data-driven methods offer the opportunity to provide teachers with insight about student interactions with online systems, facilitating their improvement to maximise their educational value. In this paper, we present a new technique for analysing feedback in an automated programming tutor. Our method involves first clustering submitted programs with the same functionality together, then applying sequential pattern mining and graphically visualising student progress through an exercise. Using data from a beginner Python course, we demonstrate how this method can be applied to programming exercises to analyse student approaches, responses to feedback, areas of greatest difficulty and repetition of mistakes. This process could be used by teachers to more effectively understand student behaviour, allowing them to adapt both traditional and online teaching materials and feedback to optimise student experiences and outcomes.

Jessica McBroom, Kalina Yacef, Irena Koprinska, James R. Curran
Student Agency and Game-Based Learning: A Study Comparing Low and High Agency

A key feature of most computer-based games is agency: the capability for students to make their own decisions in how they play. Agency is assumed to lead to engagement and fun, but may or may not be helpful to learning. While the best learners are often good self-regulated learners, many students are not, only benefiting from instructional choices made for them. In the study presented in this paper, involving a total of 158 fifth and sixth grade students, children played a mathematics learning game called Decimal Point, which helps middle-school students learn decimals. One group of students (79) played and learned with a low-agency version of the game, in which they were guided to play all “mini-games” in a prescribed sequence. The other group of students (79) played and learned with a high-agency version of the game, in which they could choose how many and in what order they would play the mini-games. The results show there were no significant differences in learning or enjoyment across the low and high-agency conditions. A key reason for this may be that students across conditions did not substantially vary in the way they played, perhaps due to the indirect control features present in the game. It may also be the case that the young students who participated in this study did not exercise their agency or self-regulated learning. This work is relevant to the AIED community, as it explores how game-based learning can be adapted. In general, once we know which game and learning features lead to the best learning outcomes, as well as the circumstances that maximize those outcomes, we can better design AI-powered, adaptive games for learning.

Huy Nguyen, Erik Harpstead, Yeyu Wang, Bruce M. McLaren
Engaging with the Scenario: Affect and Facial Patterns from a Scenario-Based Intelligent Tutoring System

Facial expression trackers output measures for facial action units (AUs), and are increasingly being used in learning technologies. In this paper, we compile patterns of AUs seen in related work as well as use factor analysis to search for categories implicit in our corpus. Although there was some overlap between the factors in our data and previous work, we also identified factors seen in the broader literature but not previously reported in the context of learning environments. In a correlational analysis, we found evidence for relationships between factors and self-reported traits such as academic effort, study habits, and interest in the subject. In addition, we saw differences in average levels of factors between a video watching activity, and a decision making activity. However, in this analysis, we were not able to isolate any facial expressions having a significant positive or negative relationship with either learning gain, or performance once question difficulty and related factors were also considered. Given the overall low levels of facial affect in the corpus, further research will explore different populations and learning tasks to test the possible hypothesis that learners may have been in a pattern of “Over-Flow” in which they were engaged with the system, but not deeply thinking about the content or their errors.

Benjamin D. Nye, Shamya Karumbaiah, S. Tugba Tokel, Mark G. Core, Giota Stratou, Daniel Auerbach, Kallirroi Georgila
Role of Socio-cultural Differences in Labeling Students’ Affective States

The development of real-time affect detection models often depends upon obtaining annotated data for supervised learning by employing human experts to label the student data. One open question in labeling affective data for affect detection is whether the labelers (i.e., human experts) need to be socio-culturally similar to the students being labeled, as this impacts the cost and feasibility of obtaining the labels. In this study, we investigate the following research questions: For affective state labeling, how does the socio-cultural background of human expert labelers, compared to the subjects (i.e., students), impact the degree of consensus and distribution of affective states obtained? Secondly, how do differences in labeler background impact the performance of affect detection models that are trained using these labels? To address these questions, we employed experts from Turkey and the United States to label the same data collected through authentic classroom pilots with students in Turkey. We analyzed within-country and cross-country inter-rater agreements, finding that experts from Turkey obtained moderately better inter-rater agreement than the experts from the U.S., and the two groups did not agree with each other. In addition, we observed differences between the distributions of affective states provided by experts in the U.S. versus Turkey, and between the performances of the resulting affect detectors. These results suggest that there are indeed implications to using human experts who do not belong to the same population as the research subjects.

Eda Okur, Sinem Aslan, Nese Alyuz, Asli Arslan Esme, Ryan S. Baker
Testing the Validity and Reliability of Intrinsic Motivation Inventory Subscales Within ASSISTments

Online learning environments allow for the implementation of psychometric scales on diverse samples of students participating in authentic learning tasks. One such scale, the Intrinsic Motivation Inventory (IMI), can be used to inform stakeholders of students’ subjective motivational and regulatory styles. The IMI is a multidimensional scale developed in support of Self-Determination Theory [1–3], a strongly validated theory stating that motivation and regulation are moderated by three innate needs: autonomy, belonging, and competence. As applied to education, the theory posits that students who perceive volition in a task, those who report stronger connections with peers and teachers, and those who perceive themselves as competent in a task are more likely to internalize the task and excel. ASSISTments, an online mathematics platform, is hosting a series of randomized controlled trials targeting these needs to promote integrated learning. The present work supports these studies by attempting to validate four subscales of the IMI within ASSISTments. Iterative factor analysis and item reduction techniques are used to optimize the reliability of these subscales and limit the obtrusive nature of future data collection efforts. Such scale validation efforts are valuable because student perceptions can serve as powerful covariates in differentiating effective learning interventions.

Korinn S. Ostrow, Neil T. Heffernan
Correctness- and Confidence-Based Adaptive Feedback of Kit-Build Concept Map with Confidence Tagging

In this paper, we present adaptive feedback for the Kit-Build concept map with confidence tagging (KB map-CT), aimed at improving learners’ understanding in a reading situation. KB map-CT is a digital tool that supports the concept mapping strategy: learners construct concept maps (learner maps) that represent their understanding and tag each proposition of the learner map with their confidence as a degree of their understanding. The Kit-Build concept map (KB map) already provides automatic diagnosis of learner maps at the propositional level. Therefore, KB map-CT can use both correctness and confidence information for each proposition to design and distinguish feedback for four cases: (1) correct and confident, (2) correct and unconfident, (3) incorrect and confident, and (4) incorrect and unconfident. An experiment was conducted to investigate the effectiveness of the adaptive feedback. The results suggest that learners can revise their maps appropriately after receiving feedback. In the “correct and unconfident” case, adaptive feedback is useful for improving confidence. In the “incorrect and confident” case, propositions were improved at the same ratio as in the “incorrect and unconfident” case. The results of the delayed test demonstrate that learners can retain their understanding and confidence one week later.

Jaruwat Pailai, Warunya Wunnasri, Yusuke Hayashi, Tsukasa Hirashima
Bring It on! Challenges Encountered While Building a Comprehensive Tutoring System Using ReaderBench

Intelligent Tutoring Systems (ITSs) are aimed at promoting acquisition of knowledge and skills by providing relevant and appropriate feedback during students’ practice activities. ITSs for literacy instruction commonly assess typed responses using Natural Language Processing (NLP) algorithms. One step in this direction often requires building a scoring mechanism that matches human judgments. This paper describes the challenges encountered while implementing an automated evaluation workflow and adopting solutions for increasing the performance of the tutoring system. The algorithm described here comprises multiple stages: initial pre-processing, a rule-based system for pre-classifying self-explanations, and classification using a Support Vector Machine (SVM) learning algorithm. The SVM model hyper-parameters were optimized using a grid search approach with 4,109 different self-explanations scored 0 to 3 (i.e., poor to great). The accuracy achieved for the model was 59% (adjacent accuracy = 97%; Kappa = .43).
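
The three figures reported above (accuracy, adjacent accuracy, Kappa) can be illustrated with a minimal sketch. It assumes that "adjacent accuracy" counts a prediction as correct when it is within one point of the human score and that Kappa is unweighted Cohen's kappa; the abstract does not state either definition. The function names and the toy scores are hypothetical and are not taken from the paper.

```python
from collections import Counter


def exact_accuracy(human, predicted):
    """Fraction of items where the model score equals the human score."""
    return sum(h == p for h, p in zip(human, predicted)) / len(human)


def adjacent_accuracy(human, predicted, tolerance=1):
    """Fraction of items where the model score is within `tolerance` points
    of the human score (assumed definition of adjacent accuracy)."""
    return sum(abs(h - p) <= tolerance for h, p in zip(human, predicted)) / len(human)


def cohen_kappa(human, predicted):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement.
    Undefined (division by zero) when chance agreement equals 1."""
    n = len(human)
    observed = exact_accuracy(human, predicted)
    human_counts = Counter(human)
    predicted_counts = Counter(predicted)
    labels = set(human) | set(predicted)
    expected = sum(human_counts[c] * predicted_counts[c] for c in labels) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy scores on the 0-3 scale (made up for illustration only).
human_scores = [0, 1, 2, 3, 2, 1, 0, 3]
model_scores = [0, 2, 2, 3, 1, 1, 2, 3]

print(exact_accuracy(human_scores, model_scores))
print(adjacent_accuracy(human_scores, model_scores))
print(cohen_kappa(human_scores, model_scores))
```

On a four-point scale, adjacent accuracy is much more forgiving than exact accuracy, which is consistent with the gap between the 59% and 97% figures in the abstract.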

Marilena Panaite, Mihai Dascalu, Amy Johnson, Renu Balyan, Jianmin Dai, Danielle S. McNamara, Stefan Trausan-Matu
Predicting the Temporal and Social Dynamics of Curiosity in Small Group Learning

Curiosity is an intrinsic motivation for learning, but is highly dynamic and changes moment to moment in response to environmental stimuli. In spite of the prevalence of small group learning in and outside of modern classrooms, little is known about the social nature of curiosity. In this paper, we present a model that predicts the temporal and social dynamics of curiosity based on sequences of behaviors exhibited by individuals engaged in group learning. This model reveals distinct sequential behavior patterns that predict increase and decrease of curiosity in individuals, and convergence to high and low curiosity among group members. In particular, convergence of the entire group to a state of high curiosity is highly correlated with sequences of behaviors that involve the most social of group behaviors - such as questions and answers, arguments and sharing findings, as well as scientific reasoning behaviors such as hypothesis generation and justification. The implications of these findings are discussed for educational systems that intend to evoke and scaffold curiosity in group learning contexts.

Bhargavi Paranjape, Zhen Bai, Justine Cassell
Learning Curve Analysis in a Large-Scale, Drill-and-Practice Serious Math Game: Where Is Learning Support Needed?

This paper applies data-driven methods to understand learning and derives game design insights in a large-scale, drill-and-practice game: Spatial Temporal (ST) Math. In order for serious games to thrive we must develop efficient, scalable methods to evaluate games against their educational goals. Learning models have matured in recent years and have been applied across e-learning platforms but they have not been used widely in serious games. We applied empirical learning curve analyses to ST Math under different assumptions of how knowledge components are defined in the game and map to game contents. We derived actionable game design feedback and educational insights regarding fraction learning. Our results revealed cases where students failed to transfer knowledge between math skills, content, and problem representations. This work stresses the importance of designing games that support students’ comprehension of math concepts, rather than the learning of content- and situation-specific skills to pass games.

Zhongxiu Peddycord-Liu, Rachel Harred, Sarah Karamarkovich, Tiffany Barnes, Collin Lynch, Teomara Rutherford
Conceptual Issues in Mastery Criteria: Differentiating Uncertainty and Degrees of Knowledge

Mastery learning is a common personalization strategy in adaptive educational systems. A mastery criterion decides whether a learner should continue practicing the current topic or move to a more advanced topic. This decision is typically made by comparing a knowledge estimate with a mastery threshold. We argue that the commonly used mastery criteria combine two different aspects of the knowledge estimate in the comparison to this threshold: the degree of achieved knowledge and the uncertainty of the estimate. We propose a novel learner model that provides a conceptually clear treatment of these two aspects. The model is a generalization of the commonly used Bayesian knowledge tracing and logistic models and thus also provides insight into the relationship between these two types of learner models. We compare the proposed mastery criterion to commonly used criteria and discuss consequences for the practical development of educational systems.
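
As background for the threshold comparison described above, here is a minimal sketch of a conventional mastery criterion built on standard Bayesian knowledge tracing (BKT). It is not the novel learner model proposed in the paper; it only shows the commonly used baseline in which a single probability is compared with a threshold. All parameter values (`p_init`, `p_slip`, `p_guess`, `p_learn`, `threshold`) and function names are hypothetical.

```python
def bkt_update(p_known, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """One standard BKT step: Bayesian update on the observed answer,
    followed by the chance of learning the skill during this attempt."""
    if correct:
        numerator = p_known * (1 - p_slip)
        denominator = numerator + (1 - p_known) * p_guess
    else:
        numerator = p_known * p_slip
        denominator = numerator + (1 - p_known) * (1 - p_guess)
    posterior = numerator / denominator
    return posterior + (1 - posterior) * p_learn


def answers_until_mastery(answers, p_init=0.3, threshold=0.95):
    """Return the number of answers after which the estimate first reaches
    the mastery threshold, or None if it never does."""
    p_known = p_init
    for count, correct in enumerate(answers, start=1):
        p_known = bkt_update(p_known, correct)
        if p_known >= threshold:
            return count
    return None


# A learner who answers every item correctly versus one who never does.
print(answers_until_mastery([True] * 10))
print(answers_until_mastery([False] * 10))
```

In this baseline the single number `p_known` has to serve both as the estimate of how much the learner knows and as the measure of how sure the system is, which is the conflation the abstract argues against.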

Radek Pelánek
Reciprocal Content Recommendation for Peer Learning Study Sessions

Recognition of peer learning as a valuable supplement to formal education has led to a rich literature formalising peer learning as an institutional resource. However, facilitating peer learning support sessions alone, without providing guidance or context, risks being ineffective in terms of any targeted, measurable effects on learning. Building on an existing open-source, student-facing platform called RiPPLE, which recommends peer study sessions based on the availability, competencies and compatibility of learners, this paper aims to supplement these study sessions by providing content from a repository of multiple-choice questions to facilitate topical discussion and aid productiveness. We exploit a knowledge tracing algorithm alongside a simple Gaussian scoring model to select questions that promote relevant learning and that reciprocally meet the expectations of both learners. Primary results using synthetic data indicate that the model works well at scale in terms of the number of sessions and number of items recommended, and capably recommends from a large repository the content that best approximates a proposed difficulty gradient.

Boyd A. Potts, Hassan Khosravi, Carl Reidsema
The Impact of Data Quantity and Source on the Quality of Data-Driven Hints for Programming

In the domain of programming, intelligent tutoring systems increasingly employ data-driven methods to automate hint generation. Evaluations of these systems have largely focused on whether they can reliably provide hints for most students, and how much data is needed to do so, rather than how useful the resulting hints are to students. We present a method for evaluating the quality of data-driven hints and how their quality is impacted by the data used to generate them. Using two datasets, we investigate how the quantity of data and the source of data (whether it comes from students or experts) impact one hint generation algorithm. We find that with student training data, hint quality stops improving after 15–20 training solutions and can decrease with additional data. We also find that student data outperforms a single expert solution but that a comprehensive set of expert solutions generally performs best.

Thomas W. Price, Rui Zhi, Yihuan Dong, Nicholas Lytle, Tiffany Barnes
Predicting Question Quality Using Recurrent Neural Networks

This study assesses the extent to which machine learning techniques can be used to predict question quality. An algorithm based on textual complexity indices was previously developed to assess question quality to provide feedback on questions generated by students within iSTART (an intelligent tutoring system that teaches reading strategies). In this study, 4,575 questions were coded by human raters based on their corresponding depth, classifying questions into four categories: 1-very shallow to 4-very deep. Here we propose a novel approach to assessing question quality within this dataset based on Recurrent Neural Networks (RNNs) and word embeddings. The experiments evaluated multiple RNN architectures using GRU, BiGRU and LSTM cell types of different sizes, and different word embeddings (i.e., FastText and GloVe). The most precise model achieved a classification accuracy of 81.22%, which surpasses the previous prediction results using lexical sophistication complexity indices (accuracy = 41.6%). These results are promising and have implications for the future development of automated assessment tools within computer-based learning environments.

Stefan Ruseti, Mihai Dascalu, Amy M. Johnson, Renu Balyan, Kristopher J. Kopp, Danielle S. McNamara, Scott A. Crossley, Stefan Trausan-Matu
Sentence Level or Token Level Features for Automatic Short Answer Grading?: Use Both

Automatic short answer grading for Intelligent Tutoring Systems has attracted much attention from researchers over the years. While the traditional techniques for short answer grading are rooted in statistical learning and hand-crafted features, recent research has explored sentence embedding based techniques. We observe that sentence embedding techniques, while effective for grading in-domain student answers, may not be best suited for out-of-domain answers. Further, sentence embeddings can be affected by non-sentential answers (answers given in the context of the question). On the other hand, token level hand-crafted features can be fairly domain independent and are less affected by non-sentential forms. We propose a novel feature encoding based on partial similarities of tokens (Histogram of Partial Similarities or HoPS), its extension to part-of-speech tags (HoPSTags) and question type information. By combining the proposed features with sentence embedding based features, we are able to further improve the grading performance. Our final model achieves better or competitive results in experimental evaluation on multiple benchmarking datasets and a large scale industry dataset.

Swarnadeep Saha, Tejas I. Dhamecha, Smit Marvaniya, Renuka Sindhgatta, Bikram Sengupta
When Optimal Team Formation Is a Choice - Self-selection Versus Intelligent Team Formation Strategies in a Large Online Project-Based Course

Prior research in Team-Based Massive Open Online Project courses (TB-MOOPs) has demonstrated both the importance of effective group composition and the potential for using automated methods for forming effective teams. Past work on automated team assignment has produced both spectacular failures and spectacular successes. In either case, different contexts pose particular challenges that may interfere with the applicability of approaches that have succeeded in other contexts. This paper reports on a case study investigating the applicability of an automated team assignment approach that has succeeded spectacularly in TB-MOOP contexts to a large online project-based course. The analysis offers both evidence of partial success of the paradigm as well as insights into areas for growth.

Sreecharan Sankaranarayanan, Cameron Dashti, Chris Bogart, Xu Wang, Majd Sakr, Carolyn Penstein Rosé
Perseverance Is Crucial for Learning. “OK! but Can I Take a Break?”

In a study with 108 10- to 12-year-olds who used a digital educational game targeting history, we addressed the phenomenon of perseverance, that is, the tendency to stick with a task even when it is challenging. The educational game was designed to make all students encounter tasks they did not succeed in solving, at which point they were offered a set of choices corresponding to perseverance and non-perseverance. The data sources were behavioral logs, post-questionnaires, and an in-game questionnaire conducted by a game character, who asked the students about the reason for their choice. Overall, we found no differences between high- and low-perseverance students in their experiences of effort, difficulty, and learning, nor in their self-reported motives for persevering when they did so. With respect to performance, however, high-perseverance students solved significantly more tasks at higher difficulty levels. Comparing high-perseverance students who tended to take a break directly after a failed test (before they continued with the same task) with those who did not take a break, we found no significant differences, indicating that taking a break is not detrimental to learning and perseverance.

Annika Silvervarg, Magnus Haake, Agneta Gulz
Connecting the Dots Towards Collaborative AIED: Linking Group Makeup to Process to Learning

We model collaborative problem solving outcomes using data from 37 triads who completed a challenging computer programming task. Participants individually rated their group’s performance, communication, cooperation, and agreeableness after the session, which were aggregated to produce group-level measures of subjective outcomes. We scored teams on objective task outcomes and measured individual students’ learning outcomes with a posttest. Groups with similar personalities performed better on the task and had higher ratings of communication, cooperation, and agreeableness. Importantly, greater deviation in teammates’ perception of group performance and higher ratings of communication, cooperation, and agreeableness negatively predicted individual learning. We discuss findings from the perspective of group work norms and consider applications to intelligent systems that support collaborative problem solving.

Angela Stewart, Sidney K. D’Mello
Do Preschoolers ‘Game the System’? A Case Study of Children’s Intelligent (Mis)Use of a Teachable Agent Based Play-&-Learn Game in Mathematics

For learning to take place in digital learning environments, learners need to use educational software – more or less – as intended. However, previous studies show that some school children, instead of trying to learn and master a skill, choose to systematically exploit or outsmart the system to gain progress. But what about preschoolers? The present study explores the presence of this kind of behavioral pattern among preschoolers who use a teachable agent-based play-&-learn game in early math. We analyzed behavioral data logs together with interviews and observations. We also analyzed action patterns deviating from the pedagogical design intentions in terms of non-harmful gaming, harmful gaming, and wheel-spinning. Our results reveal that even though pedagogically unintended use of the game did occur, harmful gaming was rare. Interestingly, the results also indicate an unexpected awareness in children of what it means to learn and to teach. Finally, we present a series of possible adjustments to the software in order to decrease gaming-like behavior or strategies that signal insufficient skills or poor learning.

Eva-Maria Ternblad, Magnus Haake, Erik Anderberg, Agneta Gulz
Vygotsky Meets Backpropagation: Artificial Neural Models and the Development of Higher Forms of Thought

In this paper we revisit Vygotsky’s developmental model of concept formation, and use it to discuss learning in artificial neural networks. We study learning in neural networks from a learning science point of view, asking whether it is possible to construct systems that have developmental patterns that align with empirical studies on concept formation. We put the state-of-the-art Inception-v3 image recognition architecture in an experimental setting that highlights differences and similarities in algorithmic and human cognitive processes. The Vygotskian model of cognitive development reveals important limitations in currently popular neural algorithms, and puts neural AI in the context of post-behavioristic science of learning. At the same time, the Vygotskian model of development of thought suggests new architectural principles for developing AI, machine learning, and systems that support human learning. In this context we can ask what it would take for machines to learn, and what they could learn from research on learning.

Ilkka Tuomi
Providing Automated Real-Time Technical Feedback for Virtual Reality Based Surgical Training: Is the Simpler the Better?

In surgery, where mistakes have the potential for dire consequences, proper training plays a crucial role. Surgical training has traditionally relied upon experienced surgeons mentoring trainees through cadaveric dissection and operating theatre practice. However, with the growing demand for more surgeons and more efficient training programs, it has become necessary to employ supplementary forms of training such as virtual reality simulation. However, the use of such simulations as autonomous training platforms is limited by the extent to which they can provide automated performance feedback. Recent work has focused on overcoming this issue by developing algorithms to provide feedback that emulates the advice of human experts. These algorithms can mainly be categorized into rule-based and machine learning based methods, and they have typically been validated through user studies against controls that received no feedback. To our knowledge, no investigations into the performance of the two types of feedback generation methods in comparison to each other have so far been conducted. To this end, we introduce a rule-based method of providing technical feedback in virtual reality simulation-based temporal bone surgery, implement a machine learning based method that has been proven to outperform other similar methods, and compare their performance in teaching surgical skills in practice through a user study. We show that simpler rule-based methods can be equally or more effective in teaching surgical skills when compared to more complex methods of feedback generation.

Sudanthi Wijewickrema, Xingjun Ma, Patorn Piromchai, Robert Briggs, James Bailey, Gregor Kennedy, Stephen O’Leary
Reciprocal Kit-Building of Concept Map to Share Each Other’s Understanding as Preparation for Collaboration

Collaborative learning is an active teaching and learning strategy, in which learners who give each other elaborated explanations can learn most. However, it is difficult for learners to explain their own understanding elaborately in collaborative learning. In this study, we propose a collaborative use of a Kit-Build concept map (KB map) called “Reciprocal KB map”. In a Reciprocal KB map for a pair discussion, the two participants first make their own concept maps expressing their comprehension. Then, they exchange the components of their maps and ask each other to reconstruct their maps by using the components. The differences between the original concept map and the reconstructed map are diagnosed automatically, which is an advantage of the KB map. Reciprocal KB map is expected to encourage the pair to recognize each other’s understanding and to create an effective discussion. In an experiment reported in this paper, Reciprocal KB map was used for supporting a pair discussion and was compared with a pair discussion supported by a traditional concept map. Nineteen pairs of university students were asked to use the traditional concept map in their discussion, while 20 pairs of university students used Reciprocal KB map for discussing the same topic. The results of the experiment were analyzed using three metrics: a discussion score, a similarity score, and questionnaires. The discussion score, which investigates the value of talk in discussion, demonstrates that Reciprocal KB map can promote more effective discussion between the partners compared to the traditional concept map. The similarity score, which evaluates the similarity of the concept maps, demonstrates that Reciprocal KB map can encourage the pair of partners to understand each other better compared to the traditional concept map. Last, the questionnaires illustrate that Reciprocal KB map can support the pair of partners in collaborating smoothly in the discussion and that the participants accepted this method for sharing their understanding with each other. These results suggest that Reciprocal KB map is a promising approach for encouraging pairs of partners to understand each other and for promoting effective discussions.

Warunya Wunnasri, Jaruwat Pailai, Yusuke Hayashi, Tsukasa Hirashima
Early Identification of At-Risk Students Using Iterative Logistic Regression

Higher education institutions are faced with the challenge of low student retention rates and a high number of dropouts. 41% of college students in the United States do not finish their undergraduate degree program in six years, and 60% of them drop out in their first two years of study. It is crucial for universities and colleges to develop data-driven artificial intelligence systems to identify at-risk students as early as possible and provide timely guidance and support for them. However, most of the current classification approaches to early dropout prediction are unable to utilize all the information in historical data from previous cohorts to predict dropouts of current students in a few semesters. In this paper, we develop an Iterative Logistic Regression (ILR) method to address the challenge of early prediction. The proposed framework is able to make full use of historical student records and effectively predict students at risk of failing or dropping out in future semesters. Empirical results evaluated on a real-world dataset show significant improvement with respect to the performance metrics in comparison to other existing methods. The application enabled by this proposed method provides additional support to students who are at risk of dropping out of college.

Li Zhang, Huzefa Rangwala