Automated essay evaluation software in English Language Arts classrooms: Effects on teacher feedback, student motivation, and writing quality
Introduction
A commonly used method for teaching writing is to provide instructional feedback (Biber et al., 2011, Black and Wiliam, 2009, Hattie and Timperley, 2007, Kellogg and Whiteford, 2009). Instructional feedback is information provided by an agent—such as a teacher, peer, or computer—that indicates both correctness/incorrectness and ways to improve performance or understanding (Hattie and Timperley, 2007, Parr and Timperley, 2010). Struggling writers, in particular, need targeted instructional feedback because they tend to produce shorter, less developed, and more error-laden texts than their peers (Troia, 2006).
However, the role of instructional feedback in the teaching of writing is not without controversy. Proponents argue that it supports motivation and writing quality by (a) indicating the author's position relative to a desired level of quality; (b) identifying areas in need of improvement, whether in low-level writing skills (spelling, word choice, mechanics, grammar) or high-level skills (idea development and elaboration, organization, rhetoric); and (c) prompting additional practice attempts in which the author incorporates and eventually internalizes the feedback (Ferster et al., 2012, Kellogg et al., 2010, Parr and Timperley, 2010). In contrast, others argue that providing instructional feedback is (a) too time consuming, leading to teacher burnout (Anson, 2000, Baker, 2014, Lee, 2014); (b) too difficult for teachers to provide given the complexity of writing ability (Marshall and Drummond, 2006, Parr and Timperley, 2010); and (c) ineffective at producing substantial, generalizable gains in students’ writing performance (Bangert-Drowns et al., 1991, Biber et al., 2011, Kluger and DeNisi, 1998). Nevertheless, instructional feedback continues to be recommended as a method for teaching writing (American Psychological Association, 2015, Graham et al., 2015, Graham et al., 2012, Graham and Perin, 2007).
In the U.S., an increasingly common form of instructional feedback for writing is feedback provided by automated essay evaluation (AEE) systems (Warschauer & Grimes, 2008). AEE systems are web-based formative writing assessment programs that provide students with immediate automated feedback in the form of essay ratings and individualized suggestions for improvement when revising (Shermis & Burstein, 2013). Some of the principal benefits of AEE are efficiency and flexibility. While there is no consensus regarding the optimal timing of feedback (see Shute, 2008, for a review), immediate feedback is often preferred (Chan et al., 2014, Ferster et al., 2012), especially in classroom settings (Hattie and Timperley, 2007, Shute, 2008). In addition, unlike teacher or peer feedback, automated feedback allows students to control feedback timing: students receive feedback when they request it, either in the middle of, or after having completed, an essay draft. This makes feedback immediately actionable, accelerating the practice-feedback loop (Foltz et al., 2013, Kellogg et al., 2010).
While automated feedback addresses some of the barriers teachers face when providing instructional feedback, the intended use of AEE systems is to complement, not replace, teacher feedback (Foltz, 2014, Foltz et al., 2013, Kellogg et al., 2010). Indeed, AEE is thought to free up instructional time and allow teachers to be more selective in the type of feedback they provide, thereby improving students’ writing motivation and writing performance (Grimes & Warschauer, 2010). For instance, after implementing AEE in her high school, one administrator reported that “[AEE] has helped motivate our students to write while making it easier for educators to provide the feedback needed to ensure growth in writing” (Schmelzer, 2004, p. 34).
Yet, the growing adoption of AEE in the U.S. has been accompanied by a number of concerns and fears. For instance, despite its intended role as a complement to teacher feedback, some fear that AEE will come to replace the teacher as the primary feedback agent (Ericsson and Haswell, 2006, Herrington and Moran, 2001), thereby negating the social communicative function of writing (National Council of Teachers of English [NCTE], 2013). Others are concerned that AEE can be easily fooled into assigning high scores to essays that are long, syntactically complex, and replete with sophisticated vocabulary (Bejar et al., 2014, Higgins and Heilman, 2014). Concerns such as these have led some groups to summarily reject the use of AEE (Conference on College Composition and Communication, 2014, National Council of Teachers of English, 2013).
This debate over AEE’s virtues and ills is compounded by two related issues. First, there is a dearth of research on AEE used for the purpose of formative assessment—i.e., assessment for, rather than of, learning (Black & Wiliam, 2009). By far, the majority of research has focused on the psychometric properties of the automated scoring engine, rather than documenting evidence that automated feedback is associated with desired changes in teacher feedback practices or in students’ writing motivation or writing quality (Stevenson & Phakiti, 2014). Indeed, a recent chapter on the formative use of AEE in the Handbook of Writing Research still primarily discusses the features of AEE scoring systems and their reliability and agreement with human essay ratings. The chapter authors acknowledge that research “still needs to be conducted to gain a more comprehensive understanding of the impact of [automated] feedback that can guide best-use practice” (Shermis, Burstein, Elliot, Miel, & Foltz, 2015, p. 406). Second, previous research has most often examined the effects of automated feedback in isolation from teacher feedback (Stevenson & Phakiti, 2014). Such designs lack ecological validity and may inadvertently bolster fears that the adoption of AEE will replace teachers as feedback agents.
Given the controversies surrounding the use of instructional feedback and AEE, as well as the dearth of prior research focusing on the intended usage of AEE, the current study was designed to explore the implications for instruction and student performance when teacher feedback on writing was combined with automated feedback. Specific outcomes of interest included the amount, type, and level of teacher feedback; students’ writing motivation; and final-draft writing quality. To further provide context for this study, three areas of prior research will be discussed: (1) categorization of teacher feedback on writing, (2) effects of teacher and automated feedback on writing motivation, and (3) effects of teacher and automated feedback on writing quality.
Categorizing teacher feedback on writing
Teacher feedback on writing is commonly categorized as having at least two components: feedback type and feedback level. Feedback type relates to the manner in which feedback is presented to the student. A common distinction is between direct and indirect feedback (Biber et al., 2011, Black and Wiliam, 1998, Cho et al., 2006, DeGroff, 1992, Shute, 2008). Direct feedback (i.e., directives) involves teachers making a correction or directly telling students what needs to be revised. Indirect feedback, by contrast, signals that a problem exists (e.g., through questions or hints) without supplying the correction itself.
Study purpose
This study examined the effects on teacher feedback, students’ writing motivation, and final-draft writing quality associated with two conditions: a combined automated + teacher feedback condition, in which students received feedback from an AEE system called PEG Writing® as well as from their teacher, and a teacher-feedback-only condition, in which students received feedback from their teacher via the comment and edit functions of GoogleDocs. To date, no research has evaluated the effects of a combined automated + teacher feedback condition against a teacher-feedback-only comparison condition.
Setting and participants
This study was conducted in a middle school in an urban school district in the mid-Atlantic region of the United States. The district serves approximately 10,000 students in ten elementary schools, three middle schools, and one high school. In this district, 43% of students are African-American, 20% are Hispanic/Latino, and 33% are White. Approximately 9% of students are English Language Learners, and 50% come from low-income families.
Two eighth-grade English Language Arts (ELA) teachers and their students participated in the study.
Pretest analyses
A one-way ANOVA indicated that groups were equivalent with regard to prior literacy achievement: PEG + Teacher (M = 916.23, SD = 256.79), GoogleDocs (M = 851.37, SD = 252.56); F(1, 142) = 2.34, p = 0.129. Non-parametric analyses performed on the individual writing-motivation survey items indicated that the null hypothesis of equal distributions across feedback conditions was retained in all cases. Hence, at pretest, groups were equivalent with respect to prior literacy ability and writing motivation.
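For concreteness, the following minimal sketch illustrates how pretest equivalence checks of this kind could be computed in Python. It uses simulated data matching the reported group statistics; the per-condition group sizes (73 vs. 71) are assumptions chosen only to match the reported error degrees of freedom, and the Mann-Whitney U test is one plausible choice for the unspecified non-parametric procedure applied to Likert-type items.

```python
# Sketch of pretest equivalence analyses on simulated data.
# Group sizes and the specific non-parametric test are assumptions;
# only the total N of 144 is implied by the reported F(1, 142).
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

# Simulated prior-literacy scores matching the reported group statistics
# (PEG + Teacher: M = 916.23, SD = 256.79; GoogleDocs: M = 851.37, SD = 252.56)
peg_teacher = rng.normal(loc=916.23, scale=256.79, size=73)
googledocs = rng.normal(loc=851.37, scale=252.56, size=71)

# One-way ANOVA on prior literacy achievement; with two groups this is
# equivalent to an independent-samples t-test (F = t**2)
f_stat, p_value = stats.f_oneway(peg_teacher, googledocs)
df_error = len(peg_teacher) + len(googledocs) - 2
print(f"F(1, {df_error}) = {f_stat:.2f}, p = {p_value:.3f}")

# Non-parametric comparison of one ordinal survey item across conditions;
# a Mann-Whitney U test is appropriate for 5-point Likert-type responses
item_peg = rng.integers(low=1, high=6, size=73)   # simulated responses, 1-5
item_docs = rng.integers(low=1, high=6, size=71)
u_stat, p_item = stats.mannwhitneyu(item_peg, item_docs, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_item:.3f}")
```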
Discussion
To our knowledge, this was the first study to compare the effects of a combined teacher + automated feedback condition against a teacher-feedback-only condition (GoogleDocs). A recent literature review noted the absence of such comparisons and the need for more ecologically valid experimental designs than the customary automated-feedback-versus-no-feedback control designs (Stevenson & Phakiti, 2014).
Conclusion
With the increasing adoption of AEE in classroom settings in the U.S., it is important to carefully understand the associated effects on teachers’ feedback practices and key student outcomes, such as writing motivation and writing quality, when AEE is used as intended. The current study provides partial support for the claim that AEE affords teachers the ability to focus on higher-level writing skills while increasing students’ writing motivation and writing quality. Nevertheless, study limitations suggest these findings should be interpreted with caution.
Acknowledgements
An earlier version of this work was presented at the 10th Workshop on Innovative Use of NLP for Building Educational Applications in Denver, Colorado, in June 2015. This research was supported in part by a Delegated Authority contract from Measurement Incorporated® to the University of Delaware (EDUC432914150001). The opinions expressed in this paper are those of the authors and do not necessarily reflect the positions or policies of this agency, and no official endorsement should be inferred.
References (92)
Anson, C. M. (2000). Response and the social construction of error. Assessing Writing.
Baker, N. L. (2014). “Get it off my stack”: Teachers’ tools for grading papers. Assessing Writing.
Bejar, I. I., et al. (2014). On the vulnerability of automated scoring to construct-irrelevant response strategies (CIRS): An illustration. Assessing Writing.
Cheung, A. C. K., & Slavin, R. E. (2013). The effectiveness of educational technology applications for enhancing mathematics achievement in K-12 classrooms: A meta-analysis. Educational Research Review.
Dikli, S., & Bleyle, S. (2014). Automated essay scoring feedback for second language writers: How does it compare to instructor feedback? Assessing Writing.
Graham, S., Berninger, V., & Fan, W. (2007). The structural relationship between writing attitude and writing achievement in first and third grade students. Contemporary Educational Psychology.
Hyland, F., & Hyland, K. (2001). Sugaring the pill: Praise and criticism in written feedback. Journal of Second Language Writing.
Lee, I. (2014). Feedback in writing: Issues and challenges. Assessing Writing.
Parr, J. M., & Timperley, H. S. (2010). Feedback to writing, assessment for teaching and learning and student progress. Assessing Writing.
Written feedback and scoring of sixth-grade girls’ and boys’ narrative and persuasive writing. (2004). Assessing Writing.