Improving Writing Feedback for Struggling Writers: Generative AI to the Rescue?
Writing Instruction and Feedback for Struggling Writers
Effective feedback must be strategically delivered, and goal directed; feedback is most effective when the learner has a goal, and the feedback informs the learner regarding areas needing improvement and ways to improve performance… Teachers should provide ongoing feedback until learners reach their established learning goals. (p. 25)
Supporting Writing by Struggling Writers
Teachers’ DDDM
Efficient and Effective DDDM
Purpose
- RQ1: What is the difference between responses generated by GPT-3.5 and GPT-4 given prompts that provide varying specificity about students’ essays?
- RQ2: What is the nature of the instructional suggestions provided by ChatGPT for students with and without disabilities and/or English Language Learners (i.e., struggling writers)?
- RQ3: How does the formative feedback provided by GPT-3.5 and GPT-4 compare to the feedback provided by teachers when given the same rubric?
Method
Participants
Study Context
- Some students go to school on Saturday. Write an essay on whether or not students should go to school on Saturdays.
- Some people believe kids your age should not have cell phones. Using specific details and examples to persuade someone of your opinion, argue whether or not kids your age should have cell phones.
Data Sources
Title | Prompt Characteristics | Sample Prompt
---|---|---
Specific Analytic Rubric | • Information about a student (grade and characteristics) • Writing prompt a student responded to • Specific analytic rubric from the TBGO | Here is a rubric I used to assess my students’ writing [upload the specific analytic rubric into ChatGPT]. My [insert grade level] student who is [insert specific student characteristic: with a learning disability, with ADHD, an English Language Learner; a struggling writer] wrote this essay arguing whether [insert writing prompt here]. “[Insert student essay here].” Score it using the rubric.
Generic 0–4 Rating | • Information about a student (grade and characteristics) • Writing prompt a student responded to • Generic 0–4 scale rating (no specifics) | I am an elementary/middle school writing teacher. I need to provide feedback and identify the areas of need for this student’s writing. My [insert grade level] student [insert specific student characteristic: with a learning disability, with ADHD, an English Language Learner; a struggling writer] wrote this essay arguing whether [insert writing prompt here]. “[Insert student essay here].” Given student characteristics, please score the essay on a 0–4 scale for holistic quality.
No Rubric | • Information about a student (grade and characteristics) • Writing prompt a student responded to • One area of need; one instructional suggestion | I am an elementary/middle school writing teacher. My [insert grade level] student [insert specific student characteristic: with a learning disability, with ADHD, an English Language Learner; a struggling writer] wrote this essay arguing whether [insert writing prompt here]. “[Insert student essay here].” Given student characteristics, identify one area of need for the student’s writing and offer one instructional suggestion to address it.
No Info | • No information about the student or prompt • General feedback and instructional suggestions | I am an elementary school writing teacher. My student wrote this essay. “[Insert student essay here].” What feedback and instructional suggestions should I provide?
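The prompt templates above were entered through the ChatGPT web interface. For readers who want to reproduce this kind of querying programmatically, the sketch below shows how the No Rubric template might be issued through the OpenAI Python SDK (v1+); the model name, the `request_feedback` helper, and the filled-in placeholder values are illustrative assumptions, not part of the study.

```python
# Minimal sketch, assuming the OpenAI Python SDK (v1+). The study used the
# ChatGPT web interface; the model name, this helper, and the example values
# are illustrative, not taken from the study.
from openai import OpenAI

# The study's "No Rubric" prompt template, with {} placeholders standing in
# for the bracketed fields shown in the table above.
NO_RUBRIC_TEMPLATE = (
    "I am an elementary/middle school writing teacher. "
    "My {grade} student {characteristic} wrote this essay arguing whether "
    "{writing_prompt}. \"{essay}\" Given student characteristics, identify "
    "one area of need for the student's writing and offer one instructional "
    "suggestion to address it."
)

def request_feedback(grade: str, characteristic: str, writing_prompt: str,
                     essay: str, model: str = "gpt-4") -> str:
    """Fill the template and return the model's feedback text."""
    prompt = NO_RUBRIC_TEMPLATE.format(
        grade=grade,
        characteristic=characteristic,
        writing_prompt=writing_prompt,
        essay=essay,
    )
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(request_feedback(
        grade="fifth-grade",
        characteristic="with a learning disability",
        writing_prompt="students should go to school on Saturdays",
        essay="I believe students should not go to school on Saturdays...",
    ))
```

Swapping in "gpt-3.5-turbo" for the model argument would approximate the study's GPT-3.5 condition, though the web interface and the API are not guaranteed to behave identically.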
Data Analysis and Credibility
Findings
RQ1: Differences in AI Responses
Predictable Pattern of Response
Response characteristic | Specific Analytic Rubric GPT-3.5 | Specific Analytic Rubric GPT-4 | Generic 0–4 Rating GPT-3.5 | Generic 0–4 Rating GPT-4 | No Rubric GPT-3.5 | No Rubric GPT-4 | No Info GPT-3.5 | No Info GPT-4
---|---|---|---|---|---|---|---|---
Excessive amounts of feedback (not prioritized) | X | X | X | X | X | X | X | X |
Predictable pattern for response | X | X | X | X | X | X | X | X |
Teacher-specific language | X | X | ||||||
Broad areas of need; common phrases | X | X | X | X | X | |||
Individualized, specific feedback using examples | X | X | X | |||||
Misaligned feedback not matching sample | X | X | X | X | ||||
Response starts with specific praise | X | X | X | |||||
Instructional suggestions: Specific | N/A | N/A | X | X | X | |||
Instructional suggestions: Broad | N/A | N/A | X | X | ||||
Instructional suggestions: None | N/A | N/A | X | |||||
Mentioning of student characteristics | X | X | X | X | N/A | N/A | ||
Not aligning with grade/age | X | X | X | X | X | X | N/A | N/A |
Revised essay included | X | X | X |
Using Specific Language from the Rubric
Identifying General, Broad Areas of Need
Focusing on Individualized, Specific Areas of Need
Misaligned Feedback
- was missing an opinion that aligned with the prompt;
- had an opinion but did not start with the words “I believe …” (e.g., “Kids should get more recess time.”); and
- already had a strong introductory sentence (e.g., “I believe that school starts too early and should begin later in the morning.”).
Starting with Specific Praise/Positive Affirmation
RQ2: Instructional Suggestions
Specific Suggestions
Broad Instructional Suggestions
No Instructional Suggestions
RQ3: Comparisons Between Teachers and ChatGPT
Discussion and Practical Implications
- The findings indicate the possibilities and limitations of ChatGPT for evaluating student writing, interpreting a teacher-developed rubric, and providing instructional strategies.
- In the ChatGPT sets that included no contextual information, the responses included more feedback.
- All sets generated excessive amounts of feedback about student writing, with no delineation of the next clear instructional move a teacher should attend to. ChatGPT may therefore work as a useful starting point, but teachers will need to go through the response to prioritize and design their instruction. Sifting through information for relevance can be time-consuming and may even require a teacher to verify the content further.
- Additionally, if students relied directly on ChatGPT, without any vetting of the content by a teacher, they too may be overwhelmed by the amount of feedback given to modify their writing, or they may even be provided with erroneous feedback.
- All GPT-3.5 sets identified broad areas of writing that needed improvement and frequently used common phrases such as grammar, organization/development of ideas, and attention to detail. In addition, this feedback was more often misaligned with students’ writing. This observation is worrisome because the GPT-3.5 version of ChatGPT is free and highly accessible, making it likely the preferred AI tool for classroom educators.
- Most GPT-4 sets (except one) generated more specific and individualized feedback about student writing. The specific feedback included in the generated outputs was much lengthier and would take much more time for a teacher to review than the GPT-3.5 responses.
- All sets identified multiple areas of need and, when instructional suggestions were included in the responses, offered multiple suggestions. Even the No Rubric sets, which explicitly prompted ChatGPT to focus on just one area of instructional need and one suggestion, included much more in the responses. This finding reiterates that we are still learning about AI literacy and the language we need to use to communicate effectively with these tools.
- Both GPT-3.5 and GPT-4 allowed the upload of a researcher-developed analytic rubric and, moreover, interpreted its performance criteria, rating scale, and indicators. ChatGPT also used the rubric’s specific language when providing its evaluation of the student writing.
- Feedback and instructional suggestions were not contextualized when prompts specified varying ages, grade levels, or student abilities and needs. Further research is needed to determine the types of AI literacy prompts or the contextual information that ChatGPT needs to address the particular needs of an individual child. Specially designed instruction, the heart of special education, should be tailored to a particular student (Sayeski et al., 2023).
- The low agreement between the rubric scores and instructional suggestions made by teachers and those generated by ChatGPT does not necessarily mean that ChatGPT’s feedback is incorrect (a common way to quantify such agreement is sketched below). One explanation for the difference may be that teachers provide targeted and individualized instruction using multiple forms of data and critical information to make instructional decisions, including their own professional judgment and knowledge of how each student’s background, culture, and language may influence student performance (McLeskey et al., 2017).
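The paper reports low agreement without specifying the statistic behind it; for ordinal 0–4 rubric scores, one common choice is weighted Cohen’s kappa. The sketch below is illustrative only: the scores are invented placeholders, not data from the study, and the statistic is an assumption about how such agreement might be computed.

```python
# Illustrative sketch only: weighted Cohen's kappa is one common way to
# quantify agreement between two raters on ordinal 0-4 rubric scores.
# The scores below are invented placeholders, not data from the study.
from sklearn.metrics import cohen_kappa_score

teacher_scores = [3, 2, 4, 1, 3, 2, 0, 3]  # hypothetical teacher ratings
chatgpt_scores = [4, 3, 4, 3, 2, 3, 2, 4]  # hypothetical ChatGPT ratings

# Quadratic weights penalize large disagreements (e.g., 0 vs. 4) more than
# adjacent ones (e.g., 2 vs. 3), which suits an ordinal rating scale.
kappa = cohen_kappa_score(teacher_scores, chatgpt_scores, weights="quadratic")
print(f"Weighted kappa: {kappa:.2f}")  # values near 0 indicate low agreement
```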