Writing evaluation: what can analytic versus holistic essay scoring tell us?
Introduction
In adopting scoring instruments with clearly identifiable criteria for evaluating English as a Foreign Language/English as a Second Language (EFL/ESL) essays, the paramount guiding principle is the purpose of the essay. Essays in EFL/ESL programs have been evaluated mainly for diagnostic, developmental or promotional purposes (66, 67). Thus, for these programs to obtain valid results upon which to base decisions, the choice of evaluation instrument becomes significant. Although Gamaroff (2000) states that language testing is not in an “abyss of ignorance” (Alderson, 1983 cited in Gamaroff, 2000), the choice of the ‘right’ essay writing evaluation criteria in many EFL/ESL programs remains problematic, as those chosen are often inappropriate for the purpose. This is most crucial when decisions concerning student promotion to the next English course at the end of the semester have to be made mainly on the basis of essay writing scores. It is therefore important that teachers are aware of the potential of the evaluation criteria being adopted. This article examines what two types of scoring instruments, analytic and holistic, can tell us in essay writing evaluation for promotional purposes.
Literature review
In evaluating essay writing either analytically or holistically, teachers have had to address a number of concerns that have been found in the research to affect the assigning of a final score to a writing product. Some of these concerns have included the need to attain valid and reliable scores, set relevant tasks, give sufficient writing time, set clear essay prompts, and choose appropriate rhetorical modes (5, 14, 15, 29, 49, 50, 6, 57, 8, 28, 36, 10, 27, 51, 48, 62, 20). These issues are
The sample
A stratified random sample of 30 essays was selected from a corpus of N=156 final exam essays written by L1 Arabic non-native students at the end of a 4-month semester in the Freshman English I course, the first of four in the EFL program at the Lebanese American University. The final essay (given as part of the final exam, which also includes a reading comprehension component) is used to test for promotional purposes. The final exam counts for 30% of the final course grade with the
Results
A significant positive correlation of 0.8 was obtained between the two readers’ holistic essay scores (P=0.001) using the Spearman rank correlation coefficient (inter-rater reliability). When a random sample of essays (N=10) was re-evaluated by the same readers, there was a similarly significant positive correlation (P=0.001) between the scores of the first and second readings, again using the Spearman rank correlation coefficient. That is, raters were in agreement with their own
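As an illustration of the inter-rater statistic named above, the following is a minimal sketch of a Spearman rank correlation between two readers' scores, computed as the Pearson correlation of average ranks. The function names and the score lists are hypothetical and are not taken from the study; the sketch does not compute the P value.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical holistic scores from two readers for the same ten essays.
reader_a = [65, 70, 72, 58, 80, 75, 62, 68, 85, 77]
reader_b = [60, 72, 70, 55, 82, 78, 65, 66, 88, 74]

print(f"inter-rater rho = {spearman(reader_a, reader_b):.2f}")
```

The same function could be applied to one reader's first and second readings of the same essays to estimate intra-rater reliability.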
Discussion
Although the inter- and intra-rater reliability coefficients were high, the holistic scoring revealed little about the performance of the students in the different components of the writing skill. White (1994) warns against holistic scoring alone and suggests that “…we need to define writing more inclusively than our holistic scoring guides…which normally yield no gain scores across an academic year” (p. 266). However, this finding suggests that the performance of the students is a reliable
Conclusion
The aim of the present study was to find out what holistic and analytic evaluation can tell us and what general lessons can be drawn for the evaluation of writing. Although holistic scoring may blend together many of the traits assessed separately in analytic scoring, making it relatively easier and more reliable, it is not as informative for the learning situation as analytic scoring. Although the study was done on a limited sample, the results indicate that more attention should be given to the
References (66)
Rater reliability in language assessment: the bug of all bears
System
(2000)
Essay examination prompts and the teaching of academic writing
English for Specific Purposes
(1986)
Providing relevant content in an EAP writing test
English for Specific Purposes
(1990)
ESL essay evaluation: the influence of sentence-level and rhetorical features
Journal of Second Language Writing
(1993)
ESL writing assessment: subject matter knowledge and its impact on performance
English for Specific Purposes
(1990)
Fundamental Considerations in Language Testing
(1990)
What does language testing have to offer
TESOL Quarterly
(1991)
Multiple-choice and holistic essay scores: what are they measuring?
College Composition and Communication
(1982)
Research in Written Composition
(1963)
Rhetorical specification in essay examination topics
College English
Topic differences on writing tests: how much do they matter?
English Quarterly
The validity of timed essay tests in the assessment of writing skills
ELT Journal
The validity of using holistic scoring to evaluate writing
Research in the Teaching of English
Research Methods in Education
Looking behind the curtain: what do L2 composition ratings really mean?
TESOL Quarterly
Evaluating Writing: Describing, Measuring, Judging
Audience and mode of discourse effects on syntactic complexity in writing at two grade levels
Research in the Teaching of English
Expertise in evaluating second language compositions
Language Testing
Effects of training on raters of ESL compositions
Language Testing
Developments in language testing
Annual Review of Applied Linguistics
An investigation into stylistic errors of Arab students learning English for academic purposes
English for Specific Purposes
Ranking, evaluating, and liking: sorting out three forms of judgments
Writing in a foreign language and rhetorical transfer: influences on raters’ evaluation
Second language writing: assessment issues
Rating nonnative writing: the trouble with holistic scoring
TESOL Quarterly
Evaluations of essay prompts by nonnative speakers of English
TESOL Quarterly
Research on Written Composition: New Directions for Teaching
Essay examination topics and students’ writing
College Composition and Communication
Patterns of Lexis in Text