Elsevier

System

Volume 29, Issue 3, September 2001, Pages 371-383
Writing evaluation: what can analytic versus holistic essay scoring tell us?

https://doi.org/10.1016/S0346-251X(01)00025-2

Abstract

Two important issues in essay evaluation are choice of an appropriate rating scale and setting up criteria based on the purpose of the evaluation. Research has shown that reliable and valid information gained from both analytic and holistic scoring instruments can tell teachers much about their students’ proficiency levels. However, it is claimed that the purpose of the essay task, whether for diagnosis, development or promotion, is significant in deciding which scale is chosen. Revisiting the value of these scales is necessary for teachers to continue to be aware of their relevance. This article reports a study carried out on a sample of final exam essays written by L1 Arabic non-native students of English attending the Freshman English I course in the English as a Foreign Language (EFL) program at the Lebanese American University. Specifically, it aims to find out what analytic and holistic scoring using one evaluation instrument, the English as a Second Language (ESL) Composition Profile (Jacobs et al., 1981. Testing ESL Composition: A Practical Approach. Newbury House, Rowley, MA), can tell teachers about their students’ essay proficiency on which to base promotional decisions. Findings indicate that the EFL program would benefit from more analytic measures.

Introduction

In adopting scoring instruments with clearly identifiable criteria for evaluating English as a Foreign Language/English as a Second Language (EFL/ESL) essays, the paramount guiding principle is the purpose of the essay. Essays in EFL/ESL programs have been evaluated mainly for diagnostic, developmental or promotional purposes (66, 67). Thus, for these programs to obtain valid results upon which to base decisions, the choice of evaluation instrument becomes significant. Although Gamaroff (2000) states that language testing is not in an “abyss of ignorance” (Alderson, 1983 cited in Gamaroff, 2000), the choice of the ‘right’ essay writing evaluation criteria in many EFL/ESL programs remains problematic, as those chosen are often inappropriate for the purpose. This is most crucial when decisions concerning student promotion to the next English course at the end of the semester have to be based mainly on essay writing scores. It is therefore important that teachers are aware of the potential of the evaluation criteria being adopted. This article examines what two types of scoring instruments, analytic and holistic, can tell us in essay writing evaluation for promotional purposes.

Section snippets

Literature review

In evaluating essay writing either analytically or holistically, teachers have had to address a number of concerns that have been found in the research to affect the assigning of a final score to a writing product. Some of these concerns have included the need to attain valid and reliable scores, set relevant tasks, give sufficient writing time, set clear essay prompts, and choose appropriate rhetorical modes (5, 14, 15, 29, 49, 50, 6, 57, 8, 28, 36, 10, 27, 51, 48, 62, 20). These issues are

The sample

A stratified random sample of 30 essays was selected from a corpus of N=156 final exam essays written by L1 Arabic non-native students at the end of a 4-month semester in the Freshman English I course, the first of four in the EFL program at the Lebanese American University. The final essay (given as part of the final exam which also includes a reading comprehension component) is used to test for promotional purposes. The percentage of the final exam is 30% of the final course grade with the

Results

A significant positive correlation of 0.8 was obtained between the two readers’ holistic essay scores (P=0.001) using the Spearman correlation coefficient (inter-rater reliability). When a random sample of essays (N=10) was re-evaluated by the same readers, there was a similarly significant positive correlation (P=0.001) between the scores of the first and second readings, again using the Spearman correlation coefficient. That is, raters were in agreement with their own
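As an illustration of the statistic named above, the following is a minimal sketch of how a Spearman rank correlation between two readers' scores could be computed. It is not the study's own procedure or data: the function names `average_ranks` and `spearman_rho` and the score lists `reader_a` and `reader_b` are hypothetical, and the sketch computes only the coefficient, not the associated P value.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one list has no variation")
    return cov / sqrt(var_x * var_y)


# Hypothetical holistic scores from two readers for the same eight essays.
reader_a = [72, 65, 80, 58, 90, 77, 69, 84]
reader_b = [70, 68, 78, 60, 88, 80, 66, 85]

rho = spearman_rho(reader_a, reader_b)
print(f"Inter-rater Spearman rho: {rho:.2f}")
```

The same function could be applied to one reader's first and second readings of the same essays to obtain an intra-rater coefficient. Working on ranks rather than raw scores means the coefficient reflects whether the readers order the essays similarly, not whether they assign identical marks.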

Discussion

Although there were high inter- and intra-rater reliability coefficients, the holistic scoring revealed little about the performance of the students in the different components of the writing skill. White (1994) warns against holistic scoring alone and suggests that “…we need to define writing more inclusively than our holistic scoring guides…which normally yield no gain scores across an academic year” (p. 266). However, this finding suggests that the performance of the students is a reliable

Conclusion

The aim of the present study was to find out what holistic and analytic evaluation can tell us and what general lessons can be drawn for the evaluation of writing. Although holistic scoring may blend together many of the traits assessed separately in analytic scoring, making it relatively easier and reliable, it is not as informative for the learning situation as analytic scoring. Although the study was done on a limited sample, the results indicate that more attention should be given to the

References (66)

  • G Brossell. Rhetorical specification in essay examination topics. College English (1983)
  • N Carlman. Topic differences on writing tests: how much do they matter? English Quarterly (1986)
  • Carlson, S. et al., 1985: Relationship of Admission Test Scores to Writing Performance of Native and Non-native...
  • T Caudery. The validity of timed essay tests in the assessment of writing skills. ELT Journal (1990)
  • D Charney. The validity of using holistic scoring to evaluate writing. Research in the Teaching of English (1984)
  • L Cohen et al. Research Methods in Education (1994)
  • J Connor-Linton. Looking behind the curtain: what do L2 composition ratings really mean? TESOL Quarterly (1995)
  • C.R Cooper et al. Evaluating Writing: Describing, Measuring, Judging (1977)
  • M Crowhurst et al. Audience and mode of discourse effects on syntactic complexity in writing at two grade levels. Research in the Teaching of English (1979)
  • A Cumming. Expertise in evaluating second language compositions. Language Testing (1990)
  • S Cushing-Weigle. Effects of training on raters of ESL compositions. Language Testing (1994)
  • D Douglas. Developments in language testing. Annual Review of Applied Linguistics (1995)
  • M.H Doushaq. An investigation into stylistic errors of Arab students learning English for academic purposes. English for Specific Purposes (1986)
  • P Elbow. Ranking, evaluating, and liking: sorting out three forms of judgments
  • Hamp-Lyons, L., 1986a. Testing second language writing in academic settings. Unpublished doctoral dissertation,...
  • L Hamp-Lyons. Writing in a foreign language and rhetorical transfer: influences on raters’ evaluation
  • L Hamp-Lyons. Second language writing: assessment issues
  • L Hamp-Lyons. Rating nonnative writing: the trouble with holistic scoring. TESOL Quarterly (1995)
  • M Hayward. Evaluations of essay prompts by nonnative speakers of English. TESOL Quarterly (1990)
  • G Hillocks. Research on Written Composition: New Directions for Teaching (1986)
  • J Hoetker. Essay examination topics and students’ writing. College Composition and Communication (1982)
  • M Hoey. Patterns of Lexis in Text (1991)