This paper describes a study of the computer essay-scoring program BETSY. The use of computers to rate written scripts has been criticised in some quarters for lacking transparency and for failing to reflect how human raters assess written scripts. Nevertheless, a number of essay-rating programs are available commercially, and many of them claim reliability comparable to that of human raters. Much of the validation of such programs has focused on native-speaking tertiary-level students writing in subject content areas. In contrast, the data for this study are drawn from a representative sample of scripts from an English as a second language (ESL) Year 11 public examination in Hong Kong. The scripts (900 in total) come from a writing test consisting of three topics (300 scripts per topic), each representing a different genre. The results show good correlations between human raters' scores and those assigned by BETSY. The rater discrepancy rate (the rate at which scripts must be re-marked because two raters disagree) was broadly comparable to the rate observed between paired human raters. Little difference was apparent in the ratings of test takers across the three genres. The paper concludes that while computer essay-scoring programs may appear to rate inside a 'black box', with a concomitant lack of transparency, they do have potential as a time-saving assessment tool, for example by acting as a third rater. As the technology develops and rating becomes more transparent, their acceptability should grow accordingly. Copyright © 2009 European Association for Computer Assisted Language Learning.
Citation: Coniam, D. (2009). Experimenting with a computer essay-scoring program based on ESL student writing scripts. ReCALL, 21(2), 259-279. doi: 10.1017/S0958344009000147
- Computer scoring
- English language