
In our article, we discuss the quality of two tests of writing proficiency at university level; the two tests represent two realistic tasks at this level: a summary and a short, informative text. We investigated the practicability of the tests and the reliability of the rating procedures empirically. Different groups of 30 university students wrote summaries of two texts and short, informative texts on three subjects. Writing a summary took 40 minutes, writing a short, informative text 90 minutes; a summary had to be about 100 words long, a short, informative text about 300 words.

Each summary or text, i.e. the writings of one group of 30 students, was rated by two raters who worked independently; they rated, separately and with an interval of some days, (a) correctness of language and (b) content and organisation. In order to improve the reliability of rating, correctness of language was rated per paragraph: raters divided each summary into three and each informative text into four paragraphs of approximately equal length, and scored each paragraph separately. While rating content and organisation, raters had to distinguish two or four different aspects, among which the completeness of the content and the organisation (of paragraphs and) of the text.

Raters did not have to analyse texts thoroughly: for each rating they read a text once or twice. In this way, a summary can be rated in 2 + 3 = 5 minutes, an informative text in 3 + 5 = 8 minutes. Correlations between the same rater's ratings of correctness of language and of content and organisation were not very high, and they were lowest for the informative text; this supports rating these two aspects separately. Rater reliability was higher for correctness of language than for content and organisation.

To ensure satisfactory reliability, two ratings are necessary, but even then the two raters disagree in about 30% of the cases; these cases should be reconsidered.

In our opinion, it is possible to reduce rating time: our findings show that if rater A rates only correctness of language and rater B rates the same texts for content and organisation, they will agree in about 60% of the cases. Each of them should then rate the other aspect for the 40% of students whose work is to be reconsidered. The time left can, finally, be spent on reconsidering the most difficult cases they do not yet agree about. This reduction of rating time is of particular interest if students are asked, for the sake of reliability, to write two (or more) texts.
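To make the expected time saving concrete, the following is a minimal sketch in Python. It uses the 3- and 5-minute figures for an informative text given above and assumes one particular reading of the reduced procedure: each rater cross-rates the other aspect only for the roughly 40% of cases that have to be reconsidered. This accounting is our illustration, not a calculation from the article itself.

```python
# Minimal sketch (assumed accounting): compare the expected rating time
# per informative text under the full double-rating procedure and under
# the reduced procedure described in the abstract.

LANGUAGE_MIN = 3       # minutes to rate correctness of language (informative text)
CONTENT_MIN = 5        # minutes to rate content and organisation (informative text)
RECONSIDER_RATE = 0.4  # assumed share of cases that must be reconsidered

# Full procedure: both raters rate both aspects of every text.
full = 2 * (LANGUAGE_MIN + CONTENT_MIN)

# Reduced procedure: rater A rates language only, rater B content only;
# for the reconsidered cases, each rater also rates the other aspect.
reduced = (LANGUAGE_MIN + CONTENT_MIN) + RECONSIDER_RATE * (LANGUAGE_MIN + CONTENT_MIN)

print(f"full double rating: {full:.1f} min per text")
print(f"reduced procedure:  {reduced:.1f} min per text")
print(f"time saved:         {100 * (1 - reduced / full):.0f}%")
```

Under these assumptions, the reduced procedure needs about 11 minutes per informative text instead of 16, roughly a 30% saving, which is why it becomes attractive when students write two or more texts.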