Volume 13, Issue 2
  • ISSN 0155-0640
  • E-ISSN: 1833-7139


The study examined the effects of fixed criteria, training and moderation on the reliability of ratings assigned to written scripts. Using an item response analysis, the consistency of inter- and intra-rater scoring patterns was examined under changing conditions. Ratings were assigned twice under workshop conditions and once under unsupervised, isolated conditions. The workshops were used to identify the criteria used by raters and then to obtain an agreed set of criteria using a consensus moderation approach. Results indicate that raters are influenced by their backgrounds, by the moderation procedure and by the criteria, depending on the circumstances under which the ratings were assigned. However, a lack of fit of the ratings to a single-dimension model over time suggests that raters may change their criteria under different conditions. Although similar ratings may be assigned, different criteria are employed by the same rater over time. The results seriously question the use of classical measurement approaches in the assessment of rater reliability.




  • Article Type: Research Article