Volume 17, Issue 2
  • ISSN 0155-0640
  • E-ISSN: 1833-7139


Lack of inter-rater agreement in the assessment of oral tests is well known. In this paper, multi-faceted Rasch analysis was used to determine whether any bias was evident in the way a group of raters (N=13) rated two different versions of an oral interaction test, undertaken by the same candidates (N=83) under two conditions – direct and semi-direct. Rasch measurement allows analysis of the interaction between ‘facets’; in this case, raters, items and candidates are all facets. In this study, the interaction between rater and item was investigated in order to determine whether particular tasks in the test were scored in a consistently biased way by particular raters. The results of the analysis indicated that certain raters consistently assessed the tape version of the test more harshly, whilst others consistently rated the live version more harshly. This type of approach also allowed a finer analysis at the level of individual items with respect to harshness and consistency across ratings. The implications for rater training and feedback are discussed.
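The many-facet Rasch (rating scale) model underlying this kind of analysis expresses the probability of each rating category as a function of candidate ability, item difficulty, and rater severity. The sketch below is purely illustrative of that model form and is not the method used in the paper, which employed the FACETS program (Linacre 1992a); all parameter names and values here are hypothetical.

```python
import math

def rating_probabilities(ability, item_difficulty, rater_severity, thresholds):
    """Illustrative many-facet Rasch rating scale model.

    Returns the probability of each rating category 0..m for a candidate
    of given ability, on an item of given difficulty, scored by a rater
    of given severity. `thresholds` holds the step difficulties F_1..F_m.
    All parameters are in logits; this is a sketch, not the FACETS program.
    """
    # Log-odds of category k over k-1 is (ability - difficulty - severity - F_k);
    # accumulate these to get the (unnormalised) log-probability of each category.
    log_numerators = [0.0]
    running = 0.0
    for f in thresholds:
        running += ability - item_difficulty - rater_severity - f
        log_numerators.append(running)
    exps = [math.exp(v) for v in log_numerators]
    denom = sum(exps)
    return [e / denom for e in exps]

# Hypothetical values: an able candidate, an average item, two raters.
lenient = rating_probabilities(1.0, 0.0, 0.5, [-1.0, 0.0, 1.0])
harsh = rating_probabilities(1.0, 0.0, 1.5, [-1.0, 0.0, 1.0])
```

Under this model a harsher rater (larger severity) shifts probability mass away from the top categories, which is the sense in which the analysis can detect raters who consistently score one version of the test more severely.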




  • Article Type: Research Article