Volume 24, Issue 1
  • ISSN 1384-6647
  • E-ISSN: 1569-982X



In this study, we applied and evaluated a scoring method known as comparative judgement to assess spoken-language interpreting. This methodological exploration extends previous efforts to optimise scoring methods for assessing interpreting. Essentially, comparative judgement requires judges to compare two similar objects and make a binary decision about their relative qualities. To evaluate its reliability, validity and usefulness in the assessment of interpreting, we recruited two groups of judges (novice and experienced) to assess 66 two-way English/Chinese interpretations using a computerised comparative judgement system. Our data analysis shows that the new method produced reliable and valid results across judge types and interpreting directions. However, the judges held polarised opinions about the method’s usefulness: while some considered it convenient, efficient and reliable, others expressed the opposite view. We discuss the results through an integrated analysis of the data collected, outline the perceived drawbacks and propose possible solutions to them. We call for more evidence-based, substantive investigation into comparative judgement as a potentially useful method for assessing spoken-language interpreting in certain settings.
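To make the idea of binary pairwise decisions concrete, the following is a minimal sketch of how a set of such judgements could be turned into a quality scale. It assumes a Bradley-Terry model fitted with a simple iterative update; the abstract does not state which model or software the study's system used, so the function name `fit_bradley_terry`, the data layout and the toy judgements are all hypothetical illustrations and not the authors' procedure.

```python
def fit_bradley_terry(n_items, comparisons, iterations=200):
    """Estimate relative quality from binary pairwise judgements.

    comparisons: list of (winner, loser) index pairs, one per judgement.
    Assumption: every item wins at least one comparison, so that all
    estimated strengths stay positive.
    """
    wins = [0] * n_items
    pair_counts = {}  # (low index, high index) -> number of comparisons
    for winner, loser in comparisons:
        wins[winner] += 1
        key = (min(winner, loser), max(winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    strengths = [1.0] * n_items
    for _ in range(iterations):
        updated = []
        for i in range(n_items):
            # Sum over every pair that involves item i.
            denom = 0.0
            for (a, b), n in pair_counts.items():
                if i == a or i == b:
                    denom += n / (strengths[a] + strengths[b])
            updated.append(wins[i] / denom if denom > 0 else strengths[i])
        # Rescale so that the strengths always sum to n_items.
        total = sum(updated)
        strengths = [s * n_items / total for s in updated]
    return strengths


# Hypothetical toy data: three interpretations (0, 1, 2), each pair judged
# four times; the first index in each tuple is the preferred interpretation.
judgements = (
    [(0, 1)] * 3 + [(1, 0)] * 1
    + [(1, 2)] * 3 + [(2, 1)] * 1
    + [(0, 2)] * 3 + [(2, 0)] * 1
)
strengths = fit_bradley_terry(3, judgements)
ranking = sorted(range(3), key=lambda i: strengths[i], reverse=True)
print(ranking)
```

In this sketch a higher strength means the interpretation was preferred more often once the quality of its opponents is taken into account; a real system would also need to handle items that never win and to report fit and reliability statistics.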




