Volume 19, Issue 3
  • ISSN 1932-2798
  • E-ISSN 1876-2700

Abstract

This study examines the differences between paper-based and computer-based translation quality assessment, focusing on score reliability, score variability, scoring speed, and raters’ preferences. Using a within-subjects design, 27 raters assessed 29 translations presented in both handwritten and word-processed formats, employing a holistic scoring method. The findings reveal comparable translation quality ratings across the two modes, with paper-based scoring showing greater inter-rater disagreement and being affected by handwriting legibility. Paper-based scoring was generally faster, although computer-based scoring showed less variability in inter-rater reliability. Raters preferred paper-based scoring, citing its perceived speed, the flexibility it affords for annotation, and reduced eye strain. The study highlights the importance of comprehensive rater training and calibration to mitigate biases and non-uniform severity, as well as the adoption of detailed scoring rubrics to ensure consistent assessment across modes. The article also offers suggestions for refining computer-based scoring systems, including improved annotation functionality and ergonomic considerations.
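The mode comparison of inter-rater reliability can be made concrete with a small computation. The sketch below is illustrative only: it assumes a two-way intraclass correlation, ICC(2,1), as the reliability index and simulates holistic scores on a hypothetical 1–10 scale for the 27 raters and 29 translations described above; the article's actual statistic, rating scale, and data may differ.

```python
# Minimal sketch (not the authors' code): comparing inter-rater reliability of
# holistic translation scores across paper-based and computer-based modes using
# a two-way intraclass correlation, ICC(2,1). All scores here are simulated.
import numpy as np

rng = np.random.default_rng(0)
n_translations, n_raters = 29, 27          # design reported in the abstract

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    `scores` is an (n_translations x n_raters) matrix of holistic ratings."""
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)        # per-translation means
    col_means = scores.mean(axis=0)        # per-rater means
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)    # between translations
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)    # between raters
    sse = (np.sum((scores - grand) ** 2)
           - k * np.sum((row_means - grand) ** 2)
           - n * np.sum((col_means - grand) ** 2))
    mse = sse / ((n - 1) * (k - 1))                          # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Simulated holistic scores (hypothetical 1-10 scale): a shared "true" quality
# per translation plus mode-specific rater noise (larger noise for paper mode).
quality = rng.normal(6, 1.5, n_translations)
paper = np.clip(quality[:, None] + rng.normal(0, 1.2, (n_translations, n_raters)), 1, 10)
computer = np.clip(quality[:, None] + rng.normal(0, 0.9, (n_translations, n_raters)), 1, 10)

print(f"ICC(2,1) paper-based:    {icc_2_1(paper):.3f}")
print(f"ICC(2,1) computer-based: {icc_2_1(computer):.3f}")
```

Under these assumptions, the mode with less rater noise yields the higher ICC, which is the kind of contrast the reliability comparison in the study addresses.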

https://doi.org/10.1075/tis.23035.sun