Volume 172, Issue 2
  • ISSN 0019-0829
  • E-ISSN: 1783-1490



This study compared second-language (L2) students’ ratings of their peers’ essays on multiple criteria with those of their teachers under different assessment conditions. Forty EFL teachers and 40 EFL students took part in the study. Each rated one essay on five criteria twice, once under a high-stakes and once under a low-stakes assessment condition. Multifaceted Rasch analysis and correlation analyses were conducted to compare rater severity and consistency across rater groups, rating criteria, and assessment conditions. The results revealed more variation in students’ ratings than in teachers’ ratings across assessment conditions. Additionally, both rater groups showed different degrees of severity in assessing different criteria. In general, students were significantly more severe than teachers on language use, whereas teachers were significantly more severe than students on organization. Student and teacher severity also varied across rating criteria and assessment conditions. The findings have implications for planning and implementing peer assessment in the L2 writing classroom as well as for future research.




