Difficulty level of EFL test designed by pre-service teachers

Abstract

Since 2024, English as a Foreign Language (EFL) teaching and assessment in Indonesian primary and secondary schools have targeted B1 proficiency on the Common European Framework of Reference (CEFR). However, studies on aligning teacher training with CEFR-based assessment design are rare. Consequently, teacher training institutions, which had previously paid little attention to the issue, were not ready to integrate this target into their assessment design courses. To fill this gap, this study uses corpus analysis to check whether the difficulty level of test items matches the targeted CEFR level. It investigates formative test items created by 28 pre-service teachers (PTs) in a Designing Assessment course, examining how well the items' vocabulary aligns with CEFR levels and which difficulty levels the items represent. Using two corpus analysis tools, the study compared 26,487 tokens from the receptive skill (listening and reading) tests with 5,354 CEFR vocabulary tokens. The results showed that both test types were dominated by very easy (A1) and easy (A2) levels, with limited representation of medium (B1), difficult (B2), and very difficult (C1 and C2) levels. In the listening items, 68.51% of the vocabulary was CEFR-aligned, mostly at A1 (55.98%). Similarly, 68.19% of the vocabulary in the reading items was CEFR-aligned, with A1 dominating (51.70%). These findings suggest that the test items do not fully align with B1 proficiency. The predominance of very easy and easy levels limits the tests’ effectiveness in assessing students’ achievement and higher-level language skills, which in turn may weaken test validity. The findings call on education institutions to integrate corpus literacy into assessment design. Although the test analysis in this study was a relatively simple procedure, it proved useful for investigating difficulty levels, and it can be replicated for assessment design in EFL classrooms and in research settings.
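The core procedure summarized above, profiling the vocabulary of test items against CEFR levels and reporting the share of tokens at each level, can be illustrated with a minimal sketch. It is not the authors' procedure or the output of their corpus tools, and it rests on several assumptions: words are matched by exact form (no lemmatization or word families), any word missing from the list counts as "off-list", "CEFR-aligned" is taken to mean the share of tokens found on the list, and a small hypothetical word list stands in for a full CEFR-graded list. The names `cefr_profile`, `summarize`, and `toy_list` are illustrative only.

```python
import re
from collections import Counter

# CEFR level -> difficulty label, following the mapping used in the abstract.
DIFFICULTY = {
    "A1": "very easy", "A2": "easy", "B1": "medium",
    "B2": "difficult", "C1": "very difficult", "C2": "very difficult",
}
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens: runs of ASCII letters, internal apostrophes kept."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def cefr_profile(text: str, wordlist: dict[str, str]) -> dict[str, float]:
    """Percentage of running tokens at each CEFR level, plus "off-list".

    `wordlist` maps a word form to one of LEVELS. Matching is by exact
    word form (no lemmatization or word families), a simplification.
    """
    tokens = tokenize(text)
    if not tokens:
        return {}
    # Counter returns 0 for labels that never occur.
    counts = Counter(wordlist.get(tok, "off-list") for tok in tokens)
    return {label: 100 * counts[label] / len(tokens)
            for label in LEVELS + ["off-list"]}


def summarize(profile: dict[str, float], target: str = "B1") -> dict[str, float]:
    """Share of tokens on the list ("aligned") and share at or above `target`."""
    aligned = sum(profile[level] for level in LEVELS)
    at_or_above = sum(profile[level] for level in LEVELS[LEVELS.index(target):])
    return {"aligned": aligned, "at_or_above_target": at_or_above}


if __name__ == "__main__":
    # Hypothetical toy word list; the level assignments are illustrative only.
    toy_list = {"the": "A1", "students": "A1", "read": "A1", "a": "A1",
                "short": "A1", "about": "A1", "in": "A1",
                "article": "A2", "climate": "B1", "policy": "B2"}
    item = "The students read a short article about climate policy in Jakarta."
    profile = cefr_profile(item, toy_list)
    for label, pct in profile.items():
        print(f"{label:>8}: {pct:6.2f}%  {DIFFICULTY.get(label, '')}")
    print(summarize(profile))
```

Because the shares are computed over running tokens, frequent function words such as articles inflate the A1 share. Profiling by word types instead of tokens would give different figures, so the unit of counting is worth reporting alongside the percentages.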

DOI: 10.1075/aral.25009.rud
2026-03-13
2026-04-20

Article Type: Research Article
Keywords: corpus analysis; test items; difficulty level; assessment; CEFR