Volume 168, Issue 1
  • ISSN 0019-0829
  • E-ISSN: 1783-1490


The Vocabulary Levels Test (Nation, 1983; Schmitt, Schmitt, & Clapham, 2001) indicates the word frequency level that should be used to select words for learning. The present study involves the development and validation of two new forms of the test. The new forms consist of five levels measuring knowledge of vocabulary at the 1000, 2000, 3000, 4000, and 5000 levels. Items for the tests were sourced from Nation’s (2012) BNC/COCA word lists. The research involved first identifying quality items using the data from 1,463 test takers to create two equivalent forms, and then evaluating the forms with the data from a further 250 test takers. This study also makes an initial attempt to validate the new forms using Messick’s (1989, 1995) validity framework.



  1. Andrich, D.
    (1988) Rasch models for measurement. Beverly Hills, CA: Sage. doi: 10.4135/9781412985598
    https://doi.org/10.4135/9781412985598
  2. Bachman, L. F.
    (1990) Fundamental considerations in language testing. Oxford: Oxford University Press.
  3. Bachman, L. F.
    (2000) Modern language testing at the turn of the century: Assuring that what we count counts. Language Testing, 17(1), 1–42. doi: 10.1177/026553220001700101
    https://doi.org/10.1177/026553220001700101
  4. Bachman, L. F., & Palmer, A.
    (1996) Language testing in practice: Designing and developing useful language tests. Oxford: Oxford University Press.
  5. Beglar, D.
    (2010) A Rasch-based validation of the Vocabulary Size Test. Language Testing, 27(1), 101–118. doi: 10.1177/0265532209340194
    https://doi.org/10.1177/0265532209340194
  6. Bond, T. G., & Fox, C. M.
    (2015) Applying the Rasch model: Fundamental measurement in the human sciences. New York, NY: Routledge.
  7. Chapelle, C. A.
    (1999) Validity in language assessment. Annual Review of Applied Linguistics, 19, 254–272. doi: 10.1017/S0267190599190135
    https://doi.org/10.1017/S0267190599190135
  8. Cohen, J.
    (1988) Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
  9. Cohen, J.
    (1992) A power primer. Psychological Bulletin, 112(1), 155–159. doi: 10.1037/0033-2909.112.1.155
    https://doi.org/10.1037/0033-2909.112.1.155
  10. Coxhead, A.
    (2000) A new academic word list. TESOL Quarterly, 34(2), 213–238. doi: 10.2307/3587951
    https://doi.org/10.2307/3587951
  11. Coxhead, A., Nation, I. S. P., & Sim, D.
    (2015) Measuring the vocabulary size of native speakers of English in New Zealand secondary schools. New Zealand Journal of Educational Studies, 50(1), 121–135. doi: 10.1007/s40841-015-0002-3
    https://doi.org/10.1007/s40841-015-0002-3
  12. Dang, T. N. Y., & Webb, S.
    (2014) The lexical profile of academic spoken English. English for Specific Purposes, 33(1), 66–76. doi: 10.1016/j.esp.2013.08.001
    https://doi.org/10.1016/j.esp.2013.08.001
  13. Dang, T. N. Y., & Webb, S.
    (2016) Making an essential word list for beginners. In I. S. P. Nation, Making and using word lists for language learning and testing (pp. 153–167, 188–195). Amsterdam: John Benjamins. doi: 10.1075/z.208.15ch15
    https://doi.org/10.1075/z.208.15ch15
  14. Hu, H. M., & Nation, P.
    (2000) What vocabulary size is needed to read unsimplified texts. Reading in a Foreign Language, 8, 689–696.
  15. Karabatsos, G.
    (2000) A critique of Rasch residual fit statistics. Journal of Applied Measurement, 1, 152–176.
  16. Kremmel, B.
    (2016) Word families and frequency bands in vocabulary tests: Challenging conventions. TESOL Quarterly, 50(4), 976–987. doi: 10.1002/tesq.329
    https://doi.org/10.1002/tesq.329
  17. Laufer, B., & Goldstein, Z.
    (2004) Testing vocabulary knowledge: Size, strength, and computer adaptiveness. Language Learning, 54(3), 399–436. doi: 10.1111/j.0023-8333.2004.00260.x
    https://doi.org/10.1111/j.0023-8333.2004.00260.x
  18. Laufer, B., & Levitzky-Aviad, T.
    (2016) Computer Adaptive Test of Size and Strength. Retrieved from catss.ga/
  19. Laufer, B., & Ravenhorst-Kalovski, G.
    (2010) Lexical threshold revisited: Lexical text coverage, learners’ vocabulary size and reading comprehension. Reading in a Foreign Language, 22(1), 15–30.
  20. Linacre, J. M.
    (1995) Prioritizing misfit indicators. Rasch Measurement Transactions, 9(2), 422–423.
  21. Linacre, J. M.
    (2002) What do infit and outfit, mean-square and standardized mean? Rasch Measurement Transactions, 16(2), 878.
  22. Linacre, J. M.
    (2003) Size vs. significance: Infit and outfit mean-square and standardized chi-square fit statistic. Rasch Measurement Transactions, 17(1), 918.
  23. Linacre, J. M.
    (2016a) WINSTEPS® Rasch measurement computer program. Beaverton, Oregon: Winsteps.com.
  24. Linacre, J. M.
    (2016b) WINSTEPS® Rasch measurement computer programs User’s Guide. Beaverton, Oregon: Winsteps.com.
  25. Linacre, J. M., & Tennant, A.
    (2009) More about critical eigenvalue sizes (variances) in standardized-residual principal components analysis (PCA). Rasch Measurement Transactions, 23(3), 1228.
  26. Mantel, N., & Haenszel, W.
    (1959) Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 719–748.
  27. McLean, S., Kramer, B., & Beglar, D.
    (2015) The creation and validation of a listening vocabulary levels test. Language Teaching Research, 19(6), 741–760. doi: 10.1177/1362168814567889
    https://doi.org/10.1177/1362168814567889
  28. McNamara, T.
    (2006) Validity in language testing: The challenge of Sam Messick’s legacy. Language Assessment Quarterly, 3(1), 31–51. doi: 10.1207/s15434311laq0301_3
    https://doi.org/10.1207/s15434311laq0301_3
  29. Meara, P., & Buxton, B.
    (1987) An alternative to multiple choice vocabulary tests.
  30. Meara, P., & Miralpeix, I.
    (2017) Tools for researching vocabulary. Bristol, UK: Multilingual Matters.
  31. Messick, S.
    (1989) Validity. In R. L. Linn (Ed.), Educational Measurement (3rd ed., pp. 13–103). New York, NY: Macmillan.
  32. Messick, S.
    (1995) Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741–749. doi: 10.1037/0003-066X.50.9.741
    https://doi.org/10.1037/0003-066X.50.9.741
  33. Nation, I. S. P.
    (1983) Testing and teaching vocabulary. Guidelines, 5(1), 12–25.
  34. Nation, I. S. P.
    (2012) The BNC/COCA word family lists. Retrieved from www.victoria.ac.nz/lals/about/staff/paul-nation
  35. Nation, I. S. P., & Beglar, D.
    (2007) A vocabulary size test. The Language Teacher, 31(7), 9–13.
  36. Nation, I. S. P., & Webb, S.
    (2011) Researching and Analyzing Vocabulary. Boston, MA: Heinle.
  37. Raîche, G.
    (2005) Critical eigenvalue sizes in standardized residual principal components analysis. Rasch Measurement Transactions, 19(1), 1012.
  38. Rasch, G.
    (1960) Probabilistic models for some intelligence and attainment tests. Copenhagen: Danmarks Paedagogiske Institut.
  39. Read, J.
    (2000) Assessing Vocabulary. Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511732942
    https://doi.org/10.1017/CBO9780511732942
  40. Read, J., & Chapelle, C.
    (2001) A framework for second language vocabulary assessment. Language Testing, 18(1), 3–32. doi: 10.1191/026553201666879851
    https://doi.org/10.1191/026553201666879851
  41. Schmitt, N., Jiang, X., & Grabe, W.
    (2011) The percentage of words known in a text and reading comprehension. The Modern Language Journal, 95(1), 26–43. doi: 10.1111/j.1540-4781.2011.01146.x
    https://doi.org/10.1111/j.1540-4781.2011.01146.x
  42. Schmitt, N., & Schmitt, D.
    (2014) A reassessment of frequency and vocabulary size in L2 vocabulary teaching. Language Teaching, 47(4), 484–503. doi: 10.1017/S0261444812000018
    https://doi.org/10.1017/S0261444812000018
  43. Schmitt, N., Schmitt, D., & Clapham, C.
    (2001) Developing and exploring the behaviour of two new versions of the Vocabulary Levels Test. Language Testing, 18(1), 55–88. doi: 10.1177/026553220101800103
    https://doi.org/10.1177/026553220101800103
  44. Schmitt, N., & Zimmerman, C.
    (2002) Derivative word forms: What do learners know? TESOL Quarterly, 36(2), 145–171. doi: 10.2307/3588328
    https://doi.org/10.2307/3588328
  45. Scott, N. W., Fayers, P. M., Aaronson, N. K., Bottomley, A., de Graeff, A., Groenvold, M., … Sprangers, M. A. G.
    (2009) A simulation study provided sample size guidance for differential item functioning (DIF) studies using short scales. Journal of Clinical Epidemiology, 62(3), 288–295. doi: 10.1016/j.jclinepi.2008.06.003
    https://doi.org/10.1016/j.jclinepi.2008.06.003
  46. Smith, A. B., Rush, R., Fallowfield, L. J., Velikova, G., & Sharpe, M.
    (2008) Rasch fit statistics and sample size considerations for polytomous data. BMC Medical Research Methodology, 8(33), 1–11.
  47. Smith Jr., E. V.
    (2004) Evidence for the reliability of measures and validity of measure interpretation: A Rasch measurement perspective. In E. V. Smith Jr. & R. M. Smith (Eds.), Introduction to Rasch measurement: Theory, models and applications (pp. 93–122). Maple Grove, MN: JAM Press.
  48. Stæhr, L. S.
    (2009) Vocabulary knowledge and advanced listening comprehension in English as a foreign language. Studies in Second Language Acquisition, 31(04), 577–607. doi: 10.1017/S0272263109990039
    https://doi.org/10.1017/S0272263109990039
  49. Stevens, J.
    (2002) Applied multivariate statistics for the social sciences (4th ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
  50. van Zeeland, H., & Schmitt, N.
    (2013) Lexical coverage in L1 and L2 listening comprehension: The same or different from reading comprehension? Applied Linguistics, 34(4), 457–479. doi: 10.1093/applin/ams074
    https://doi.org/10.1093/applin/ams074
  51. Webb, S.
    (2013) Depth of vocabulary knowledge. In C. Chapelle (Ed.), Encyclopedia of Applied Linguistics (pp. 1656–1663). Oxford, UK: Wiley-Blackwell.
  52. Webb, S., & Sasao, Y.
    (2013) New directions in vocabulary testing. RELC Journal, 44(3), 263–278. doi: 10.1177/0033688213500582
    https://doi.org/10.1177/0033688213500582
  53. Webb, S. A., & Chang, A. C.-S.
    (2012) Second language vocabulary growth. RELC Journal, 43(1), 113–126. doi: 10.1177/0033688212439367
    https://doi.org/10.1177/0033688212439367
  54. Webb, S., & Macalister, J.
    (2013) Is text written for children appropriate for L2 extensive reading? TESOL Quarterly, 47(2), 300–322. doi: 10.1002/tesq.70
    https://doi.org/10.1002/tesq.70
  55. Webb, S., & Nation, P.
    (2017) How Vocabulary is Learned. Oxford: Oxford University Press.
  56. Webb, S., & Paribakht, T. S.
    (2015) What is the relationship between the lexical profile of test items and performance on a standardized English proficiency test? English for Specific Purposes, 38, 34–43. doi: 10.1016/j.esp.2014.11.001
    https://doi.org/10.1016/j.esp.2014.11.001
  57. Webb, S., & Rodgers, M. P. H.
    (2009a) The lexical coverage of movies. Applied Linguistics, 30(3), 407–427. doi: 10.1093/applin/amp010
    https://doi.org/10.1093/applin/amp010
  58. Webb, S., & Rodgers, M. P. H.
    (2009b) The vocabulary demands of television programs. Language Learning, 59(2), 335–366. doi: 10.1111/j.1467-9922.2009.00509.x
    https://doi.org/10.1111/j.1467-9922.2009.00509.x
  59. Wolfe, E. W., & Smith Jr., E. V.
    (2007) Instrument development tools and activities for measure validation using Rasch models: Part 2 – Validation activities. Journal of Applied Measurement, 8, 204–234.
  60. Wright, B. D., & Stone, M. H.
    (1979) Best test design. Chicago, IL: MESA Press.
  61. Wright, B. D., & Stone, M. H.
    (2004) Making measures. Chicago, IL: Phaneron Press.
  62. Xing, P., & Fulcher, G.
    (2007) Reliability assessment for two versions of Vocabulary Levels Tests. System, 35(2), 182–191.
  63. Xue, G., & Nation, I. S. P.
    (1984) A university word list. Language Learning and Communication, 3(2), 215–229.

  • Article Type: Research Article