Achieving stability in corpus-based analysis of word types

Abstract

Rank-ordered lists of word types are ubiquitous in corpus linguistics and applied linguistics. Word lists are commonly developed as aids for language teaching and learning, vocabulary testing, and language description. Yet these lists are often produced and used without any evaluation of their stability (or replicability) across corpus samples. Our primary objective in this paper is to describe the cumulative state of knowledge regarding the stability of corpus-based word type lists, focusing on three goals that motivate the creation and use of rank-ordered lists: identifying key lexical items for learning or teaching, assessing vocabulary size or knowledge, and identifying all items in a language domain. We show that word type lists are far less stable than researchers and practitioners often assume, although stability varies substantially with the goals and methods behind list creation.

/content/journals/10.1075/ijcl.24109.egb
2025-05-20
2025-06-24

Article Type: Research Article
Keywords: vocabulary lists; stability; word frequency; ranking; word types