Volume 27, Issue 3
  • ISSN 1384-6655
  • E-ISSN: 1569-9811



Vocabulary lists of high-frequency lexical items are an important resource in language education and a key product of corpus research. However, no single vocabulary list will be useful for every learning context, with the appropriateness of such lists affected by the corpora on which they are based. This paper investigates the impact of corpus selection on one measure of lexical sophistication, Advanced Guiraud, focusing on two frequency lists originating from an in-house learner corpus (PELIC) and a global learner corpus (Cambridge Learner Corpus). This analysis shows that frequency lists derived from both types of learner corpus can effectively serve as the basis for measuring the development of lexical sophistication, regardless of the specific program of the learners. Therefore, publicly available learner corpus frequency lists can be a reliable resource for stakeholders interested in the lexical gains of language learners.
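As an illustration of why the choice of frequency list matters for this measure, the sketch below computes Advanced Guiraud as it is commonly defined: the number of advanced word types (types not found on a list of basic, high-frequency words) divided by the square root of the total number of tokens. The function name, the toy word lists, and the lowercasing step are assumptions made for this example; they are not taken from the paper, and the actual PELIC and Cambridge Learner Corpus lists are not reproduced here.

```python
import math


def advanced_guiraud(tokens, basic_words):
    """Advanced types / sqrt(total tokens).

    tokens: list of word tokens from one learner text.
    basic_words: set of high-frequency words treated as "basic"
        (hypothetical stand-in for a corpus-derived frequency list).
    """
    if not tokens:
        return 0.0
    normalized = [t.lower() for t in tokens]
    # Advanced types are the distinct words that are absent from the basic list.
    advanced_types = {t for t in normalized if t not in basic_words}
    return len(advanced_types) / math.sqrt(len(normalized))


# Toy data: two hypothetical "basic" lists, as if derived from different corpora.
list_a = {"the", "a", "is", "on", "cat", "sat", "mat"}
list_b = list_a | {"feline"}  # this list also treats "feline" as basic

text = "the cat sat on the mat ostentatious mat feline".split()

score_a = advanced_guiraud(text, list_a)  # 2 advanced types over sqrt(9)
score_b = advanced_guiraud(text, list_b)  # 1 advanced type over sqrt(9)
```

The same text receives a different score under each list, which is the kind of sensitivity to corpus selection that the study examines at scale.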




