Volume 24, Issue 1
  • ISSN 1384-6655
  • E-ISSN: 1569-9811



Recently developed tools that quickly and reliably quantify vocabulary use on a range of measures open up new possibilities for understanding the construct of vocabulary sophistication. To take this work forward, we need to understand how these different measures relate to each other and to human readers’ perceptions of texts. This study applied 356 quantitative measures of vocabulary use generated by an automated vocabulary analysis tool (Kyle & Crossley, 2015) to a large corpus of assignments written for First-Year Composition courses at a university in the United States. Results suggest that the majority of measures can be reduced to a much smaller set without substantial loss of information. However, distinctions need to be retained between measures based on content vs. function words and between different measures of collocational strength. Overall, correlations with grades are reliable but weak.



  1. Berman, R. A., & Nir, B.
    (2010) The lexicon in writing-speech-differentiation. Written Language and Literacy, 13(2), 183–205.
    https://doi.org/10.1075/wll.13.2.01ber
  2. Berman, R. A., & Nir-Sagiv, B.
    (2007) Comparing narrative and expository text construction across adolescence: A developmental paradox. Discourse Processes, 43(2), 79–120.
    https://doi.org/10.1080/01638530709336894
  3. Bestgen, Y.
    (2017) Beyond single-word measures: L2 writing assessment, lexical richness and formulaic competence. System, 69, 65–78.
    https://doi.org/10.1016/j.system.2017.08.004
  4. Bestgen, Y., & Granger, S.
    (2014) Quantifying the development of phraseological competence in L2 English writing: An automated approach. Journal of Second Language Writing, 26, 28–41.
    https://doi.org/10.1016/j.jslw.2014.09.004
  5. Biber, D.
    (1988) Variation Across Speech and Writing. Cambridge: Cambridge University Press.
    https://doi.org/10.1017/CBO9780511621024
  6. BNC Consortium
    (2007) British National Corpus, version 3 (BNC XML ed.). Retrieved from www.natcorp.ox.ac.uk (last accessed February 2019).
  7. Brown, G. D. A.
    (1984) A frequency count of 190,000 words in the London-Lund corpus of English conversation. Behavior Research Methods, Instruments, & Computers, 16(6), 502–532.
    https://doi.org/10.3758/BF03200836
  8. Brysbaert, M., & New, B.
    (2009) Moving beyond Kucera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977–990.
    https://doi.org/10.3758/BRM.41.4.977
  9. Bulté, B., & Housen, A.
    (2014) Conceptualizing and measuring short-term changes in L2 writing complexity. Journal of Second Language Writing, 26, 42–65.
    https://doi.org/10.1016/j.jslw.2014.09.005
  10. Burnage, G.
    (1990) CELEX: A Guide for Users. Nijmegen: CELEX – Centre for Lexical Information.
  11. Coxhead, A.
    (2000) A new academic wordlist. TESOL Quarterly, 34(2), 213–238.
    https://doi.org/10.2307/3587951
  12. Crossley, S. A., Cai, Z., & McNamara, D.
    (2012) Syntagmatic, paradigmatic, and automatic n-gram approaches to assessing essay quality. In G. M. Youngblood & P. M. McCarthy (Eds.), Proceedings of the Twenty-Fifth International Florida Artificial Intelligence Research Society Conference (pp. 214–219). Palo Alto, CA: The AAAI Press.
  13. Crossley, S. A., DeFore, C., Kyle, K., Dai, J., & McNamara, D.
    (2013) Paragraph specific n-gram approaches to automatically assessing essay quality. In S. K. D’Mello, R. A. Calvo & A. Olney (Eds.), Proceedings of the 6th International Conference on Educational Data Mining (pp. 216–219). Heidelberg: Springer. Retrieved from www.educationaldatamining.org/EDM2013/papers/rn_paper_31.pdf (last accessed February 2019).
  14. Crossley, S. A., Salsbury, T., McNamara, D., & Jarvis, S.
    (2010) Predicting lexical proficiency in language learner texts using computational indices. Language Testing, 28(4), 561–580.
    https://doi.org/10.1177/0265532210378031
  15. Crossley, S. A., Weston, J. L., Sullivan, S. T. M., & McNamara, D.
    (2011) The development of writing proficiency as a function of grade level: A linguistic analysis. Written Communication, 28(3), 282–311.
    https://doi.org/10.1177/0741088311410188
  16. Cumming, A., Kantor, R., Baba, K., Erdosy, U., Eouanzoui, K., & James, M.
    (2005) Differences in written discourse in independent and integrated prototype tasks for next generation TOEFL. Assessing Writing, 10(1), 5–43.
    https://doi.org/10.1016/j.asw.2005.02.001
  17. Daller, H., Turlik, J., & Weir, I.
    (2013) Vocabulary acquisition and the learning curve. In S. Jarvis & H. Daller (Eds.), Vocabulary Knowledge: Human Ratings and Automated Measures (pp. 185–215). Amsterdam/Philadelphia, PA: John Benjamins.
    https://doi.org/10.1075/sibil.47.09ch7
  18. Davies, M.
    (2008-) The Corpus of Contemporary American English: 450 million words, 1990-present. Retrieved from corpus.byu.edu/coca/ (last accessed February 2019).
  19. Durrant, P.
    (2014) Corpus frequency and second language learners’ knowledge of collocations. International Journal of Corpus Linguistics, 19(4), 443–477.
    https://doi.org/10.1075/ijcl.19.4.01dur
  20. Durrant, P., & Brenchley, M.
    (in press). Corpus research on the development of children’s writing in L1 English. In A. Glaznieks, A. Abel, V. Lyding, & V. Nicolas (Eds.), Corpora and Language in Use: Proceedings of the Learner Corpus Research Conference 2017. Louvain: Presses Universitaires de Louvain.
  21. Durrant, P., & Schmitt, N.
    (2009) To what extent do native and non-native writers make use of collocations? International Review of Applied Linguistics, 47(2), 157–177.
    https://doi.org/10.1515/iral.2009.007
  22. Garner, J., Crossley, S. A., & Kyle, K.
    (2018) Beginning and intermediate L2 writers’ use of N-grams: An association measures study. International Review of Applied Linguistics. Advance online publication.
    https://doi.org/10.1515/iral-2017-0089
  23. Golub, L. S., & Frederick, W. C.
    (1979) Linguistic Structures in the Discourse of Fourth and Sixth Graders. Madison, WI: Center for Cognitive Learning, The University of Wisconsin.
  24. Graesser, A. C., McNamara, D., Louwerse, M. M., & Cai, Z.
    (2014) Coh-Metrix: Analysis of text on cohesion and language. Behavioral Research Methods, Instruments, and Computers, 36(2), 193–202.
    https://doi.org/10.3758/BF03195564
  25. Granger, S., & Bestgen, Y.
    (2014) The use of collocations by intermediate vs. advanced non-native writers: A bigram-based study. International Review of Applied Linguistics, 52(3), 229–252.
    https://doi.org/10.1515/iral-2014-0011
  26. Gries, S. Th.
    (2013) 50-something years of work on collocations: What is or should be next… International Journal of Corpus Linguistics, 18(1), 137–165.
    https://doi.org/10.1075/ijcl.18.1.09gri
  27. Grobe, C.
    (1981) Syntactic maturity, mechanics, and vocabulary as predictors of quality ratings. Research in the Teaching of English, 15(1), 75–85.
  28. Guo, L., Crossley, S. A., & McNamara, D.
    (2013) Predicting human judgments of essay quality in both integrated and independent second language writing samples: A comparison study. Assessing Writing, 18(3), 218–238.
    https://doi.org/10.1016/j.asw.2013.05.002
  29. Hou, J., Verspoor, M., & Loerts, H.
    (2016) An exploratory study into the dynamics of Chinese L2 writing development. Dutch Journal of Applied Linguistics, 5(1), 65–96.
    https://doi.org/10.1075/dujal.5.1.04loe
  30. Jarvis, S., Grant, L., Bikowski, D., & Ferris, D.
    (2003) Exploring multiple profiles of highly rated learner compositions. Journal of Second Language Writing, 12, 377–403.
    https://doi.org/10.1016/j.jslw.2003.09.001
  31. Kim, J.-Y.
    (2014) Predicting L2 writing proficiency using linguistic complexity measures: A corpus-based study. English Teaching, 69(4), 27–51.
    https://doi.org/10.15858/engtea.69.4.201412.27
  32. Kim, M., Crossley, S. A., & Kyle, K.
    (2018) Lexical sophistication as a multidimensional phenomenon: Relations to second language lexical proficiency, development, and writing quality. The Modern Language Journal, 102(1), 120–141.
    https://doi.org/10.1111/modl.12447
  33. Knoch, U., Rouhshad, A., Oon, S. P., & Storch, N.
    (2015) What happens to ESL students’ writing after three years of study at an English medium university? Journal of Second Language Writing, 28, 39–52.
    https://doi.org/10.1016/j.jslw.2015.02.005
  34. Knoch, U., Rouhshad, A., & Storch, N.
    (2014) Does the writing of undergraduate ESL students develop after one year of study in an English-medium university? Assessing Writing, 21, 1–17.
    https://doi.org/10.1016/j.asw.2014.01.001
  35. Kucera, H., & Francis, W.
    (1967) Computational Analysis of Present-day American English. Providence, RI: Brown University Press.
  36. Kyle, K.
    (2017) Modelling quality in source-based texts. Retrieved from https://a4li.sri.com/archive/papers/Kyle_2017_Writing_Quality.pdf (last accessed February 2019).
  37. Kyle, K., & Crossley, S. A.
    (2015) Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Quarterly, 49(4), 757–786.
    https://doi.org/10.1002/tesq.194
  38. Kyle, K., & Crossley, S. A.
    (2016) The relationship between lexical sophistication and independent and source-based writing. Journal of Second Language Writing, 34, 12–24.
    https://doi.org/10.1016/j.jslw.2016.10.003
  39. Malvern, D., & Richards, B.
    (2002) Investigating accommodation in language proficiency interviews using a new measure of lexical diversity. Language Testing, 19(1), 85–104.
    https://doi.org/10.1191/0265532202lt221oa
  40. Malvern, D., Richards, B. J., Chipere, N., & Durán, P.
    (2004) Lexical Diversity and Language Development. Basingstoke: Palgrave Macmillan.
    https://doi.org/10.1057/9780230511804
  41. Massey, A. J., & Elliott, G. L.
    (1996) Aspects of Writing in 16+ English Examinations Between 1980 & 1994. Cambridge: University of Cambridge Local Examinations Syndicate.
  42. Massey, A. J., Elliott, G. L., & Johnson, N. K.
    (2005) Variations in Aspects of Writing in 16+ English Examinations Between 1980 and 2004: Vocabulary, Spelling, Punctuation, Sentence Structure, Non-standard English. Cambridge: Cambridge Assessment.
  43. Mazgutova, D., & Kormos, J.
    (2015) Syntactic and lexical development in an intensive English for Academic Purposes programme. Journal of Second Language Writing, 29, 3–15.
    https://doi.org/10.1016/j.jslw.2015.06.004
  44. McCarthy, P. M., & Jarvis, S.
    (2011) MTLD, voc-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381–392.
    https://doi.org/10.3758/BRM.42.2.381
  45. Meurers, D., & Dickinson, M.
    (2017) Evidence and interpretation in language learning research: Opportunities for collaboration with computational linguistics. Language Learning, 67(S1), 66–95.
    https://doi.org/10.1111/lang.12233
  46. Moxley, J.
    (2013) Big data, learning analytics, and social assessment. Journal of Writing Assessment, 6(1), 1–10.
  47. Myhill, D.
    (1999) Writing matters: Linguistic characteristics of writing in GCSE English examinations. English in Education, 33(3), 70–81.
    https://doi.org/10.1111/j.1754-8845.1999.tb00726.x
  48. Myhill, D.
    (2009) From talking to writing: Linguistic development in writing. BJEP Monograph Series II, 6, 27–44.
  49. Olinghouse, N. G., & Leaird, J. T.
    (2009) The relationship between measures of vocabulary and narrative writing quality in second- and fourth-grade students. Reading and Writing, 22, 545–565.
    https://doi.org/10.1007/s11145-008-9124-z
  50. Olinghouse, N. G., & Wilson, J.
    (2013) The relationship between vocabulary and writing quality in three genres. Reading and Writing: An Interdisciplinary Journal, 26, 45–65.
    https://doi.org/10.1007/s11145-012-9392-5
  51. Paquot, M.
    (2018) Phraseological competence: A missing component in university entrance language tests? Insights from a study of EFL learners’ use of statistical collocations. Language Assessment Quarterly, 15(1), 29–43.
    https://doi.org/10.1080/15434303.2017.1405421
  52. Paquot, M.
    (2019) The phraseological dimension in interlanguage complexity research. Second Language Research, 35(1), 121–145.
    https://doi.org/10.1177/0267658317694221
  53. R Development Core Team
    (2013) R: A Language and Environment for Statistical Computing (Version 1.0.136) [Computer software]. Vienna: R Foundation for Statistical Computing. Retrieved from www.R-project.org/ (last accessed February 2019).
  54. Read, J.
    (2000) Assessing Vocabulary. Cambridge: Cambridge University Press.
    https://doi.org/10.1017/CBO9780511732942
  55. Roessingh, H., Elgie, S., & Kover, P.
    (2015) Using lexical profiling tools to investigate children’s written vocabulary in grade 3: An exploratory study. Language Assessment Quarterly, 12(1), 67–86.
    https://doi.org/10.1080/15434303.2014.936603
  56. Simpson-Vlach, R., & Ellis, N. C.
    (2010) An Academic Formulas List: New methods in phraseology research. Applied Linguistics, 31(4), 487–512.
    https://doi.org/10.1093/applin/amp058
  57. Storch, N.
    (2009) The impact of studying in a second language (L2) medium university on the development of L2 writing. Journal of Second Language Writing, 18(2), 103–118.
    https://doi.org/10.1016/j.jslw.2009.02.003
  58. Thorndike, E. L., & Lorge, I.
    (1944) The Teacher’s Word Book of 30,000 Words. New York, NY: Teachers College, Columbia University.
  59. Treffers-Daller, J., Parslow, P., & Williams, S.
    (2018) Back to basics: How measures of lexical diversity can help discriminate between CEFR levels. Applied Linguistics, 39(3), 302–327.
  60. Uccelli, P., Dobbs, C. L., & Scott, J.
    (2013) Mastering academic language: Organization and stance in the persuasive writing of high school students. Written Communication, 30(1), 36–62.
    https://doi.org/10.1177/0741088312469013
  61. Verspoor, M., Schmid, M. S., & Xu, X.
    (2012) A dynamic usage based perspective on L2 writing. Journal of Second Language Writing, 21(3), 239–263.
    https://doi.org/10.1016/j.jslw.2012.03.007
  62. Vidakovic, I., & Barker, F.
    (2010) Use of words and multi-word units in Skills for Life Writing examinations. University of Cambridge ESOL Examinations Research Notes, 41, 7–14.
  63. Vieregge, Q., Stedman, K., Mitchell, T., & Moxley, J.
    (2012) Agency in the Age of Peer Production. Urbana, IL: National Council of Teachers of English.
