Chapter 6. Modelling L2 vocabulary learning

MyBook is a cheap paperback edition of the original book and will be sold at uniform, low price.
This Chapter is currently unavailable for purchase.

In this paper we propose a frequency-based model of vocabulary acquisition and test it on texts written by second language (L2) writers of English. One goal of the paper is to address an issue that has arisen in previous work attempting to verify Laufer and Nation’s (1995) proposal for using lexical frequency profiling tools with L2 texts to estimate the underlying vocabulary size of the L2 writers. That issue is the application of Zipf’s law (1935, 1949) directly to student texts (see Meara, 2005; Edwards & Collins, 2011), which assumes that words are learned in the order of their frequency in the language at large. As this is clearly not the case, a more valid model of vocabulary learning needs to account for the presence of less common words at different points of the acquisition process. Our model supposes that learning consists of a sequence of exposures to words, seen in proportion to their frequency in the language as a whole, and that some number of exposures are required for a word to be learned (a model parameter). This allows calculation of the probabilities that a given word (whether common or uncommon) is learned after a given number of exposures in this sequence. Furthermore, it allows calculation of the likelihood that a word is used once it has been learned, based on the word’s rank in the learner’s interlanguage (we also considered the possibility of basing this step on the word’s rank in the L2 as a whole), from which we can predict frequency distributions for learner texts. For a given 1K word count in texts, the model predicts a smaller underlying productive vocabulary than predicted by the naïve application of Zipf’s law. We then fit the parameters of the model to texts written by 90 francophone ESL learners at different points of a five-month intensive program. The best fit was obtained with a ‘number of exposures’ parameter value of 3. The model reproduces the steeper-than-Zipf tail of the frequency distribution of words observed in texts.


This is a required field
Please enter a valid email address