Chapter 1. Defining and measuring lexical diversity

Most existing measures of lexical diversity are either direct or indirect measures of the proportion of repeated words in a language sample, and they tend to be validated according to how well they avoid sample-size effects and/or how strongly they correlate with measures of knowledge and proficiency. The present paper argues that such measures suffer from a lack of construct validity in two ways: (a) they are not grounded in an adequate or clearly articulated theoretical account of the nature of the construct of lexical diversity, and (b) they are validated not in relation to how well they measure lexical diversity itself, but in relation to how well they correlate with other constructs. The present paper proposes solutions to both problems by defining lexical diversity as a perception-based phenomenon with six measurable properties, and by calibrating these six objective properties against human judgments of lexical diversity. The empirical portion of the paper examines how well a statistical model built on the six proposed properties accounts for nine human raters’ judgments of the lexical diversity of 50 narratives written by adolescent learners and native speakers of English. The results support the proposed six-dimensional construct of lexical diversity, but they also suggest that the way the six properties are measured needs further refinement.
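
As an illustration of the kind of repetition-based measure described at the start of this paragraph, the following sketch computes the type-token ratio (TTR): the number of distinct word forms (types) divided by the number of running words (tokens). TTR is used here only as a familiar example of a direct measure of repetition. It is not the six-property model proposed in this paper. The naive tokenizer and the function names (`tokenize`, `type_token_ratio`, `repetition_proportion`) are illustrative assumptions.

```python
import re


def tokenize(text: str) -> list[str]:
    # Assumption: a naive tokenizer that lowercases the text and keeps
    # alphabetic word forms (apostrophes included); real studies define
    # tokens and types more carefully (e.g., lemmas vs. word forms).
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    # Distinct word forms (types) divided by running words (tokens).
    if not tokens:
        raise ValueError("cannot compute TTR for an empty sample")
    return len(set(tokens)) / len(tokens)


def repetition_proportion(tokens: list[str]) -> float:
    # Share of tokens that repeat an earlier type, i.e. 1 - TTR.
    return 1.0 - type_token_ratio(tokens)


if __name__ == "__main__":
    tokens = tokenize("The cat sat on the mat and the dog sat on the rug")
    # TTR computed over growing prefixes of the same sample.
    for n in (4, 8, len(tokens)):
        print(n, round(type_token_ratio(tokens[:n]), 3))
```

Run as a script, this prints a TTR of 1.0 for the first 4 tokens, 0.75 for the first 8, and 0.615 for all 13. The ratio falls as the sample grows because later words are increasingly likely to repeat earlier ones. This is the sample-size effect that many repetition-based measures were designed to avoid.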


