Chapter 2. From intrinsic to extrinsic issues of lexical diversity assessment
Issues of lexical diversity assessment have been addressed only with consideration of the approach, rather than the corpus. Of necessity, <i>intrinsic issues of lexical diversity</i>, those related to the approach, needed to be addressed first; however, given that they have now received due attention in recent research, it is time to turn to <i>extrinsic issues of lexical diversity</i>, which concern how variations in texts and corpora affect the results of the approach. The focus of intrinsic issues has been on the algorithms and approaches used to produce lexical diversity values on laboratory-like data sets. With extrinsic issues such as word count, the focus moves to more naturalistic data sets whose texts are inconsistent in size, quality, and length. For such data, indices of lexical diversity must demonstrate ecological validity. The degree to which an index exhibits ecological validity is of considerable importance to the field of second language learning because naturalistic corpora vary considerably in size, and the texts within them vary considerably in word count. In other words, ecological validity is a necessary element of the construct validity of lexical diversity. In this study, we assess the three primary indices of lexical diversity (MTLD, HD-D, and Maas) on a corpus of naturalistic data in order to evaluate extrinsic issues of lexical diversity assessment by way of ecological validation. Our results show that MTLD appears to be the strongest index and Maas the weakest. Our conclusion, while encouraging broader research, is that the Maas index should be abandoned as a lexical diversity index because of its over-sensitivity to word count. By contrast, MTLD appears to be resilient to a wide range of extrinsic factors and is consequently recommended for future lexical diversity studies.