Chapter 4. Validating lexical measures using human scores of lexical proficiency
This study examines the convergent validity of a wide range of computational indices reported by Coh-Metrix that past studies have associated with lexical features such as basic category words, semantic co-referentiality, word frequency, and lexical diversity. Human judgments of these lexical features in free-writing samples serve as operationalizations of the lexical constructs the indices are meant to measure. Statistical analyses were conducted to examine the convergent validity of each index and to assess how well the indices that correlated most strongly with the human judgments predicted holistic scores of lexical proficiency for L1 and L2 speakers. Correlations between the automated lexical indices and the operationalized constructs showed small to large effect sizes, providing a degree of convergent validity for most of the automated indices examined. A multiple regression predicting holistic judgments of lexical proficiency from these automated lexical indices explained 40% of the variance in a training set and 37% of the variance in a test set. The findings provide a degree of confidence that the indices measure the constructs they were expected to measure.
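The two analytic steps summarized above (correlating automated indices with human judgments, then fitting a multiple regression on a training set and checking it on a held-out test set) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the number of texts and indices, the weights, and the roughly two-thirds/one-third split are arbitrary, and the function names are hypothetical. As a simplification, the sketch correlates each index directly with a single simulated human score rather than with separate judgments for each lexical feature. It does not reproduce the study's data or results.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    """Predicted scores from fitted coefficients (intercept first)."""
    design = np.column_stack([np.ones(len(X)), X])
    return design @ coef


def r_squared(y_true, y_pred):
    """Proportion of variance explained: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic stand-in data (NOT the study's data): each row is a text,
# each column a hypothetical automated lexical index.
rng = np.random.default_rng(0)
n_texts, n_indices = 240, 4
indices = rng.normal(size=(n_texts, n_indices))
true_weights = np.array([0.6, 0.4, 0.0, -0.3])  # arbitrary, for illustration
human_scores = indices @ true_weights + rng.normal(scale=1.0, size=n_texts)

# Step 1: correlate each index with the simulated human scores.
correlations = [pearson_r(indices[:, j], human_scores) for j in range(n_indices)]

# Step 2: fit on a training set, then evaluate on a held-out test set.
order = rng.permutation(n_texts)
n_train = int(n_texts * 0.67)  # assumed split proportion
train, test = order[:n_train], order[n_train:]
coef = fit_ols(indices[train], human_scores[train])
r2_train = r_squared(human_scores[train], predict(coef, indices[train]))
r2_test = r_squared(human_scores[test], predict(coef, indices[test]))

print("correlations:", np.round(correlations, 2))
print("R^2 train:", round(r2_train, 2), "R^2 test:", round(r2_test, 2))
```

Reporting variance explained on both sets, as the abstract does, indicates how far the regression model generalizes beyond the texts used to fit it: a test-set value close to the training-set value suggests the model is not merely overfitting the training data.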