Chapter 5. Computer simulations of MRC Psycholinguistic Database word properties
This study investigates whether computational models informed by automated lexical indices can simulate human ratings of word concreteness, word familiarity, and word imageability. The goal of the study is to provide word information estimates for words without human ratings, thereby affording greater textual coverage and permitting a better understanding of the features that underlie word properties. As predictor variables, the study uses traditional automated word features such as word length, word frequency, hypernymy, and polysemy, along with novel automated word features such as word type attributes taken from WordNet, LSA dimensions, and inverse entropy weights. The concreteness model explained 61% of the variance in human ratings of word concreteness and showed that more concrete words contain attributes related to people, animals, and food, have higher hypernymy levels, are related to two LSA dimensions, are more frequent, and are shorter. The familiarity model explained 62% of the variance in the human ratings reported in the MRC database and showed that more familiar words are found in a greater number of text samples and are more frequent. The imageability model explained 42% of the variance in the human ratings and showed that more imageable words contain attributes related to artifacts, animals, and plants, are related to two LSA dimensions, are more frequent, and are shorter.
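The variance figures reported for each model are coefficients of determination (R²): the share of variance in the human ratings accounted for by a model built on automated lexical indices. The sketch below illustrates that quantity, assuming an ordinary least-squares linear model fit with the normal equations. It is not the study's own modeling or feature-selection procedure. The helper names (`fit_ols`, `r_squared`), the two features (word length and log frequency), and all numeric values are hypothetical toy data invented for illustration, not MRC ratings.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update a and b together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(features: Sequence[Sequence[float]], ratings: Sequence[float]) -> List[float]:
    """Fit ratings ~ intercept + features by ordinary least squares."""
    # Prepend a column of ones for the intercept term.
    x = [[1.0, *row] for row in features]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, ratings)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(features, ratings, coefs) -> float:
    """Proportion of variance in the ratings explained by the fitted model."""
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], row)) for row in features]
    mean_y = sum(ratings) / len(ratings)
    ss_res = sum((y - p) ** 2 for y, p in zip(ratings, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ratings)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: (word length, log frequency) per word, with invented ratings.
features = [(3, 2.0), (5, 1.5), (7, 1.0), (4, 2.5), (8, 0.5), (6, 1.2)]
ratings = [462, 410, 375, 440, 350, 390]
coefs = fit_ols(features, ratings)
print("coefficients:", coefs)
print("R^2:", r_squared(features, ratings, coefs))
```

Here R² is computed on the same words the model was fit to. To estimate ratings for words without human ratings, the fitted coefficients would be applied to those words' features, and a held-out set of rated words would be needed to check how well the model generalizes.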