Towards a new generation of corpus-derived lexical resources for language learning
This chapter first argues that, despite their convenience compared to paper-based resources, corpora are, by their very nature as collections of texts and tokens, severely limited in what they can offer directly to language learners or teachers. The focus here is on understanding these limitations with respect to lexical knowledge, and it is suggested that overcoming them requires a different sort of digital resource, one that mediates between corpora on the one hand and teachers or learners on the other. The challenge is complicated by the fact that such a lexical knowledge resource should capture patterns of word behavior that fall along a continuum from the grammatically well-behaved to the lexically idiosyncratic. A knowledge base called StringNet, designed to capture this range of word behaviors, is described and motivated in detail.