, Steven Holtzman1, Paul Deane1 and Isaac Bejar1
Abstract
We describe a large-scale effort to map English-language vocabulary to U.S. school grade levels. Our motivation is to rapidly expand graded vocabulary resources for work with native English speakers in the U.S., taking school-related influences into account rather than relying solely on corpus-frequency approaches. We report on the initial data collection effort, which mapped about 22K word forms, and compare this mapping to other recent vocabulary mapping efforts, such as those based on age of acquisition. We then describe our efforts to expand this resource automatically, using linguistically motivated variables and corpus-based methods. Our current resource maps more than 126K English word forms to U.S. school grade levels. We also compare a subset of our L1-mapped data to English L2 vocabulary levels, as expressed on the CEFR scale, and find considerable overlap in the order of vocabulary learning in L1 and L2 English.