The ALeSKo learner corpus
The ALeSKo learner corpus is a small-scale comparable corpus consisting of two subcorpora: annotated essays by advanced Chinese learners of German and comparable essays by German native speakers. It was compiled to investigate discourse-related phenomena such as local coherence in the second-language acquisition of German. After describing how the texts were compiled and annotated, the article focuses on quantitative studies at the token level. We discuss problems of tokenisation and part-of-speech tagging and compare the inventories of the two subcorpora in terms of frequently used n-grams and lexical richness, among other aspects. We conclude by describing possible applications of the study in foreign language acquisition research and language teaching.