Creation and analysis of a reading comprehension exercise corpus
We discuss the collection and analysis of a cross-sectional and longitudinal learner corpus consisting of answers written by adult second language learners of German to reading comprehension questions. We motivate the need for such task-based learner corpora and identify the properties that make reading comprehension exercises a particularly interesting task. For the creation of the corpus, we introduce WELCOME, the web-based tool we developed to support decentralized data collection and annotation of the richly structured corpus in real-life language teaching programs. On the analysis side, we investigate the binary and the complex content-assessment classification schemes used by the annotators, as well as the inter-annotator agreement obtained for the current corpus snapshot, taken at the halfway point of our four-year effort. We present results showing that, for such task-based corpora, meaning assessment can be performed with reasonable agreement, and we discuss several sources of disagreement.