Volume 2, Issue 1
  • ISSN 2215-1478
  • E-ISSN: 2215-1486


This article presents KANDEL, a corpus of L2 German writing samples produced by several cohorts of North American university students over four semesters of instructed language study. The corpus expands the number of freely and publicly available learner corpora and adds depth to them through a unique set of features: it focuses on an L2 other than English (German), targets beginning to intermediate L2 proficiency levels, and includes dense developmental data along with annotations for multiple linguistic variables, learner errors, and over twenty learner and task variables. The article also reports the procedure and results of an inter-annotator agreement study, together with an in-depth analysis of annotator disagreement. In this way, it contributes to best practices in annotating learner corpora by making the annotation process transparent and demonstrating its reliability.



