Volume 6, Issue 1
  • ISSN 2215-1478
  • E-ISSN: 2215-1486



This paper discusses machine learning techniques for the prediction of Common European Framework of Reference (CEFR) levels in a learner corpus. We summarise the CAp 2018 Machine Learning (ML) competition, a classification task of the six CEFR levels, which map linguistic competence in a foreign language onto six reference levels. The goal of this competition was to produce a machine learning system to predict learners’ competence levels from written productions comprising between 20 and 300 words and a set of characteristics computed for each text extracted from the French component of the EFCAMDAT data (Geertzen et al., 2013). Together with the description of the competition, we provide an analysis of the results and methods proposed by the participants and discuss the benefits of this kind of competition for the learner corpus research (LCR) community. The main findings address the methods used and lexical bias introduced by the task.




  1. Abney, S.
    2007 Semisupervised learning for computational linguistics. London: Chapman and Hall/CRC.
    https://doi.org/10.1201/9781420010800
  2. Alexopoulou, T., Michel, M., Murakami, A., & Meurers, D.
    2017 Task effects on linguistic complexity and accuracy: A large-scale learner corpus analysis employing natural language processing techniques. Language Learning, 67(S1), 180–208.
    https://doi.org/10.1111/lang.12232
  3. Alexopoulou, T., Yannakoudakis, H., & Salamoura, A.
    2013 Classifying intermediate learner English: a data-driven approach to learner corpora. In Twenty years of learner corpus research: Looking back, moving ahead (pp. 11–23). Belgium: Presses Universitaires de Louvain.
  4. Attali, Y. & Burstein, J.
    2006 Automated essay scoring with e-rater® v.2. The Journal of Technology, Learning and Assessment, 4(3).
  5. Balikas, G.
    2018 Lexical bias in essay level prediction. ArXiv e-prints.
  6. Barker, F., Salamoura, A., & Saville, N.
    2015 Learner corpora and language testing. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge Handbook of Learner Corpus Research (pp. 511–534). Cambridge: Cambridge University Press.
    https://doi.org/10.1017/CBO9781139649414.023
  7. Baur, C., Caines, A., Chua, C., Gerlach, J., Qian, M., Rayner, M., Russell, M., Strik, H., & Wei, X.
    2018 Overview of the 2018 spoken CALL shared task. In Interspeech 2018, 2354–2358. Geneva: ISCA.
    https://doi.org/10.21437/Interspeech.2018-97
  8. Baur, C., Chua, C., Gerlach, J., Rayner, E., Russell, M., Strik, H., & Wei, X.
    2017 Overview of the 2017 spoken CALL shared task. In Workshop on Speech and Language Technology in Education (SLaTE). Stockholm, Sweden.
    https://doi.org/10.21437/SLaTE.2017-13
  9. Boyd, A., Hana, J., Nicolas, L., Meurers, D., Wisniewski, K., Abel, A., Schöne, K., Stindlová, B., & Vettori, C.
    2014 The MERLIN corpus: Learner language and the CEFR. In LREC, 1281–1288. Reykjavik, Iceland.
  10. Callies, M. & Paquot, M.
    2015 Learner corpus research: An interdisciplinary field on the move. International Journal of Learner Corpus Research, 1(1), 1–6.
    https://doi.org/10.1075/ijlcr.1.1.00edi
  11. Chen, X. & Meurers, D.
    2016 CTAP: A web-based tool supporting automatic complexity analysis. In Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC), 113–119.
  12. Council of Europe
    2001a Common European Framework of Reference for Languages: Learning, teaching, assessment. Strasbourg, Language Policy Division: Cambridge University Press.
  13. Council of Europe
    2001b Common European Framework of Reference for Languages: Learning, teaching, assessment. Structured overview of all CEFR scales. Strasbourg, Language Policy Division: Cambridge University Press.
  14. Council of Europe
    2018 Common European Framework of Reference for Languages: Learning, teaching, assessment; Companion volume with new descriptors. Strasbourg, Language Policy Division: Cambridge University Press.
  15. Crossley, S. A., Salsbury, T., McNamara, D. S., & Jarvis, S.
    2011 Predicting lexical proficiency in language learner texts using computational indices. Language Testing, 28(4), 561–580.
    https://doi.org/10.1177/0265532210378031
  16. Cushing Weigle, S.
    2010 Validation of automated scores of TOEFL iBT tasks against non-test indicators of writing ability. Language Testing, 27(3), 335–353.
    https://doi.org/10.1177/0265532210364406
  17. Dahlmeier, D., Ng, H. T., & Wu, S. M.
    2013 Building a large annotated corpus of learner English: The NUS corpus of learner English. In Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications, 22–31. Association for Computational Linguistics. Atlanta, Georgia.
  18. Dale, R. & Kilgarriff, A.
    2011 Helping our own: The HOO 2011 pilot shared task. In Proceedings of the 13th European Workshop on Natural Language Generation, ENLG ’11, 242–249. Association for Computational Linguistics. Nancy, France.
  19. Dale, R., Anisimoff, I., & Narroway, G.
    2012 HOO 2012: A report on the preposition and determiner error correction shared task. In Proceedings of the Seventh Workshop on Building Educational Applications Using NLP, NAACL HLT ’12, 54–62. Association for Computational Linguistics. Montreal, Canada.
  20. Díaz-Negrillo, A., Ballier, N., & Thompson, P.
    2013 Automatic treatment and analysis of learner corpus data. Amsterdam and Philadelphia: John Benjamins.
    https://doi.org/10.1075/scl.59
  21. Flach, P.
    2012 Machine learning: The art and science of algorithms that make sense of data. Cambridge: Cambridge University Press.
    https://doi.org/10.1017/CBO9780511973000
  22. Friedman, J., Hastie, T., & Tibshirani, R.
    2001 The elements of statistical learning, volume 1. New York: Springer Series in Statistics.
  23. Geertzen, J., Alexopoulou, T., & Korhonen, A.
    2013 Automatic linguistic annotation of large scale L2 databases: The EF-Cambridge open language database (EFCAMDAT). In Proceedings of the 31st Second Language Research Forum. Somerville, MA: Cascadilla Proceedings Project.
  24. Goldberg, Y.
    2017 Neural network methods for natural language processing. Synthesis Lectures on Human Language Technologies. San Rafael, CA: Morgan & Claypool Publishers.
  25. Granger, S., Kraif, O., Ponton, C., Antoniadis, G., & Zampa, V.
    2007 Integrating learner corpora and natural language processing: A crucial step towards reconciling technological sophistication and pedagogical effectiveness. ReCALL, 19(3), 252–268.
    https://doi.org/10.1017/S0958344007000237
  26. Hawkins, J. A. & Buttery, P.
    2010 Criterial features in learner corpora: Theory and illustrations. English Profile Journal, 1(01).
    https://doi.org/10.1017/S2041536210000103
  27. Hawkins, J. A. & Filipović, L.
    2012 Criterial features in L2 English: Specifying the reference levels of the Common European Framework, volume 1 of English Profile Studies. United Kingdom: Cambridge University Press.
  28. Higgins, D., Ramineni, C., & Zechner, K.
    2015 Learner corpora and automated scoring. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge Handbook of Learner Corpus Research (pp. 587–604). Cambridge: Cambridge University Press.
    https://doi.org/10.1017/CBO9781139649414.026
  29. Hopman, E., Thompson, B., Austerweil, J., & Lupyan, G.
    2018 Predictors of L2 word learning accuracy: A big data investigation. In the 40th Annual Conference of the Cognitive Science Society (CogSci 2018), 513–518.
  30. Jarvis, S. & Paquot, M.
    2015 Learner corpora and native language identification. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge Handbook of Learner Corpus Research (pp. 605–628). Cambridge: Cambridge University Press.
    https://doi.org/10.1017/CBO9781139649414.027
  31. Jarvis, S.
    2011 Data mining with learner corpora. In F. Meunier, S. De Cock, G. Gilquin, & M. Paquot (Eds.), A taste for corpora: In honour of Sylviane Granger (pp. 127–154). Amsterdam and Philadelphia: John Benjamins.
    https://doi.org/10.1075/scl.45.10jar
  32. Le, Q. V. & Mikolov, T.
    2014 Distributed representations of sentences and documents. ArXiv: 1405.4053.
  33. Leacock, C., Chodorow, M., Gamon, M., & Tetreault, J.
    2010 Automated grammatical error detection for language learners. Synthesis Lectures on Human Language Technologies, 3(1), 1–134.
    https://doi.org/10.2200/S00275ED1V01Y201006HLT009
  34. Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P.
    2017 Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, 2980–2988.
  35. Lissón, P. & Ballier, N.
    2018 Investigating learners’ progression in French as a foreign language: vocabulary growth and lexical diversity. CUNY Student Research Day. Poster.
  36. Lissón, P.
    2017 Investigating the use of readability metrics to detect differences in written productions of learners: a corpus-based study. Bellaterra Journal of Teaching & Learning Language & Literature, 10(4), 68–86.
    https://doi.org/10.5565/rev/jtl3.752
  37. Liu, B.
    2012 Sentiment analysis and opinion mining. San Rafael, CA: Morgan & Claypool Publishers.
    https://doi.org/10.2200/S00416ED1V01Y201204HLT016
  38. Lu, X.
    2014 Computational methods for corpus annotation and analysis. New York: Springer.
    https://doi.org/10.1007/978-94-017-8645-4
  39. Magerman, D. M.
    1995 Statistical decision-tree models for parsing. In Proceedings of the 33rd Annual Meeting on Association for Computational Linguistics, 276–283. Association for Computational Linguistics.
    https://doi.org/10.3115/981658.981695
  40. Malmasi, S., Evanini, K., Cahill, A., Tetreault, J., Pugh, R., Hamill, C., Napolitano, D., & Qian, Y.
    2017 A report on the 2017 native language identification shared task. In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, 62–75. Association for Computational Linguistics. Copenhagen, Denmark.
    https://doi.org/10.18653/v1/W17-5007
  41. Meurers, D.
    2015 Learner corpora and natural language processing. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge Handbook of Learner Corpus Research (pp. 537–566). Cambridge: Cambridge University Press.
    https://doi.org/10.1017/CBO9781139649414.024
  42. Michalke, M.
    2017 koRpus: An R package for text analysis. (Version 0.10–2). Available at: https://reaktanz.de/?c=hacking&s=koRpus (accessed October 2018).
  43. Mons, B.
    2018 Data stewardship for open science: Implementing FAIR principles. London: Chapman and Hall/CRC.
    https://doi.org/10.1201/9781315380711
  44. Murakami, A.
    2014 Individual variation and the role of L1 in the L2 development of English grammatical morphemes: Insights from learner corpora. PhD thesis, University of Cambridge.
  45. Murakami, A.
    2016 Modeling systematicity and individuality in nonlinear second language development: The case of English grammatical morphemes. Language Learning, 66(4), 834–871.
    https://doi.org/10.1111/lang.12166
  46. Murphy, K. P.
    2012 Machine learning. A probabilistic perspective. Adaptive Computation and Machine Learning. Cambridge (MA): MIT Press.
  47. Ng, H. T., Wu, S. M., Briscoe, T., Hadiwinoto, C., Susanto, R. H., & Bryant, C.
    2014 The CoNLL-2014 shared task on grammatical error correction. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, 1–14. Association for Computational Linguistics. Baltimore, Maryland.
    https://doi.org/10.3115/v1/W14-1701
  48. Nissim, M., Abzianidze, L., Evang, K., van der Goot, R., Haagsma, H., Plank, B., & Wieling, M.
    2017 Sharing is caring: The future of shared tasks. Computational Linguistics, 43(4), 897–904.
    https://doi.org/10.1162/COLI_a_00304
  49. O’Keeffe, A. & Mark, G.
    2017 The English grammar profile of learner competence. International Journal of Corpus Linguistics, 22(4), 457–489.
    https://doi.org/10.1075/ijcl.14086.oke
  50. Page, E. B.
    1968 The use of the computer in analyzing student essays. International Review of Education / Internationale Zeitschrift für Erziehungswissenschaft / Revue Internationale de l’Education, 14(2), 210–225.
  51. Paquot, M. & Plonsky, L.
    2017 Quantitative research methods and study quality in learner corpus research. International Journal of Learner Corpus Research, 3(1), 61–94.
    https://doi.org/10.1075/ijlcr.3.1.03paq
  52. Paroubek, P., Chaudiron, S., & Hirschman, L.
    2007 Principles of evaluation in natural language processing. Traitement Automatique des Langues, 48(1), 7–31.
  53. Rich, A., Popp, P. O., Halpern, D., Rothe, A., & Gureckis, T.
    2018 Modeling second-language learning from a psychological perspective. In Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, 223–230.
    https://doi.org/10.18653/v1/W18-0526
  54. Sang, E. F. & De Meulder, F.
    2003 Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. arXiv preprint cs/0306050, 142–147.
  55. Settles, B.
    2018 Data for the 2018 Duolingo shared task on second language acquisition modeling (SLAM). Available at: https://doi.org/10.7910/DVN/8SWHNO (accessed October 2018).
  56. Settles, B., Brust, C., Gustafson, E., Hagiwara, M., & Madnani, N.
    2018 Second language acquisition modeling. In Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, 56–65.
    https://doi.org/10.18653/v1/W18-0506
  57. Shermis, M. D., Burstein, J., Higgins, D., & Zechner, K.
    2010 Automated essay scoring: Writing assessment and instruction. In P. Peterson, E. Baker, & B. McGaw (Eds.), International Encyclopedia of Education (Third Edition) (pp. 20–26). Oxford: Elsevier.
    https://doi.org/10.1016/B978-0-08-044894-7.00233-5
  58. Tetreault, J., Burstein, J., Kochmar, E., Leacock, C., & Yannakoudakis, H.
    2018 Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications. Association for Computational Linguistics. New Orleans, Louisiana.
  59. Thewissen, J.
    2015 Accuracy across proficiency levels: A learner corpus approach. Louvain: Presses universitaires de Louvain.
  60. Thrun, S. & Pratt, L.
    1998 Learning to learn. Norwell, MA, USA: Kluwer Academic Publishers.
    https://doi.org/10.1007/978-1-4615-5529-2
  61. Vajjala, S. & Loo, K.
    2014 Automatic CEFR level prediction for Estonian learner text. In NEALT Proceedings Series, volume 22, 113–128.
  62. Volodina, E., Pilán, I. & Alfter, D.
    2016 Classification of Swedish learner essays by CEFR levels. CALL Communities and Culture–Short Papers from EUROCALL 2016, 456–461.
  63. Wisniewski, K.
    2017 Empirical learner language and the levels of the Common European Framework of Reference. Language Learning, 67(S1), 232–253.
    https://doi.org/10.1111/lang.12223
  64. Yannakoudakis, H., Briscoe, T., & Medlock, B.
    2011 A new dataset and method for automatically grading ESOL texts. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies – Volume 1, HLT ’11, 180–189. Association for Computational Linguistics.
  65. Yannakoudakis, H., Kochmar, E., Leacock, C., Madnani, N., Pilán, I., & Zesch, T.
    2019 Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications. Association for Computational Linguistics. Florence, Italy.
