Volume 10, Issue 1
  • ISSN 2215-1478
  • E-ISSN: 2215-1486



In this position paper, I argue that proficiency-rated learner corpora should play a more prominent role in data-driven learning (DDL). Such corpora can provide typical, atypical and erroneous target-language data at different levels of proficiency, which can be meaningfully used in the design of learning activities. This makes them pivotal in extending DDL more fully to mid- and lower-proficiency learners. Although learner corpus research has long promoted the use of learner corpora in DDL, only a small fraction of DDL studies actually use one. To help close this gap, I demonstrate how a specific proficiency-rated learner corpus (i.e., the CELI corpus; Spina et al., 2022, 2023) can enrich the design of DDL activities, making them more adaptable to a wider range of learner needs.



  1. Baisa, V., & Suchomel, V.
    (2014) SkELL: Web interface for English language learning. Eighth Workshop on Recent Advances in Slavonic Natural Language Processing, 63–70.
  2. Bestgen, Y., & Granger, S.
    (2014) Quantifying the development of phraseological competence in L2 English writing: An automated approach. Journal of Second Language Writing, 26, 28–41.
    https://doi.org/10.1016/j.jslw.2014.09.004
  3. Bestgen, Y., & Granger, S.
    (2018) Tracking L2 writers’ phraseological development using collgrams: Evidence from a longitudinal EFL corpus. In S. Hoffmann, A. Sand, S. Arndt-Lappe, & L. M. Dillmann (Eds.), Corpora and Lexis (pp. 277–301). Brill.
    https://doi.org/10.1163/9789004361133_011
  4. Boulton, A., & Cobb, T.
    (2017) Corpus use in language learning: A meta-analysis. Language Learning, 67(2), 348–393.
    https://doi.org/10.1111/lang.12224
  5. Boulton, A., & Vyatkina, N.
    (2021) Thirty years of data-driven learning: Taking stock and charting new directions. Language Learning and Technology, 25(3), 66–89.
  6. Boyd, A., Hana, J., Nicolas, L., Meurers, D., Wisniewski, K., Abel, A., Schöne, K., Štindlová, B., & Vettori, C.
    (2014) The MERLIN corpus: Learner language and the CEFR. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014) (pp. 1281–1288). European Language Resources Association (ELRA).
  7. Carlsen, C.
    (2012) Proficiency level – A fuzzy variable in computer learner corpora. Applied Linguistics, 33(2), 161–183.
    https://doi.org/10.1093/applin/amr047
  8. Casani, E.
    (2020) Valutare la competenza morfosintattica in italiano L2. Una validazione corpus-based dei livelli del QCER. In E. Nuzzo, E. Santoro, & L. Vedder (Eds.), Valutazione e misurazione delle produzioni orali e scritte in italiano lingua seconda (pp. 15–26). Cesati.
  9. Chambers, A.
    (2019) Towards the corpus revolution? Bridging the research–practice gap. Language Teaching, 52(4), 460–475.
    https://doi.org/10.1017/S0261444819000089
  10. Cole, M. W.
    (2014) Speaking to read: Meta-analysis of peer-mediated learning for English language learners. Journal of Literacy Research, 46(3), 358–382.
    https://doi.org/10.1177/1086296X14552179
  11. Forti, L.
    (2023) Learner corpora and the design of data-driven learning activities. In B. Bédi, Y. Choubsaz, K. Friðriksdóttir, A. Gimeno-Sanz, S. Björg Vilhjálmsdóttir, & S. Zahova (Eds.), CALL for all Languages – EUROCALL 2023 Short Papers. University of Iceland, Reykjavik, August 15–18 (pp. 139–144). Editorial Universitat Politècnica de València.
    https://doi.org/10.4995/EuroCALL2023.2023.16959
  12. Forti, L., Bolli, G. G., Santarelli, F., Santucci, V., & Spina, S.
    (2020) MALT-IT2: A new resource to measure text difficulty in light of CEFR levels for Italian L2 learning. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the 12th Language Resources and Evaluation Conference (pp. 7206–7213). European Language Resources Association (ELRA).
  13. Frey, J. C., König, A., Stemle, E. W., & Paquot, M.
    (2023, August/September). A core metadata schema for L2 data. Poster presented at the EuroSLA 32 conference 2023, University of Birmingham, United Kingdom.
  14. Friginal, E.
    (2018) Corpus linguistics for English teachers: Tools, online resources, and classroom activities. Routledge.
    https://doi.org/10.4324/9781315649054
  15. Gilquin, G.
    (2023) Written learner corpora to inform teaching. In R. R. Jablonkai, & E. Csomay (Eds.), The Routledge handbook of corpora and English language teaching and learning (pp. 281–295). Routledge.
  16. Gilquin, G., & Granger, S.
    (2022) Using data-driven learning in language teaching. In A. O’Keeffe, & M. J. McCarthy (Eds.), The Routledge handbook of corpus linguistics (2nd ed., pp. 430–442). Routledge.
    https://doi.org/10.4324/9780367076399-30
  17. Glaznieks, A., Frey, J.-C., Stopfner, M., Zanasi, L., & Nicolas, L.
    (2022) LEONIDE: A longitudinal trilingual corpus of young learners of Italian, German and English. International Journal of Learner Corpus Research, 8(1), 97–120.
    https://doi.org/10.1075/ijlcr.21004.gla
  18. Götz, S.
    (2022, August). Learner corpora and DDL: A promising synergy? Paper presented at the CorpusCALL SIG symposium “DDL and learner corpora” as part of the EuroCALL conference 2022 (online), University of Iceland, Iceland.
  19. Goulart, L., & Veloso, I.
    (Eds.) (2023) Corpora in English language teaching. Classroom activities for teachers new to corpus linguistics. Open Educational Resource. Montclair State University.
  20. Granger, S.
    (1996) From CA to CIA and back: An integrated approach to computerized bilingual and learner corpora. In K. Aijmer, B. Altenberg, & M. Johansson (Eds.), Languages in contrast: Papers from a symposium on text-based cross-linguistic studies: Lund 4–5 March 1994 (pp. 37–51). Lund University Press.
  21. Granger, S.
    (2015) Contrastive interlanguage analysis: A reappraisal. International Journal of Learner Corpus Research, 1(1), 7–24.
    https://doi.org/10.1075/ijlcr.1.1.01gra
  22. Granger, S.
    (2009) The contribution of learner corpora to Second Language Acquisition and foreign language teaching: A critical evaluation. In K. Aijmer (Ed.), Corpora and Language Teaching (pp. 13–33). John Benjamins.
    https://doi.org/10.1075/scl.33.04gra
  23. Granger, S., Dupont, M., Meunier, F., Naets, H., & Paquot, M.
    (Eds.) (2020) International Corpus of Learner English. Version 3. Presses universitaires de Louvain.
  24. Granger, S., & Paquot, M.
    (2017, December). Towards standardization of metadata for L2 corpora. Invited talk at the CLARIN workshop on Interoperability of Second Language Resources and Tools, University of Gothenburg, Sweden.
  25. Gyllstad, H., & Snoder, P.
    (2021) Exploring learner corpus data for language testing and assessment purposes: The case of verb + noun collocations. In S. Granger (Ed.), Perspectives on the L2 phrasicon (pp. 49–71). Multilingual Matters.
    https://doi.org/10.21832/9781788924863-004
  26. Johns, T.
    (1991) Should you be persuaded – Two examples of data driven learning materials. In T. Johns, & P. King (Eds.), Classroom concordancing. English Language Research Journal, 4, 1–16.
  27. La Russa, F., D’Alesio, V., & Suadoni, A.
    (2023) Designing a corpus-based syllabus of Italian collocations: Criteria, methods and procedure. Revue Roumaine de Linguistique, 4, 377–389.
    https://doi.org/10.59277/RRL.2023.4.03
  28. Le Foll, E.
    (2021) Creating corpus-informed materials for the English as a foreign language classroom. A step-by-step guide for (trainee) teachers using online resources (Third Edition). Open Educational Resource. https://elenlefoll.pressbooks.com. CC-BY-NC 4.0.
  29. Lee, H., Warschauer, M., & Lee, J. H.
    (2018) The effects of corpus use on second language vocabulary learning: A multilevel meta-analysis. Applied Linguistics, 40(5), 721–753.
    https://doi.org/10.1093/applin/amy012
  30. Mizumoto, A., & Chujo, K.
    (2015) A meta-analysis of data-driven learning approach in the Japanese EFL classroom. English Corpus Studies, 22, 1–18.
  31. Paquot, M., Rubin, R., & Vandeweerd, N.
    (2022) Crowdsourced adaptive comparative judgment: A community-based solution for proficiency rating. Language Learning, 72(3), 853–885.
    https://doi.org/10.1111/lang.12498
  32. Pérez-Paredes, P.
    (2022) A systematic review of the uses and spread of corpora and data-driven learning in CALL research during 2011–2015. Computer Assisted Language Learning, 35(1–2), 36–61.
    https://doi.org/10.1080/09588221.2019.1667832
  33. Pérez-Paredes, P., Ordoñana Guillamón, C., Van de Vyver, J., Meurice, A., Aguado Jiménez, P., Conole, G., & Sánchez Hernández, P.
    (2019) Mobile data-driven language learning: Affordances and learners’ perception. System, 84, 145–159.
    https://doi.org/10.1016/j.system.2019.06.009
  34. Poole, R.
    (2018) A guide to using corpora for English language learners. Edinburgh University Press.
    https://doi.org/10.1515/9781474427180
  35. Seidlhofer, B.
    (2002) Pedagogy and local learner corpora: Working with learning-driven data. In S. Granger, J. Hung, & S. Petch-Tyson (Eds.), Computer learner corpora, Second Language Acquisition and foreign language teaching (pp. 213–234). John Benjamins.
    https://doi.org/10.1075/lllt.6.14sei
  36. Shatz, I.
    (2020) Refining and modifying the EFCAMDAT: Lessons from creating a new corpus from an existing large-scale English learner language database. International Journal of Learner Corpus Research, 6(2), 220–236.
    https://doi.org/10.1075/ijlcr.20009.sha
  37. Spina, S., Fioravanti, I., Forti, L., Santucci, V., Scerra, A., & Zanda, F.
    (2022) Il Corpus CELI: Una nuova risorsa per studiare l’acquisizione dell’italiano L2. Italiano LinguaDue, 14(1), 116–138.
    https://doi.org/10.54103/2037-3597/18161
  38. Spina, S., Fioravanti, I., Forti, L., & Zanda, F.
    (2023) The CELI corpus: Design and linguistic annotation of a new online learner corpus. Second Language Research, Ahead of print. https://journals.sagepub.com/doi/epub/10.1177/02676583231176370
  39. Viana, V.
    (Ed.) (2023) Teaching English with corpora: A resource book. Routledge.
  40. Vyatkina, N.
    (2020) Corpora as open educational resources for language teaching. Foreign Language Annals, 53(2), 359–370.
    https://doi.org/10.1111/flan.12464
  41. Wu, S., Fitzgerald, A., & Witten, I.
    (2019) Developing and evaluating a learner-friendly collocation system with user query data. International Journal of Computer-Assisted Language Learning and Teaching, 9(2), 53–78.
    https://doi.org/10.4018/IJCALLT.2019040104
  42. Zanda, F., & Rini, D.
    (2023) Using a learner corpus to refresh rating scales of CELI exams. In Conference proceedings of the ALTE 8th International Conference: Language Assessment Fit for the Future (pp. 42–47). ALTE.

  • Article Type: Research Article
Keyword(s): CEFR; data-driven learning; Italian; learner corpus; proficiency