The Core Metadata Schema for Learner Corpora (LC-meta)

Abstract

Metadata is critical throughout the research process, from study design and corpus selection or compilation to the interpretability of results and cumulative research. To date, however, learner corpus research has not developed community standards or best practices for metadata collection and sharing. In this article, we present the results of a collaborative project aimed at addressing this issue by developing a standardised metadata schema for learner corpora. We first describe the procedure implemented to design the schema, including the ways in which we continuously involved learner corpus researchers in this initiative. We then introduce the Core Metadata Schema for Learner Corpora (LC-meta, Version 2), which consists of a set of obligatory and optional variables that capture crucial information about L2 data (administrative details, corpus design, text-related variables, learner-related variables, annotations, annotators, and transcribers). Finally, we discuss future developments and emphasise the importance of continued maintenance and further refinement of this schema by the research community.

Available under the CC BY 4.0 license.
/content/journals/10.1075/ijlcr.24010.paq
2024-09-13
2024-10-06
