Volume 1, Issue 1
  • ISSN 2215-1478
  • E-ISSN: 2215-1486
Buy:$35.00 + Taxes


We consider the opportunities presented by big educational learner corpora for Second Language Acquisition (SLA). In particular, we focus on the EF Cambridge Open Language Database (EFCAMDAT), an open access database of student writings submitted to Englishtown, the online school of EF Education First. EFCAMDAT stands out for its size (33 million words, 85 thousand learners) and a range of 128 writing tasks covering all CEFR levels with data from learners from varying nationalities. We discuss methodological issues arising from analyzing big data resources generated in educational contexts and argue that Natural Language Processing (NLP) is essential for the automated processing of such datasets. As a study case, we follow the developmental trajectory of relative clauses, a construction that necessitates deeper syntactic analysis. We consider specific issues that can affect the developmental trajectory, including task effects, formulaic language and national language effects.


Article metrics loading...

Loading full text...

Full text loading...

This is a required field
Please enter a valid email address
Approval was successful
Invalid data
An Error Occurred
Approval was partially successful, following selected items could not be processed due to error