Volume 7, Issue 2
  • ISSN 2215-1478
  • E-ISSN: 2215-1486



This article reports on an open-source R package for extracting syntactic units from dependency-parsed French texts. To evaluate the reliability of the package, syntactic units were extracted automatically from a corpus of L2 French and compared to units extracted manually from the same corpus. The F-score of the extracted units ranged from 0.53 to 0.97. Although the units were not always identical between the two methods, the manually and automatically derived syntactic complexity measures were strongly and significantly correlated (coefficients = 0.62–0.97, p < 0.001). This suggests that the package may be a suitable replacement for manual annotation in some cases where manual annotation is not possible, but that care should be taken in interpreting measures based on these units.
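
As an illustration of the kind of comparison summarised above, the following is a minimal sketch of how an F-score between automatically and manually extracted units could be computed. It is written in Python for brevity and is not the package's own code. It assumes that a unit counts as correct only if its span matches a manual unit exactly; the matching criterion used in the article is not stated here. The function name `f_score` and the example spans are hypothetical.

```python
def f_score(extracted, manual):
    """Return (precision, recall, F1), treating units as exact-match items.

    Assumption: a unit is a hashable identifier such as
    (text_id, start_token, end_token); partial overlaps count as errors.
    """
    extracted, manual = set(extracted), set(manual)
    matches = len(extracted & manual)
    precision = matches / len(extracted) if extracted else 0.0
    recall = matches / len(manual) if manual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical units, invented for illustration only.
manual = [("t1", 0, 5), ("t1", 6, 12), ("t1", 13, 20), ("t2", 0, 8)]
extracted = [("t1", 0, 5), ("t1", 6, 11), ("t1", 13, 20),
             ("t2", 0, 8), ("t2", 9, 10)]

precision, recall, f1 = f_score(extracted, manual)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this invented example, three of the five extracted units match manual units exactly, so precision is 0.60, recall is 0.75, and the F-score is about 0.67. A looser matching criterion, such as partial overlap, would give higher values.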




