Volume 7, Issue 1
  • ISSN 2215-1478
  • E-ISSN: 2215-1486



The extraction of phraseological units operationalized in phraseological complexity measures (Paquot, 2019) relies on automatic dependency annotations, yet the suitability of annotation tools for learner language is often overlooked. In the present article, two Dutch dependency parsers, Alpino (van Noord, 2006) and Frog (van den Bosch et al., 2007), are evaluated for their performance in automatically annotating three types of dependency relations (verb + direct object, adjectival modifier, and adverbial modifier relations) across three proficiency levels of L2 Dutch. These observations then serve as the basis for an investigation into the impact of automatic dependency annotation on phraseological sophistication measures. Results indicate that both learner proficiency and the type of dependency relation function as moderating factors in parser performance. Phraseological complexity measures computed on the basis of both automatic and manual dependency annotations demonstrate moderate to high correlations, reflecting a moderate to low impact of automatic annotation on subsequent analyses.
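The abstract does not state which evaluation metrics were used, so the following is only a minimal sketch of how parser output might be compared against manual annotation per relation type. It assumes precision, recall, and F1 over (sentence id, head, relation, dependent) tuples. The function name, the relation labels (`dobj`, `amod`, `advmod`), and the Dutch example tuples are all hypothetical illustrations, not the native output format of Alpino or Frog and not data from the study.

```python
def evaluate_by_relation(gold, predicted):
    """Score predicted dependency tuples against gold tuples, per relation.

    Both arguments are sets of (sentence_id, head, relation, dependent).
    Returns {relation: (precision, recall, f1)}.
    """
    relations = {t[2] for t in gold} | {t[2] for t in predicted}
    scores = {}
    for rel in sorted(relations):
        gold_rel = {t for t in gold if t[2] == rel}
        pred_rel = {t for t in predicted if t[2] == rel}
        true_pos = len(gold_rel & pred_rel)
        precision = true_pos / len(pred_rel) if pred_rel else 0.0
        recall = true_pos / len(gold_rel) if gold_rel else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[rel] = (precision, recall, f1)
    return scores


# Hypothetical manual (gold) and automatic (predicted) annotations.
gold = {
    (1, "lezen", "dobj", "boek"),
    (1, "boek", "amod", "goed"),
    (2, "lopen", "advmod", "snel"),
}
predicted = {
    (1, "lezen", "dobj", "boek"),
    (1, "boek", "amod", "mooi"),     # wrong dependent
    (2, "lopen", "advmod", "snel"),
    (2, "lopen", "dobj", "park"),    # spurious direct object
}

for rel, (p, r, f) in evaluate_by_relation(gold, predicted).items():
    print(f"{rel}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Running the same comparison separately for each proficiency level would give the kind of breakdown by proficiency and relation type that the abstract describes.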




  1. Banerjee, S., & Pedersen, T.
    (2003) The design, implementation, and use of the Ngram Statistics Package. In Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics. https://doi.org/10.1007/3-540-36456-0_38
  2. Bestgen, Y.
    (2017) Beyond single-word measures: L2 writing assessment, lexical richness and formulaic competence. System, 69, 65–78. https://doi.org/10.1016/j.system.2017.08.004
  3. Bouma, G., & Kloosterman, G.
    (2007) Mining syntactically annotated corpora with XQuery. In Proceedings of the linguistic annotation workshop (pp. 17–24). Stroudsburg: Association for Computational Linguistics. https://doi.org/10.3115/1642059.1642062
  4. Boyd, A., & Meurers, D.
    (2008) Revisiting the impact of different annotation schemes on PCFG parsing: a grammatical dependency evaluation. In Proceedings of the workshop on parsing German (pp. 24–32). Stroudsburg: Association for Computational Linguistics. https://doi.org/10.3115/1621401.1621405
  5. Carlsen, C.
    (2012) Proficiency level–a fuzzy variable in computer learner corpora. Applied Linguistics, 33(2), 161–183. https://doi.org/10.1093/applin/amr047
  6. Council of Europe
    (2001) Common European framework of reference for languages: Learning, teaching, assessment. Cambridge, UK: Cambridge University Press.
  7. Daelemans, W., van den Bosch, A., & Weijters, T.
    (1997) IGTree: using trees for compression and classification in lazy learning algorithms. Artificial Intelligence Review, 11(1), 407–423. https://doi.org/10.1023/A:1006506017891
  8. de Marneffe, M.-C., Dozat, T., Silveira, N., Haverinen, K., Ginter, F., Nivre, J., & Manning, C. D.
    (2014) Universal Stanford Dependencies: A cross-linguistic typology. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14) (pp. 4585–4592). European Language Resources Association (ELRA).
  9. de Marneffe, M.-C., & Nivre, J.
    (2019) Dependency grammar. Annual Review of Linguistics, 5, 197–218. https://doi.org/10.1146/annurev-linguistics-011718-011842
  10. Díaz-Negrillo, A., Meurers, D., Valera, S., & Wunsch, H.
    (2010) Towards interlanguage POS annotation for effective learner corpora in SLA and FLT. Language Forum, 36(1–2), 139–154.
  11. Dickinson, M., & Ragheb, M.
    (2009) Dependency annotation for learner corpora. In M. Passarotti, A. Przepiórkowski, S. Raynaud, & F. Van Eynde (Eds.), Proceedings of the eighth international workshop on treebanks and linguistic theories (pp. 59–70). Milan: EDUCatt.
  12. Durrant, P., & Schmitt, N.
    (2009) To what extent do native and non-native writers make use of collocations? International Review of Applied Linguistics in Language Teaching, 47(2), 157–177. https://doi.org/10.1515/iral.2009.007
  13. Granger, S., & Bestgen, Y.
    (2014) The use of collocations by intermediate vs. advanced non-native writers: A bigram-based study. International Review of Applied Linguistics in Language Teaching, 52(3), 229–252. https://doi.org/10.1515/iral-2014-0011
  14. Granger, S., & Paquot, M.
    (2008) Disentangling the phraseological web. In S. Granger & F. Meunier (Eds.), Phraseology: An interdisciplinary perspective (pp. 27–49). Amsterdam, Philadelphia: John Benjamins. https://doi.org/10.1075/z.139.07gra
  15. Gries, S. T.
    (2008) Phraseology and linguistic theory: a brief survey. In S. Granger & F. Meunier (Eds.), Phraseology: An interdisciplinary perspective (pp. 3–26). Amsterdam, Philadelphia: John Benjamins. https://doi.org/10.1075/z.139.06gri
  16. Heid, U.
    (2008) Computational phraseology: an overview. In S. Granger & F. Meunier (Eds.), Phraseology: An interdisciplinary perspective (pp. 337–360). Amsterdam, Philadelphia: John Benjamins. https://doi.org/10.1075/z.139.28hei
  17. Housen, A., & Kuiken, F.
    (2009) Complexity, accuracy, and fluency in second language acquisition. Applied Linguistics, 30(4), 461–473. https://doi.org/10.1093/applin/amp048
  18. Huang, Y., Murakami, A., Alexopoulou, T., & Korhonen, A.
    (2018) Dependency parsing of learner English. International Journal of Corpus Linguistics, 23(1), 28–54. https://doi.org/10.1075/ijcl.16080.hua
  19. Krivanek, J., & Meurers, D.
    (2013) Comparing rule-based and data-driven dependency parsing of learner language. In K. Gerdes, E. Hajičová, & L. Wanner (Eds.), Computational dependency theory (pp. 207–225). Amsterdam: IOS Press.
  20. Lüdeling, A., Walter, M., Kroymann, E., & Adolphs, P.
    (2005) Multi-level error annotation in learner corpora. In Proceedings of corpus linguistics 2005.
  21. Meurers, D.
    (2009) On the automatic analysis of learner language: Introduction to the special issue. CALICO Journal, 26(3), 469–473. https://doi.org/10.1558/cj.v26i3.469-473
  22. Meurers, D., & Dickinson, M.
    (2017) Evidence and interpretation in language learning research: Opportunities for collaboration with computational linguistics. Language Learning, 67(S1), 66–95. https://doi.org/10.1111/lang.12233
  23. Meurers, D., & Wunsch, H.
    (2010) Linguistically annotated learner corpora: Aspects of a layered linguistic encoding and standardized representation. In Proceedings of Linguistic Evidence.
  24. Norris, J. M., & Ortega, L.
    (2009) Towards an organic approach to investigating CAF in instructed SLA: The case of complexity. Applied Linguistics, 30(4), 555–578. https://doi.org/10.1093/applin/amp044
  25. Ordelman, R. J. F., De Jong, F. M. G., Van Hessen, A. J., & Hondorp, G. H. W.
    (2007) TwNC: a Multifaceted Dutch News Corpus. ELRA Newsletter, 12(3–4).
  26. Ortega, L.
    (2003) Syntactic complexity measures and their relationship to L2 proficiency: A research synthesis of college-level L2 writing. Applied Linguistics, 24(4), 492–518. https://doi.org/10.1093/applin/24.4.492
  27. Ott, N., & Ziai, R.
    (2010) Evaluating dependency parsing performance on German learner language. In M. Dickinson, K. Müürisep, & M. Passarotti (Eds.), Proceedings of the ninth international workshop on treebanks and linguistic theories (Vol. 9, pp. 175–186). Northern European Association for Language Technology (NEALT).
  28. Paquot, M.
    (2018) Phraseological competence: A missing component in university entrance language tests? Insights from a study of EFL learners’ use of statistical collocations. Language Assessment Quarterly, 15(1), 29–43. https://doi.org/10.1080/15434303.2017.1405421
  29. Paquot, M.
    (2019) The phraseological dimension in interlanguage complexity research. Second Language Research, 35(1), 121–145. https://doi.org/10.1177/0267658317694221
  30. R Core Team
    (2017) R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from https://www.R-project.org/
  31. Ragheb, M., & Dickinson, M.
    (2012) Defining syntax for learner language annotation. In M. Kay & C. Boitet (Eds.), Proceedings of COLING 2012 (pp. 965–974).
  32. Rubin, Housen, & Paquot
    (in press) Phraseological complexity as an index of L2 Dutch writing proficiency: A partial replication study. In S. Granger (Ed.), Perspectives on the Second Language Phrasicon: The View from Learner Corpora. Bristol: Multilingual Matters.
  33. Sharwood Smith, M., & Truscott, J.
    (2005) Stages or Continua in Second Language Acquisition: A MOGUL Solution. Applied Linguistics, 26(2), 219–240. https://doi.org/10.1093/applin/amh049
  34. Tsarfaty, R., Nivre, J., & Andersson, E.
    (2011) Evaluating dependency parsing: Robust and heuristics-free cross-annotation evaluation. In Proceedings of the 2011 conference on empirical methods in natural language processing (pp. 385–396). Stroudsburg: Association for Computational Linguistics.
  35. van den Bosch, A., Busser, B., Canisius, S., & Daelemans, W.
    (2007) An efficient memory-based morphosyntactic tagger and parser for Dutch. In P. Dirix, I. Schuurman, V. Vandeghinste, & F. Van Eynde (Eds.), Proceedings of the 17th meeting of Computational Linguistics in the Netherlands (pp. 191–206).
  36. van der Beek, L., Bouma, G., Malouf, R., & van Noord, G.
    (2002) The Alpino dependency treebank. In Computational linguistics in the Netherlands 2001 (pp. 8–22).
  37. van Noord, G.
    (2006) At last parsing is now operational. In TALN 2006 (pp. 20–42).
  38. van Noord, G., Schuurman, I., & Bouma, G.
    (2011) Lassy Syntactische Annotatie, Revision 19455. Retrieved from www.let.rug.nl/vannoord/Lassy/sa-man_lassy.pdf
  39. van Noord, G., Schuurman, I., & Vandeghinste, V.
    (2006) Syntactic annotation of large corpora in STEVIN. In Proceedings of the fifth international conference on language resources and evaluation (LREC’06). European Language Resources Association (ELRA).
  40. Weiss, Z., & Meurers, D.
    (this issue) Analyzing the linguistic complexity of German learner language in a reading comprehension task: Using proficiency classification to investigate short answer data, the impact of linguistic analysis quality, and cross-data generalizability. International Journal of Learner Corpus Research, Special Issue on NLP.


  • Article Type: Research Article
Keyword(s): dependency parsing; L2 Dutch; phraseological complexity; proficiency