Volume 7, Issue 1
  • ISSN 2215-1478
  • E-ISSN: 2215-1486



This paper explores the utility of natural language processing (NLP) tools for learner language analyses by comparing automatic linguistic annotation against a gold standard produced by humans. While a number of automated annotation tools for English are currently available, little research has examined their accuracy when annotating learner data. We compare the performance of three linguistic annotation tools (a tagger and two parsers) on academic writing in English produced by learners (both L1 and L2 English speakers). We focus on lexico-grammatical patterns, including both phrasal and clausal features, since these are frequently investigated in applied linguistics studies. We report both precision and recall of annotation output for argumentative texts in English across four L1s: Arabic, Chinese, English, and Korean. We close with a discussion of the benefits and drawbacks of using automatic tools to annotate learner language.
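
The abstract does not say how precision and recall were computed. As a minimal sketch of the general idea, assuming the gold-standard and automatic annotations are aligned token for token, per-tag scores could be derived as follows. The function name `precision_recall` and the sample tag sequences are hypothetical and are not taken from the study.

```python
def precision_recall(gold, predicted, label):
    """Return (precision, recall) for one tag over two aligned tag sequences.

    Assumption: gold[i] and predicted[i] annotate the same token.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned token for token")

    pairs = list(zip(gold, predicted))
    # True positives: the tool assigned the tag and the human annotators agree.
    tp = sum(1 for g, p in pairs if g == label and p == label)
    # False positives: the tool assigned the tag but the gold standard differs.
    fp = sum(1 for g, p in pairs if g != label and p == label)
    # False negatives: the gold standard has the tag but the tool missed it.
    fn = sum(1 for g, p in pairs if g == label and p != label)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical five-token example using Penn Treebank style tags.
gold_tags = ["DT", "NN", "VBZ", "JJ", "NN"]
auto_tags = ["DT", "NN", "VBZ", "NN", "NN"]

p, r = precision_recall(gold_tags, auto_tags, "NN")
print(f"NN precision={p:.2f} recall={r:.2f}")
```

In this toy example the tool over-assigns the noun tag, so recall for that tag stays perfect while precision drops. Reporting the two measures separately makes this kind of asymmetry visible.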






  • Article Type: Research Article
Keyword(s): automated annotation; learner English; learner NLP; writing research