Volume 1, Issue 2
  • ISSN 2542-3835
  • E-ISSN: 2542-3843


This paper critically discusses how corpus linguistics in general, and learner corpus research in particular, has been dealing with frequency data of all kinds, especially over- and underuse frequencies. On the basis of learner corpus data, I demonstrate the pitfalls of using aggregate data and of the lack of statistical control that unfortunately characterize much existing work. In fact, I will demonstrate that monofactorial methods have very little to offer to research on observational data. While this paper is admittedly very didactic and methodological, I think the discussion of the empirical data offered here – a reanalysis of previously published work – shows how misleading many studies potentially are and provides far-reaching implications for much of corpus linguistics and learner corpus research. Ideally/maximally, this paper together with Paquot & Plonsky (2017) would lead to a complete revision of how learner corpus linguists use quantitative methods and study over-/underuse; minimally, this paper would stimulate a much-needed discussion of the methodological sophistication that is currently lacking.




  1. Aijmer, K.
    (2002) Modality in advanced Swedish learners’ written interlanguage. In S. Granger, J. Hung, & S. Petch-Tyson (Eds.), Computer learner corpora, second language acquisition, and foreign language teaching (pp. 55–76). Amsterdam: John Benjamins.
    https://doi.org/10.1075/lllt.6.07aij
  2. Altenberg, B.
    (2002) Using bilingual corpus evidence in learner corpus research. In S. Granger, J. Hung, & S. Petch-Tyson (Eds.), Computer learner corpora, second language acquisition, and foreign language teaching (pp. 37–54). Amsterdam: John Benjamins.
    https://doi.org/10.1075/lllt.6.06alt
  3. Burnham, K. P., & Anderson, D. R.
    (2002) Model selection and multimodel inference: A practical information-theoretic approach (2nd ed.). New York, NY: Springer.
  4. Connor, U., Precht, K., & Upton, T.
    (2005) Business English: Learner data from Belgium, Finland, and the U.S. In S. Granger, J. Hung, & S. Petch-Tyson (Eds.), Computer learner corpora, second language acquisition, and foreign language teaching (pp. 175–194). Amsterdam: John Benjamins.
  5. Doğruöz, A. S., & Gries, S. Th.
    (2012) Spread of on-going changes in an immigrant language: Turkish in the Netherlands. Review of Cognitive Linguistics, 10(2), 401–426.
    https://doi.org/10.1075/rcl.10.2.07sez
  6. Fox, J.
    (2003) Effect displays in R for generalised linear models. Journal of Statistical Software, 8(15), 1–27.
    https://doi.org/10.18637/jss.v008.i15
  7. Gilquin, G., & Granger, S.
    (2011) From EFL to ESL: Evidence from the International Corpus of Learner English. In J. Mukherjee & M. Hundt (Eds.), Exploring second-language varieties of English and learner Englishes: Bridging a paradigm gap (pp. 55–78). Amsterdam: John Benjamins.
    https://doi.org/10.1075/scl.44.04gra
  8. Gilquin, G., & Lefer, M.-A.
    (2017) Exploring word-formation in Learner Corpus Research: A case study on English negative affixes. Paper presented at the Learner Corpus Research conference 2017, Bolzano, Italy.
  9. Gries, S. Th.
    (2006) Exploring variability within and between corpora: Some methodological considerations. Corpora, 1(2), 109–151.
    https://doi.org/10.3366/cor.2006.1.2.109
  10. Gries, S. Th.
    (2008) Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403–437.
    https://doi.org/10.1075/ijcl.13.4.02gri
  11. Gries, S. Th.
    (2013) Statistics for linguistics with R (2nd rev. and ext. ed.). Berlin: De Gruyter Mouton.
    https://doi.org/10.1515/9783110307474
  12. Gries, S. Th.
    (2015) The most underused statistical method in corpus linguistics: Multi-level (and mixed-effects) models. Corpora, 10(1), 95–125.
    https://doi.org/10.3366/cor.2015.0068
  13. Gries, S. Th., & Adelman, A. S.
    (2014) Subject realization in Japanese conversation by native and non-native speakers: Exemplifying a new paradigm for learner corpus research. In J. Romero-Trillo (Ed.), Yearbook of corpus linguistics and pragmatics 2014: New empirical and theoretical paradigms (pp. 35–54). Cham: Springer.
  14. Gries, S. Th., & Deshors, S. C.
    (2014) Using regressions to explore deviations between corpus data and a standard/target: Two suggestions. Corpora, 9(1), 109–136.
    https://doi.org/10.3366/cor.2014.0053
  15. Gries, S. Th.
    (to appear) Priming of syntactic alternations by learners of English: An analysis of sentence-completion and collostructional results.
  16. Gries, S. Th., & Wulff, S.
    (2009) Psycholinguistic and corpus linguistic evidence for L2 constructions. Annual Review of Cognitive Linguistics, 7, 163–186.
    https://doi.org/10.1075/arcl.7.07gri
  17. Hasselgård, H., & Johansson, S.
    (2011) Learner corpora and contrastive interlanguage analysis. In F. Meunier, S. De Cock, G. Gilquin, & M. Paquot (Eds.), A taste for corpora: In honour of Sylviane Granger (pp. 33–61). Amsterdam: John Benjamins.
    https://doi.org/10.1075/scl.45.06has
  18. Hawkins, J. A.
    (1994) A performance theory of order and constituency. Cambridge: Cambridge University Press.
  19. Hyland, K., & Milton, J.
    (1997) Qualification and certainty in L1 and L2 students’ writing. Journal of Second Language Writing, 6(2), 183–205.
    https://doi.org/10.1016/S1060-3743(97)90033-3
  20. Jaeger, T. F.
    (2010) Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology, 61(1), 23–62.
    https://doi.org/10.1016/j.cogpsych.2010.02.002
  21. Labov, W.
    (1982) The social stratification of English in New York City. Washington, DC: Center for Applied Linguistics.
  22. Laufer, B., & Waldman, T.
    (2011) Verb-noun collocations in second language writing: A corpus analysis of learners’ English. Language Learning, 61(2), 647–672.
    https://doi.org/10.1111/j.1467-9922.2010.00621.x
  23. Neff van Aertselaer, J., & Bunce, C.
    (2012) The use of small corpora for tracing the development of academic literacies. In F. Meunier, S. De Cock, G. Gilquin, & M. Paquot (Eds.), A taste for corpora: In honour of Sylviane Granger (pp. 63–83). Amsterdam: John Benjamins.
  24. Paquot, M., & Plonsky, L.
    (2017) Quantitative research methods and study quality in learner corpus research. International Journal of Learner Corpus Research, 3(1), 61–94.
    https://doi.org/10.1075/ijlcr.3.1.03paq
  25. Wulff, S.
    (2016) A friendly conspiracy of input, L1, and processing demands: that-variation in German and Spanish learner language. In A. Tyler, L. Ortega, H. I. Park, & M. Uno (Eds.), The usage-based study of language learning and multilingualism (pp. 115–136). Washington, DC: Georgetown University Press.
  26. Wulff, S., Lester, N. A., & Martinez-Garcia, M. M.
    (2014) That-variation in German and Spanish L2 English. Language and Cognition, 6(2), 271–299.
    https://doi.org/10.1017/langcog.2014.5


  • Article Type: Research Article
Keyword(s): learner corpora; multifactorial analysis; over-/underuse; speaker/file variation