Volume 6, Issue 1
  • ISSN 2542-3835
  • E-ISSN: 2542-3843



Corpus linguistic methods can now be easily employed in a wide range of studies within sub-disciplines of linguistics and well beyond. In a two-part paper, Gries (2022a, 2022b) challenges some of the most widely used ‘association measures’ of what many might feel to be powerful aspects of text patterning: collocation and key words. While the additional association measure offers some new possibilities, this paper highlights the strong influence of another frequency parameter on odds ratio and Gries’s suggested association measure, and questions the applicability of his cautions for many different kinds of corpus research. Nevertheless, having been inspired to look at different aspects of association and dispersion more carefully, the author presents some new visualizations which were designed to communicate some of the important lessons to be learned from Gries’s papers, especially for learners and teachers using corpus tools in Second Language classrooms.


This is a commentary article in response to the following content:
What do (some of) our association measures measure (most)? Association?



  1. Anthony, L.
    (2022) AntConc (Version 4.0.1). Tokyo, Japan: Waseda University. Retrieved from https://www.laurenceanthony.net/software/antconc/
  2. Bestgen, Y., & Granger, S.
    (2014) Quantifying the development of phraseological competence in L2 English writing: An automated approach. Journal of Second Language Writing, 26, 28–41.
    https://doi.org/10.1016/j.jslw.2014.09.004
  3. Brezina, V., McEnery, T., & Wattam, S.
    (2015) Collocations in context: A new perspective on collocation networks. International Journal of Corpus Linguistics, 20(2), 139–173.
    https://doi.org/10.1075/ijcl.20.2.01bre
  4. BNC
    (2007) The British National Corpus (Version 3 BNC XML ed.). Oxford University Computing Services on behalf of the BNC Consortium. URL: www.natcorp.ox.ac.uk/
  5. Croft, W. B., Metzler, D., & Strohman, T.
    (2010) Search Engines: Information Retrieval in Practice. Boston: Addison-Wesley.
  6. Dunning, T.
    (1993) Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1), 61–74.
  7. Evert, S., & Krenn, B.
    (2001) Methods for the qualitative evaluation of lexical association measures. Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, pp. 188–195.
    https://doi.org/10.3115/1073012.1073037
  8. Garside, R., & Smith, N.
    (1997) A hybrid grammatical tagger: CLAWS4. In R. Garside, G. Leech & A. McEnery (Eds.), Corpus Annotation: Linguistic Information from Computer Text Corpora (pp. 102–121). London: Longman.
    https://doi.org/10.4324/9781315841366-13
  9. Gries, S. T.
    (2013) 50-something years of work on collocations. International Journal of Corpus Linguistics, 18(1), 137–165.
    https://doi.org/10.1075/ijcl.18.1.09gri
  10. Gries, S. T.
    (2015) The most under-used statistical method in corpus linguistics: multi-level (and mixed-effects) models. Corpora, 10(1), 95–125.
    https://doi.org/10.3366/cor.2015.0068
  11. Gries, S.
    (2022a) What do (some of) our association measures measure (most)? Association? Journal of Second Language Studies, 5(1).
    https://doi.org/10.1075/jsls.21028.gri
  12. Gries, S.
    (2022b) What do (most of) our dispersion measures measure (most)? Dispersion? Journal of Second Language Studies.
    https://doi.org/10.1075/jsls.21029.gri
  13. Hann, M. N.
    (1973) The Statistical Force of Random Distribution. International Journal of Applied Linguistics, 20, 31–44.
    https://doi.org/10.1075/itl.20.04han
  14. Hardie, A.
    (2012) CQPweb: Combining Power, Flexibility and Usability in a Corpus Analysis Tool. International Journal of Corpus Linguistics, 17(3), 380–409.
    https://doi.org/10.1075/ijcl.17.3.04har
  15. Heaps, H.
    (1978) Information retrieval: Computational and theoretical aspects. New York: Academic Press.
  16. Hoey, M.
    (2005) Lexical Priming: A New Theory of Words and Language. London: Routledge.
  17. Hoey, M.
    (2014) Words and their neighbours. In J. R. Taylor (Ed.), Oxford Handbook of the Word. Oxford: Oxford University Press.
  18. Hunston, S.
    (2002) Corpora in Applied Linguistics. Cambridge: Cambridge University Press.
    https://doi.org/10.1017/CBO9781139524773
  19. Jeaco, S.
    (2017) Concordancing Lexical Primings: The rationale and design of a user-friendly corpus tool for English language teaching and self-tutoring based on the Lexical Priming theory of language. In M. Pace-Sigge & K. J. Patterson (Eds.), Lexical Priming: Applications and Advances (pp. 273–296). Amsterdam: John Benjamins.
    https://doi.org/10.1075/scl.79.11jea
  20. Jeaco, S.
    (2020) Key words when text forms the unit of study: Sizing up the effects of different measures. International Journal of Corpus Linguistics, 25(2), 125–154.
    https://doi.org/10.1075/ijcl.18053.jea
  21. Kilgarriff, A., Rychly, P., Smrz, P., & Tugwell, D.
    (2004) The Sketch Engine. Paper presented at the 2003 International Conference on Natural Language Processing and Knowledge Engineering, Beijing.
  22. Mahlberg, M.
    (2013) Corpus stylistics and Dickens’s fiction. New York: Routledge.
    https://doi.org/10.4324/9780203076088
  23. Oakes, M. P.
    (1998) Statistics for Corpus Linguistics. Edinburgh: Edinburgh University Press.
  24. O’Keeffe, A., McCarthy, M., & Carter, R.
    (2007) From Corpus to Classroom: Language Use and Language Teaching. Cambridge: Cambridge University Press.
    https://doi.org/10.1017/CBO9780511497650
  25. Rayson, P., & Garside, R.
    (2000) Comparing corpora using frequency profiling. Paper presented at the Workshop on Comparing Corpora, Hong Kong University of Science and Technology, Hong Kong.
  26. Read, T. R. C., & Cressie, N. A. C.
    (1988) Goodness-of-fit Statistics for Discrete Multivariate Data. New York: Springer-Verlag.
    https://doi.org/10.1007/978-1-4612-4578-0
  27. RStudio Team
    (2022) RStudio: Integrated Development Environment for R. Boston, MA: PBC. Retrieved from www.rstudio.com/
  28. Rychlý, P.
    (2008) A lexicographer-friendly association score. Paper presented at the Recent Advances in Slavonic Natural Language Processing Conference, Masaryk University, Brno.
  29. Scott, M.
    (1997) PC analysis of key words – and key key words. System, 25(2), 233–245.
    https://doi.org/10.1016/S0346-251X(97)00011-0
  30. Scott, M., & Tribble, C.
    (2006) Textual Patterns: Key Words and Corpus Analysis in Language Education. Amsterdam: John Benjamins.
    https://doi.org/10.1075/scl.22
  31. Scott, M.
    (2020) WordSmith Tools (Version 8). Oxford: Oxford University Press.
  32. Scott, M.
    (2022) WordSmith Tools online manual “KeyWords: calculation”. Retrieved 31 October, 2022, from www.lexically.net/downloads/version7/HTML/keywords_calculate_info.htm
  33. Sinclair, J. M.
    (1991) Corpus, Concordance, Collocation. Oxford: Oxford University Press.
  34. Sinclair, J. M.
    (2004) New evidence, new priorities, new attitudes. In J. M. Sinclair (Ed.), How to Use Corpora in Language Teaching (pp. 271–299). Amsterdam: John Benjamins.
    https://doi.org/10.1075/scl.12.20sin
  35. Wermter, J., & Hahn, U.
    (2006) You can’t beat frequency (unless you use linguistic knowledge): A qualitative evaluation of association measures for collocation and term extraction. Paper presented at the Annual Meeting of the Association for Computational Linguistics, Sydney.
    https://doi.org/10.3115/1220175.1220274
  36. Wood, S. N.
    (2011) Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society (B), 73(1), 3–36.
    https://doi.org/10.1111/j.1467-9868.2010.00749.x
  37. Zipf, G. K.
    (1935) The Psycho-Biology of Language: An Introduction to Dynamic Philology. Boston, MA: Houghton Mifflin.


  • Article Type: Article Commentary
Keyword(s): association; collocation; dispersion; frequency; Gries’s DP; keyness; log-likelihood; range