Volume 5, Issue 1
  • ISSN: 2542-3835
  • E-ISSN: 2542-3843



This paper discusses the degree to which some of the most widely used measures of association in corpus linguistics lack validity, in the sense that they do not actually measure association but rather an amalgam of a lot of frequency and a little association. The paper demonstrates these issues on the basis of hypothetical and actual corpus data and outlines the implications of the findings. I then outline how to design an association measure that measures only association and show that its behavior supports using the log odds ratio as a true association-only measure, to be considered separately from frequency. In addition, this paper sets the stage for an analogous review of dispersion measures in corpus linguistics.


A commentary article has been published for this article:
How can we communicate (visually) what we (usually) mean by collocation and keyness?





  • Article Type: Research Article
Keyword(s): association; dispersion; frequency; generalized additive modeling; log-likelihood; MI; t