Volume 5, Issue 2
  • ISSN 2542-3835
  • E-ISSN: 2542-3843



This paper discusses the degree to which most of the widely used measures of dispersion in corpus linguistics lack validity: rather than measuring dispersion, they measure an amalgam of a lot of frequency and a little dispersion. The paper demonstrates these issues on the basis of data from a variety of corpora. I then outline how to design a dispersion measure that measures only dispersion and show that (i) it indeed captures information that differs from frequency in an intuitive way and (ii) it predicts lexical decision times from the MALD database better than nearly all other measures in nearly all corpora tested.
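
To make concrete what a dispersion measure computes, here is a minimal sketch of one well-known measure, the deviation of proportions (DP). It is an illustration chosen for this page, not the measure the paper proposes, and the abstract does not say which existing measures the paper evaluates. The function name and the toy counts are assumptions made for the example.

```python
def deviation_of_proportions(counts, part_sizes):
    """Return DP for one word.

    counts[i]     -- occurrences of the word in corpus part i
    part_sizes[i] -- total number of tokens in corpus part i

    DP is half the summed absolute difference between the share of the
    word's occurrences found in each part and that part's share of the
    corpus. Values near 0 mean the word is spread in proportion to the
    part sizes; values near 1 mean it is concentrated in few parts.
    """
    if len(counts) != len(part_sizes):
        raise ValueError("counts and part_sizes must have the same length")
    total_count = sum(counts)
    total_size = sum(part_sizes)
    if total_count == 0 or total_size == 0:
        raise ValueError("word frequency and corpus size must be positive")
    return 0.5 * sum(
        abs(c / total_count - s / total_size)
        for c, s in zip(counts, part_sizes)
    )


# Toy data (hypothetical): four equally sized corpus parts.
sizes = [1000, 1000, 1000, 1000]
evenly_spread = deviation_of_proportions([10, 10, 10, 10], sizes)
clumped = deviation_of_proportions([40, 0, 0, 0], sizes)
```

Both toy words have the same frequency (40), so the difference between the two values reflects distribution alone. The paper's concern is what happens across real vocabularies, where frequency and dispersion values tend to move together.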




