Volume 5, Issue 2
  • ISSN 2542-3835
  • E-ISSN: 2542-3843

Abstract

This paper discusses the degree to which most of the widely used measures of dispersion in corpus linguistics are not particularly valid, in the sense that they measure not dispersion itself but an amalgam of a lot of frequency and a little dispersion. The paper demonstrates these issues on the basis of data from a variety of corpora. I then outline how to design a dispersion measure that measures only dispersion and show that this measure (i) indeed captures information that is different from frequency in an intuitive way and (ii) has a higher degree of predictive power for lexical decision times from the MALD database than nearly all other measures in nearly all corpora tested.
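
As a rough illustration of how a dispersion measure can carry frequency information, the sketch below computes DP (deviation of proportions), one widely used dispersion measure, on a made-up corpus of ten equally sized parts. The abstract does not say which measures or corpora the paper examines, so the choice of DP, the function name `dp`, and all counts here are assumptions for illustration only, not the paper's own demonstration or its proposed measure.

```python
def dp(counts, part_sizes):
    """Deviation of proportions for one word.

    counts[i]     -- occurrences of the word in corpus part i
    part_sizes[i] -- number of tokens in corpus part i
    Returns a value near 0 for an even spread and near 1 for a clumped one.
    """
    total_count = sum(counts)
    total_size = sum(part_sizes)
    # Half the summed absolute differences between the observed share of the
    # word in each part and the share expected from the part's size.
    return 0.5 * sum(
        abs(c / total_count - s / total_size)
        for c, s in zip(counts, part_sizes)
    )


# Hypothetical toy corpus: 10 parts of 1,000 tokens each.
part_sizes = [1000] * 10

frequent_even = [5] * 10             # 50 occurrences, spread evenly
frequent_clumped = [50] + [0] * 9    # 50 occurrences, all in one part
rare_max_spread = [1, 1, 1] + [0] * 7  # 3 occurrences, as spread out as possible
hapax = [1] + [0] * 9                # 1 occurrence

for label, counts in [
    ("frequent, even", frequent_even),
    ("frequent, clumped", frequent_clumped),
    ("rare, maximally spread", rare_max_spread),
    ("hapax", hapax),
]:
    print(f"{label:24s} freq={sum(counts):3d}  DP={dp(counts, part_sizes):.2f}")
```

In this toy setting, a word with three occurrences cannot reach a DP below roughly 0.7 however evenly it is spread, and a word with one occurrence receives the same value as a frequent word confined to a single part. The range of values a word can take therefore depends on its frequency, which is one way a dispersion score can end up partly encoding frequency.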

/content/journals/10.1075/jsls.21029.gri
2021-11-30
2024-10-16